Study of lexical combinations in Basque based on a novice academic corpus for an Academic Texts Writing Aid
(2020 - 2023)
More than half of the students at the Basque public university use Basque as a vehicular language, and they face the challenge of understanding and producing the textual genres characteristic of academic communication. To meet this challenge, they need to become acquainted with the recurrent academic lexical combinations (ALCs) of these genres: collocations, discourse markers and other discursive formulas. Several academic writing assistance tools have been developed for languages widely used in the academic environment, such as English. This proliferation suggests that students find ALCs difficult to acquire, even though these combinations recur in the texts used in higher education. Our starting hypothesis is that the acquisition of ALCs in Basque is even more difficult, for several reasons related to the sociolinguistic status of the language: a) many ALCs have not yet been established and show a greater degree of variation than in well-established languages; b) some word combinations that are not correct from a semantic or syntactic point of view become recurrent and rapidly multiply in texts. However, the real extent of these hypothesized difficulties needs to be studied in corpora. With this aim in mind, we will compile a Basque corpus of novice writers and compare it with the HARTA corpus of Spanish novice writers. We want to test the possibility of extracting Spanish-Basque bilingual combinations using distributional semantics techniques on comparable corpora. In this way, we can take advantage of the work done in HARTA to assign discursive functions to the ALCs established from the Spanish expert corpus. The overall goal of this subproject is to investigate how students writing in Basque use ALCs in academic writing and to compare their use with that of students writing in Spanish. Our applied aim is to design an academic writing assistance tool for both Basque and Spanish that integrates both a lexicon and a corpus.
This tool would help students develop their academic writing skills and, in addition, would contribute to the normalization of Basque academic registers.
Organization: Ministerio de Ciencia Innovación y Universidades
Main researcher: Miren Igone Zabala Unzalu
Participants: Maxux Aranzabe, Nerea Ezeiza, Josu Goikoetxea, Uxoa Iñurrieta, Igone Zabala