Speech and Language Resources

Developing products and applications in language technology requires basic linguistic resources (text and speech corpora, lexicons and knowledge bases) and development tools (morphological and syntactic analysers, word sense disambiguators, corpus processing tools, lemmatisers, integrated tool environments, etc.).

We have more than 25 years of experience in creating basic linguistic resources of this kind, and we have several reference corpora, lexicons ...

Demos

Konbitzul

Database of translations of noun+verb combinations

e-ROLda

A tool for looking up verb entries in the BVI lexicon and examples in the EPEC-RolSem corpus.

Universal Dependencies treebank for Basque

This treebank has 121 K words annotated following the guidelines proposed in the Universal Dependencies project.

 

Eusemcor

Corpus tagged with Basque WordNet senses.

Basque WordNet / Euskal WordNet

Basque WordNet

EDBL

Basque lexical database.

EPEC-ROLSEM

Corpus tagged with semantic roles.

EPEC-DEP (BDT)

A syntactically annotated corpus based on dependency grammar.

Resources

Publications

Begoña Altuna, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza

Adapting TimeML to Basque: Event Annotation (2018)

In Gelbukh A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science (LNCS, vol 9624), 565-577. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-75487-1_43; Print ISBN 978-3-319-75486-4; Online ISBN 978-3-319-75487-1

Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Konbitzul: an MWE-specific Database for Spanish-Basque (2018)

Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan. Pages 2500-2504.

Uxoa Iñurrieta, Itziar Aduriz, Ainara Estarrona, Itziar Gonzalez-Dios, Antton Gurrutxaga, Ruben Urizar, Iñaki Alegria

Verbal Multiword Expressions in Basque corpora (2018)

In the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (at COLING 2018)

Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza and Ines Garzia-Azkoaga

Coreferential Relations in Basque: The Annotation Process (2018)

J Psycholinguist Res (2018) 47, Issue 2. Pages 325-342. https://doi.org/10.1007/s10936-018-9559-6. ISSN 0090-6905. Online ISSN 1573-6555.

Izaskun Aldezabal, Xabier Artola, Arantza Diaz De Ilarraza, Itziar Gonzalez-Dios, Gorka Labaka, German Rigau and Ruben Urizar

Basque e-lexicographic resources: linguistic basis, development, and future perspectives (2018)

Workshop on eLexicography: Between Digital Humanities and Artificial Intelligence. https://lexdhai.insight-centre.org/Lex_DH__AI_2018_paper_5.pdf

Ainara Estarrona, Izaskun Aldezabal, Arantza Díaz de Ilarraza

How the corpus-based Basque Verb Index lexicon was built (2018)

Language Resources and Evaluation. First Online 05 December 2018. DOI: https://doi.org/10.1007/s10579-018-9440-0. Springer Netherlands

Estarrona A., Aldezabal I., Díaz de Ilarraza A. and Aranzabe M.J.

A Methodology for the Semiautomatic Annotation of EPEC-RolSem, a Basque Corpus Labeled at Predicate Level following the PropBank/Verbnet Model (2016)

Edward Vanhoutte (ed.) Digital Scholarship in the Humanities (2016) 31 (3): 470-492. DOI: http://dx.doi.org/10.1093/llc/fqv001 First published online: 17 June 2015 (23 pages). Published by Oxford University Press on behalf of EADH: The European Association for Digital Humanities (Online ISSN 2055-768X - Print ISSN 2055-7671)

Maria Jesús Aranzabe, Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Iakes Goenaga, Koldo Gojenola, Larraitz Uria

Automatic Conversion of the Basque Dependency Treebank to Universal Dependencies (2015)

Markus Dickinson, Erhard Hinrichs, Agnieszka Patejuk, Adam Przepiórkowski (eds), Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), 233-241. Institute of Computer Science of the Polish Academy of Sciences, Warszawa, Poland. ISBN: 978-83-63159-18-4

Iruskieta M., Aranzabe M., Diaz de Ilarraza A., Gonzalez I., Lersundi I., Lopez de Lacalle O.

The RST Basque TreeBank: an online search interface to check rhetorical relations (2013)

4th Workshop RST and Discourse Studies, 40-49, Sociedade Brasileira de Computação, Fortaleza, CE, Brasil. October 20-24 (http://encontrorst2013.wix.com/encontro-rst-2013)

Pociello E., Agirre E. and Aldezabal I.

Methodology and construction of the Basque WordNet (2011)

Language Resources and Evaluation. Springer. Volume 45, Issue 2, pp 121-142. ISSN 1574-020X. DOI 10.1007/s10579-010-9131-y.

Izaskun Aldezabal, Maria Jesús Aranzabe, Jose Maria Arriola, Arantza Diaz de Ilarraza

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues (2009)

Corpus Linguistics and Linguistic Theory 5-2 (2009), 241-269. Mouton de Gruyter. Berlin-New York. Print ISSN: 1613-7027 Online ISSN: 1613-7035

Itziar Aduriz, Maria Jesús Aranzabe, Jose Maria Arriola, Aitziber Atutxa, Arantza Diaz de Ilarraza, Nerea Ezeiza, Koldo Gojenola, Maite Oronoz, Aitor Soroa, Ruben Urizar

Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing (2006)

Corpus Linguistics Around the World. Book series: Language and Computers, Vol. 56 (pp. 1-15). ISBN 90-420-1836-4. Eds. Andrew Wilson, Paul Rayson, and Dawn Archer. Rodopi, Netherlands.

Eneko Agirre, Izaskun Aldezabal, Jone Etxeberria, Mikel Iruskieta, Elixabete Izagirre, Karmele Mendizabal, Eli Pociello

Improving the Basque WordNet by corpus annotation (2006)

Proceedings of Third International WordNet Conference. pp. 287-290. ISBN 80-210-3915-9. Jeju Island (Korea).

Izaskun Aldezabal, Olatz Ansa, Bertol Arrieta, Xabier Artola, Aitzol Ezeiza, Gregorio Hernández, Mikel Lersundi

EDBL: a General Lexical Basis for the Automatic Processing of Basque (2001)

IRCS Workshop on linguistic databases. Philadelphia (USA).
