publications

Margarita Alonso Ramos

HARTAes-vas: Lexical combinations for an academic writing aid tool in Spanish and Basque (2022)

SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, Spain.

A Garcia Olea, I Valdelvira Vazquez, I Diez Gonzalez, A Atutxa Salazar, K Gojenola Galletebeitia, J M Ormaetxe Merodio

Prediction of new onset atrial fibrillation recurrence or persistence with artificial intelligence: first insights of the PRAFAI study (2022)

European Heart Journal - Digital Health, Volume 3, Issue 4, December 2022

Itziar Glez Dios, Aitor Soroa, Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben Allal, Francesco de Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, et al.

The BigScience ROOTS Corpus: A 1.6 TB Composite Multilingual Dataset (2022)

2022. Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)

Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.

Jose Mari Arriola

Machine Translation as an Aid for Writing by Computer Science University Students (2022)

15th annual International Conference of Education, Research and Innovation, 7-9 November, 2022 Seville, Spain

Oscar Cumbicus-Pineda, Iker Gutiérrez-Fandiño, Itziar Gonzalez-Dios, Aitor Soroa

Noisy Channel for Automatic Text Simplification (2022)

Cumbicus-Pineda, O. M., Gutiérrez-Fandiño, I., Gonzalez-Dios, I., & Soroa, A. (2022). Noisy Channel for Automatic Text Simplification. arXiv preprint arXiv:2211.03152.

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Soroa, A., Gonzalez-Dios, I., ... & Manica, M.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (2022)

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., ... & Manica, M. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv preprint arXiv:2211.05100.

Nora Hollenstein, Itziar Gonzalez-Dios, Lisa Beinborn, and Lena Jäger

Patterns of text readability in human and predicted eye movements (2022)

Nora Hollenstein, Itziar Gonzalez-Dios, Lisa Beinborn, and Lena Jäger. 2022. Patterns of Text Readability in Human and Predicted Eye Movements. In Proceedings of the Workshop on Cognitive Aspects of the Lexicon, pages 1–15, Taipei, Taiwan. Association for Computational Linguistics.

Petter Mæhlum, Andre Kåsen, Samia Touileb, and Jeremy Barnes.

Annotating Norwegian language varieties on Twitter for Part-of-speech (2022)

Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects

David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal

Direct Parsing to Sentiment Graphs (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 470–478

Mikel Iruskieta, Mari Mar Boillos

Aproximación al género Trabajo de Fin de Grado en euskera: hacia una identificación de las características lingüístico-discursivas (2022)

In Elena Alarcón, José Sanchez-Santamaria, Purificación Cruz (Coord.) Nuevos contenidos para una nueva docencia, 283-296

Xabier Soto, Olatz Pérez-de-Viñaspre, Maite Oronoz, Gorka Labaka

Development of a Machine Translation system for promoting the use of a low resource language in the clinical domain: the case of Basque (2022)

Chapter 7 In Natural Language Processing In Healthcare A Special Focus on Low Resource Languages. Routledge, Taylor & Francis Group.

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli

European Clinical Case Corpus (2022)

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Anne-Lyse Minard, Manuela Speranza, and Roberto Zanoli (2022). European Clinical Case Corpus. Georg Rehm ed. European Language Grid, A Language Technology Platform for Multilingual Europe. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-031-17258-8

Adrián Núñez-Marcos, Olatz Perez-de-Viñaspre, Gorka Labaka

A survey on Sign Language machine translation (2022)

Expert Systems with Applications, Volume 213, part B. URL: https://doi.org/10.1016/j.eswa.2022.118993 ISSN: 0957-4174

María Jesús Aranzabe, Izaskun Aldezabal, Igone Zabala

Recursos y Herramientas de Lingüística de Corpus y PLN para la Monitorización e Investigación de los Usos Académicos del Euskera (2022)

III Workshop of INTELE (Infraestructura de Tecnologías del Lenguaje). Madrid, September 13-14 (poster presented at the workshop)

Mikel Iruskieta

INTELE: promoviendo la participación en las infraestructuras: CLARIN y DARIAH (2022)

The International Congress on Libraries & Digital Humanities: projects and challenges

María Jesús Aranzabe, Antton Gurrutxaga, Igone Zabala

Compilación del corpus académico de noveles en euskera HARTAeus y su explotación para el estudio de la fraseología académica (2022)

Procesamiento del Lenguaje Natural, 69, 95-103

Gonzalez-Dios, Itziar and Altuna, Begoña

Natural Language Processing and Language Technologies for the Basque Language (2022)

Gonzalez-Dios, Itziar and Altuna, Begoña (2022). Natural Language Processing and Language Technologies for the Basque Language. In Cuadernos Europeos de Deusto. NÚMERO ESPECIAL. Linguas minoritarias e futuro de Europa. Minority Languages and the Future of Europe 26, 203-230. https://doi.org/10.18543/ced.2477 https://ced.revistas.deusto.es/issue/view/285

Izaskun Aldezabal, Jose Mari Arriola, Arantxa Otegi

TZOS: an Online Terminology Database Aimed at Working on Basque Academic Terminology Collaboratively (2022)

Proceedings of the 13th Language Resources and Evaluation Conference. Editors: Nicoletta Calzolari (Conference chair), Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis

Gildo Fabregat, Ander Cejudo, Juan Martinez-Romo, Alicia Pérez, Lourdes Araujo, Nuria Lebeña, Maite Oronoz, Arantza Casillas

Approximate Nearest Neighbour Extraction Techniques and Neural Networks for Suicide Risk Prediction in the CLPsych 2022 Shared Task (2022)

Accepted at the CLPsych 2022 Shared Task, July 15, 2022

Eneko Agirre

Few-shot Information Extraction is Here: Pre-train, Prompt and Entail (2022)

In Few-shot Information Extraction is Here: Pre-train, Prompt and Entail

E Agirre, M Apidianaki, I Vulić

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2022)

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. Association for Computational Linguistics, Dublin, Ireland

Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, Bonan Min

ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations (2022)

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, Seattle, Washington. Association for Computational Linguistics.

Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre

Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (2022)

In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, Washington. Association for Computational Linguistics.

Jon Alkorta, Mikel Iruskieta

Adding the Basque Parliament Corpus to ParlaMint Project (2022)

ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora: 107–110

Ibarra, I. and Iruskieta, M.

Corpus lingüísticos, smartpen y whatsapp: Intervención en escritura de una madre con sus hijos (2022)

IV Congreso internacional en Inclusión Social y Educativa: CIISE

Irune Ibarra, Mikel Iruskieta

Disgrafia hobetzeko esku-hartzea idazkailu digitala erabiliz (2022)

UZTARO 121, 155-178

Mikel Iruskieta

Herramientas Digitales para las Humanidades Digitales en la e-infraestructura CLARIN (2022)

Creación de un proyecto en humanidades digitales basado en el análisis de textos: modelado y procesamiento

Harritxu Gete, Thierry Etchegoyhen, David Ponce, Gorka Labaka, Nora Aranberri, Ander Corral, Xabier Saralegi, Igor Ellakuria and Maite Martin

TANDO: A Corpus for Document-level Machine Translation (2022)

Proceedings of the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022)

Xabier Soto, Olatz Perez-De-Viñaspre, Gorka Labaka, Maite Oronoz

Comparing and combining tagging with different decoding algorithms for back-translation in NMT: learnings from a low resource scenario (2022)

In Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, pages 31–40, Ghent, Belgium. European Association for Machine Translation.

Ona de Gibert Bonet, Iakes Goenaga, Olatz Perez-de-Viñaspre, Jordi Armengol-Estapé, Carla Parra Escartín, Marina Sanchez, Mārcis Pinnis, Gorka Labaka and Maite Melero

Unsupervised Machine Translation in Real-World Scenarios (2022)

Proceedings of the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022)

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Principled Paraphrase Generation with Parallel Corpora (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1621-1638

Owen Trigueros, Alberto Blanco, Nuria Lebeña, Arantza Casillas, Alicia Pérez

Explainable ICD multi-label classification of EHRs in Spanish with convolutional attention (2022)

International Journal of Medical Informatics

Alberto Blanco, Sonja Remmer, Alicia Pérez, Hercules Dalianis, Arantza Casillas

Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish (2022)

Journal of Biomedical Informatics

Itxaso Alayo, Ander Merketegi, Maite Oronoz, Arantza Casillas, Alicia Pérez, Olatz Garin, Isabel Moreira, Montse Ferrer, Jordi Alonso, Yolanda Pardo

A baseline model for the automation of the systematic review of Patient-Reported Outcomes measures: the case of the BiblioPRO virtual library (2022)

Jornada científica CIBERESP 2022 (https://jornadacientifica.ciberesp.es/). Centro de Investigación Biomédica en Red, Epidemiología y Salud Pública.

Alberto Blanco, Alicia Pérez, Arantza Casillas

Exploiting ICD Hierarchy for Classification of EHRs in Spanish Through Multi-Task Transformers (2022)

IEEE Journal of Biomedical and Health Informatics

Arantxa Otegi, Iñaki San Vicente, Xabier Saralegi, Anselmo Peñas, Borja Lozano, Eneko Agirre

Information retrieval and question answering: A case study on COVID-19 scientific literature (2022)

Knowledge-Based Systems, Volume 240.

Agustín Alonso, Victor García, Inma Hernáez, Eva Navas, Jon Sanchez 

Automatic Classification of Synthetic Voices for Voice Banking Using Objective Measures (2022)

Itziar Aldabe, Aritz Farwell, Eva Navas, Inma Hernaez, German Rigau 

ELE Project: an overview of the desk research (2022)

Inma Hernaez, Jose Andres Gonzalez Lopez, Eva Navas, Jose Luis Pérez Córdoba, Ibon Saratxaga, Gonzalo Olivares, Jon Sanchez de la Fuente, Alberto Galdón, Victor Garcia, Jesús del Castillo, Inge Salomons, Eder del Blanco Sierra 

ReSSInt project: voice restoration using Silent Speech Interfaces (2022)

Eder Del Blanco, Inge Salomons, Eva Navas, Inma Hernáez 

Phone classification using electromyographic signals (2022)