Information Extraction and Retrieval

In recent years, the research areas of information retrieval (IR) and information extraction (IE) have grown considerably, driven by the rising number of unstructured text resources on the web and by applications that exploit those texts for automatic knowledge extraction. Until recently, experts annotated texts by hand, but this task is known to be very costly, both economically and in terms of human resources. For that reason, new techniques have been developed over the last decade to carry out annotation work (semi-)automatically and thereby reduce the amount of data that must be labeled by hand. Moreover, indexes, search engines and basic IR systems have several shortcomings. Today, the goal is no longer to treat information as a mere string of words, but to try to understand the semantic meaning implicit in a document, and also to handle the different language varieties used to write texts. Specifically, in IR and IE we work on the following:

1. Named entity recognition: persons, organizations, locations, and temporal and numerical expressions.
2. Terminology extraction: extracting the most important concepts from corpora.
3. Extraction of relations between entities and concepts.
4. Event extraction and extraction of event sequences, both within and across texts.
5. Opinion mining applied to different text genres and domains.
6. Semantic textual similarity.
7. Automatic classification of multimedia content.

We have obtained state-of-the-art results in multilingual information retrieval and extraction, and have published them in the main Language Processing conferences and journals (ACL, EMNLP, Artificial Intelligence Journal, Knowledge Based Systems...). In addition, we have taken part in European (NEWSREADER, LoCloud, OpeNER, PATHS, KYOTO, MEANING) and national (CROSSTEXT, TUNER, SKATER, KNOW) projects, in some cases as coordinators. We have received a Google Research Award (Eneko Agirre) and maintain close ties with companies to promote technology transfer.

Principal investigator:

Aitor Soroa

Researchers:

Arantza Díaz de Ilarraza

Ainara Estarrona

Nerea Ezeiza

Joseba Fernandez de Landa

Iker García

Itziar Gonzalez-Dios

Mikel Iruskieta

Oier Lopez de Lacalle


Demos

Demo of the NewsReader NLP pipeline

Just copy in any English text and see what entities and events and other annotations are added automatically. The result is represented in the NAF format.

Demo of the NewsReader NLP pipeline

Just copy in any Spanish text and see what entities and other annotations are added automatically. The result is represented in the NAF format.

Eihera

Basque named entities recognizer/classifier

Eustagger

Basque lemmatizer and morphosyntactic analyzer

Contracts

Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. - TECNALIA
(2024 - 2028)
Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. MULTIVERSE.
(2025 - 2028)
Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology. ELHUYAR.
(2025 - 2028)
Scientific advice on the design and construction of systems for extracting information from unstructured text

(2025 - 2026)
(2024 - 2025)
Scientific advice on the design and construction of systems for extracting information from unstructured text

(2023 - 2024)
Generative artificial intelligence webinar.
Online course for Gipuzkoa Provincial Council employees
(2024 - 2024)
Data Privacy in Artificial Intelligence for Health Applications: A QA system to extract specific information from medical reports that can be used for better decision making
(2020 - 2021)
Pre-training cross-lingual language models
(2020 - 2020)
(2019 - 2020)


Projects

ECHOLOT — European Cultural Heritage Optimised Linked Open Tools

(2026 - 2028)
Humanizing AI with language technology (HumanAIze)
(2025 - 2028)
Grant DeepThought (PID2024-159202OB-C21) funded by MICIU/AEI /10.13039/501100011033 and by ERDF, EU
(2025 - 2028)
Project CRITICS (PCI2025-167239-2) funded by MICIU/AEI /10.13039/501100011033 and co-funded by the European Union
(2025 - 2028)
DeepKnowledge (PID2021-127777OB-C21) project funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU
(2022 - 2026)
Project CNS2023-144375 funded by MTDFP/ and by European Union Next GenerationEU/ PRTR.
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
ICL4LANG: In-context learning as a new paradigm for researching scalable, high-precision language technologies adapted to the industrial needs of the Basque Country

(2023 - 2025)
CLARIAH-EUS-gArA

(2024 - 2025)
Disargue (TED2021-130810B-C21) funded by MCIN/AEI /10.13039/501100011033 and by European Union NextGenerationEU/ PRTR
(2022 - 2025)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2024 - 2025)
Antidote (PCI2020-120717-2) funded by MCIN/AEI /10.13039/501100011033 and by European Union NextGenerationEU/PRTR
(2021 - 2024)
DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR.
(2022 - 2024)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2023 - 2024)
Better Extraction from Text Towards Enhanced Retrieval
(2019 - 2023)
Tools for the analysis of parliamentary discourses: polarization, subjectivity and affectivity in the post-truth era
(2020 - 2022)
DeepReading: Mining, Understanding, and Reasoning with Multilingual Content.
(2019 - 2021)
Deep learning, Big Data and knowledge for multilingual text processing.
(2019 - 2021)
New generation of neural artificial intelligence models to transform language technologies in the Basque Country's industry.
(2020 - 2021)
Automated surveillance of key questions on COVID-19 in scientific publications
(2020 - 2021)
Learning to Interact with Humans by Lifelong Interaction with Humans
(2017 - 2020)
CROSSTEXT: Automatic Generation of Multilingual Semantic Processors
Automatic generation of multilingual semantic taggers
(2017 - 2019)
TUNER: Automatic domain adaptation for semantic processing.
(2016 - 2018)
MUSTER: Multimodal processing of Spatial and TEmporal expRessions: Toward Understanding Space and Time in Language Enhanced by Vision.
(2016 - 2018)
Openminted: Sharing IXA pipes in the OpenMinTeD platform.
(2018 - 2018)


Patents

EUSLEM

EUSLEM: lemmatizer for Basque

UKB

Word sense disambiguation and similarity.

KYBOT

Knowledge Yielding Robot

Resources

EIEC
Basque Named Entity Recognition corpus.
EDIEC
Basque corpus annotated for Named Entity Disambiguation.
MCR: Multilingual Central Repository
Multilingual lexical database with wordnets for several European languages, including Basque.
EPEC-EuSemcor
Corpus tagged with Basque WordNet senses.

Publications

Iker García-Ferrero

Cross-Lingual Transfer for Low-Resource Natural Language Processing (2025)

Iñigo Alonso

Improving Fidelity and Table Representation in Table Understanding and Table-to-Text Generation (2025)

Neil De La Fuente, Oscar Sainz, Iker García-Ferrero, Eneko Agirre

GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction (2025)

Findings of the Association for Computational Linguistics: ACL 2025

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models (2025)

Maitane Urruela, Sergio Martín, Iker De la Iglesia, Ander Barrena

Medical Argument Mining: Exploitation of Scarce Data Using NLI Systems (2025)

Vol. 75 (2025): Procesamiento del Lenguaje Natural, Revista nº 75, septiembre de 2025

Adrian Cuadron Cortes, Aimar Sagasti, Maitane Urruela, Iker De La Iglesia, Ane García Domingo-aldama, Aitziber Atutxa Salazar, Josu Goikoetxea, Ander Barrena

ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality (2025)

Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks), pages 1–10, Vienna, Austria. Association for Computational Linguistics.

Elisa Sanchez-Bayona, Rodrigo Agerri

Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding (2025)

In Findings of the Association for Computational Linguistics: ACL 2025, pages 17462–17477, Vienna, Austria. Association for Computational Linguistics.

Olia Toporkov, Alan Akbik, Rodrigo Agerri

Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data (2025)

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18219–18232, Suzhou, China. Association for Computational Linguistics.

Masson, Maxime, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, and Philippe Roose

Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain (2025)

Knowledge-Based Systems (2025): 114001 (Elsevier).

Iker De la Iglesia, Adrián Sánchez-Freire, Oier Urquijo-Durán, Ander Barrena, Aitziber Atutxa

EriBERTa Private Surpasses her Public Alter Ego: Enhancing a Bilingual Pretrained Encoder with Limited Private Medical Data (2025)

Procesamiento del Lenguaje Natural, Revista nº 75, septiembre de 2025, p. 283-296

Mar Rodríguez, Olatz Perez-de-Viñaspre, Naiara Perez

A Two-Stage Multilingual Job Title Matching System: Combining Expert Knowledge and LLM-based Ranking (2025)

Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025), Vol-4038, pp. 4479-4493

Olia Toporkov, Rodrigo Agerri

On the Role of Morphological Information for Contextual Lemmatization (2024)

Computational Linguistics (MIT Press).

Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)

The Twelfth International Conference on Learning Representations

Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle, Eneko Agirre

Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Turin, Italy

Eneko Agirre, Itziar Aldabe, Xabier Arregi, Mikel Artetxe, Unai Atutxa, Ekhi Azurmendi, Iker De la Iglesia, Julen Etxaniz, Victor García-Romillo, Inma Hernaez-Rioja, Asier Herranz, Mikel Iruskieta, Oier López de Lacalle, Eva Navas, Paula Ontalvilla, Aitor Ormazabal, Naiara Perez, German Rigau, Oscar Sainz, Jon Sanchez, Ibon Saratxaga, Aitor Soroa, Christoforos Souganidis, Jon Vadillo and Aimar Zabala

IKER-GAITU: research on language technology for Basque and other low-resource languages (2024)

Eneko Agirre, Olatz Arbelaitz, Olatz Arregi, Gorka Azkune, Arantza Casillas, Inma Hernaez, Mikel Iruskieta, Elena Lazkano, Eva Navas, German Rigau, Roberto Santana, Aitor Soroa and Rabih Zbib

ENIA Chair in Artificial Intelligence and Language Technology (2024)

Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios.

A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)

Pensa, G., Altuna, B., & Gonzalez-Dios, I. (2024, May). A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 819-831).

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand

Ahmed Elhady, Khaled Elsayed, Eneko Agirre, and Mikel Artetxe

Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training (2024)

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 755–761, Mexico City, Mexico. Association for Computational Linguistics.

Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea

EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)

Procesamiento del Lenguaje Natural, Revista no 73, septiembre de 2024, pp. 125-137

Iakes Goenaga, Aitziber Atutxa, Koldo Gojenola, Maite Oronoz, Rodrigo Agerri

Explanatory argument extraction of correct answers in resident medical exams (2024)

Artificial Intelligence in Medicine Volume 157, November 2024, 102985

Alain García Olea, Ane García Domingo-Aldama, Marcos Merino Prado, Koldo Gojenola Galletebeitia, Aitziber Atutxa Salazar, Mikel Maeztu Rada, Iván García Díaz, Adrián Costa, Iván Cano, Fernando Díaz, Irene Hernández, Uxue Millet, Ainhoa Etxenike, José Miguel Ormaetxe Merodio

RENDIMIENTO DE LAS EXPRESIONES REGULARES EN EL ANÁLISIS DE INFORMES DE ALTA PRESENTES EN LA HISTORIA CLÍNICA ELECTRÓNICA: EXPRIMIENDO LOS DATOS SECUNDARIOS (2024)

Revista Española de Cardiología. Rev Esp Cardiol. 2024;77 (Supl 1): 33

Alain García Olea, Ane García Domingo-Aldama, Marcos Merino Prado, Ignacio Díez González, Aitziber Atutxa Salazar, Josu Goikoetxea Salutregi, Koldo Gojenola Galletebeitia, Mikel Maeztu Rada, Iván Cano González, Adrián Costa Santos, Iván García Díaz, Fernando Díaz González, Irene Hernández Pérez, Uxue Millet Oyarzabal y José Miguel Ormaetxe Merodio

RENDIMIENTO DE SISTEMAS DE CHAT ALIMENTADOS CON ARTÍCULOS DE INVESTIGACIÓN EN UN ENTORNO CLÍNICO ESPECÍFICO: LA ENFERMEDAD VALVULAR CARDIACA (2024)

Revista Española de Cardiología. Rev Esp Cardiol. 2024;77 (Supl 1): 1161

Iñigo Alonso, Eneko Agirre, Mirella Lapata

PixT3: Pixel-based Table-To-Text Generation (2024)

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) https://aclanthology.org/2024.acl-long.364

Maxime Masson, Philippe Roose, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Rodrigo Agerri

ProxMetrics: modular proxemic similarity toolkit to generate domain-adaptable indicators from social media (2024)

Social Network Analysis and Mining, 14(1), pp.1-23

Anar Yeginbergen, Rodrigo Agerri

Crosslingual Argument Mining in the Medical Domain (2024)

Procesamiento del Lenguaje Natural, Nº. 73, págs. 296-312.

Rodrigo Agerri, Eneko Agirre, Gorka Azkune, Roberto Centeno, Anselmo Peñas, German Rigau, Álvaro Rodrigo, Aitor Soroa

DeepKnowledge: Deep Multilingual Language Model Technology for Language Understanding. (2024)

In SEPLN-CEDI-PD 2024: Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations, June 19-20, 2024, A Coruña, Spain.

Maxime Masson, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose, Rodrigo Agerri

TextBI: An Interactive Dashboard for Visualizing Multidimensional NLP Annotations in Social Media Data. (2024)

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024)

Olia Toporkov, Rodrigo Agerri

Evaluating Shortest Edit Script Methods for Contextual Lemmatization (2024)

In Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).

Oscar Sainz

Ikasketa-adibide urriko Informazio-Erauzketa (2024)

Sainz, O. (2024). Ikasketa-adibide urriko Informazio-Erauzketa [Doctoral thesis, The University of the Basque Country].

Itziar Gonzalez-Dios, Javier Alvez, and German Rigau

Exploiting Metonymy from Available Knowledge Resources. (2023)

20th International Conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, Revised Selected Papers, Part I. Lecture Notes in Computer Science book series (LNCS, volume 13451), pp 34-43

Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre

Image captioning for effective use of language models in knowledge-based visual question answering (2023)

Expert Systems with Applications, 2023, vol. 212, p. 118669. Preprint: https://arxiv.org/abs/2109.08029

Murali Kondragunta, Olatz Perez-de-Viñaspre, Maite Oronoz

Improving and Simplifying Template-Based Named Entity Recognition (2023)

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 79–86, Dubrovnik, Croatia, May 2023. Association for Computational Linguistics.

Rodrigo Agerri, Eneko Agirre

Lessons learned from the evaluation of Spanish Language Models (2023)

Procesamiento del Lenguaje Natural (70), pp 157-170

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Scaling Laws for BERT in Low-Resource Settings (2023)

Findings of the Association for Computational Linguistics: ACL 2023

Nayla Escribano, German Rigau, Rodrigo Agerri

A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)

Knowledge-Based Systems, Volume 273, 2023, 110612, ISSN 0950-7051. https://doi.org/10.1016/j.knosys.2023.110612 (https://www.sciencedirect.com/science/article/pii/S0950705123003623)

Abstract: Detecting and normalizing temporal expressions is an essential step for many NLP tasks. While a variety of methods have been proposed for detection, best normalization approaches rely on hand-crafted rules. Furthermore, most of them have been designed only for English. In this paper we present a modular multilingual temporal processing system combining a fine-tuned Masked Language Model for detection, and a grammar-based normalizer. We experiment in Spanish and English and compare with HeidelTime, the state-of-the-art in multilingual temporal processing. We obtain best results in gold timex normalization, timex detection and type recognition, and competitive performance in the combined TempEval-3 relaxed value metric. A detailed error analysis shows that detecting only those timexes for which it is feasible to provide a normalization is highly beneficial in this last metric. This raises the question of which is the best strategy for timex processing, namely, leaving undetected those timexes for which is not easy to provide normalization rules or aiming for high coverage.

Keywords: Temporal processing; Multilingualism; Sequence labeling; Grammar-based approaches; Deep learning; Natural language processing

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)

ACM Computing Surveys. 27 June 2023

Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison

Identifying Token-Level Dialectal Features in Social Media (2023)

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle

Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)

Procesamiento del Lenguaje Natural. v. 71, p. 165-177, sep. 2023

Iker García, Rodrigo Agerri, German Rigau

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Iñigo Alonso, Eneko Agirre

Automatic Logical Forms improve fidelity in Table-to-Text generation (2023)

Expert Systems with Applications, Volume 238, Part D, 15 March 2024, 121869 https://arxiv.org/abs/2310.17279

Begoña Altuna, Rodrigo Agerri, Lidia Salas-Espejo, José Javier Saiz, Roberto Zanoli, Manuela Speranza, Bernardo Magnini, Alberto Lavelli, Goutham Karunakaran

Overview of TESTLINK at IberLEF 2023: Linking Results to Clinical Laboratory Tests and Measurements (2023)

Procesamiento del Lenguaje Natural, Revista nº 71, 313-320, septiembre de 2023.

Begoña Altuna, Goutham Karunakaran, Alberto Lavelli, Bernardo Magnini, Manuela Speranza, Roberto Zanoli

CLinkaRT at EVALITA 2023: Overview of the Task on Linking a Lab Result to its Test Event in the Clinical Domain (2023)

Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), Parma 2023.

Roberto Centeno, Rodrigo Agerri

Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)

Roberto Centeno and Rodrigo Agerri (2023). Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation. In Proceedings of the Workshop on NLP applied to Misinformation, co-located with the 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023).

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)

Rodrigo Agerri, Iñigo Alonso, Aitziber Atutxa, Ander Berrondo, Ainara Estarrona, Iker Garcia-Ferrero, Iakes Goenaga, Koldo Gojenola, Maite Oronoz, Igor Perez-Tejedor, German Rigau and Anar Yeginbergenova (2023). HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine. In SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing.

Joseba Fernandez de Landa, Rodrigo Agerri

HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. (2023)

Joseba Fernandez de Landa, Rodrigo Agerri (2023). HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, September 2023.

Maddalen Lopez de Lacalle

Predicate Matrix: an interoperable lexical knowledge base for predicates (2023)

Arantxa Otegi, Iñaki San Vicente, Xabier Saralegi, Anselmo Peñas, Borja Lozano, Eneko Agirre

Information retrieval and question answering: A case study on COVID-19 scientific literature (2022)

Knowledge-Based Systems, Volume 240.

Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre

Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (2022)

In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, Washington. Association for Computational Linguistics.

Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, Bonan Min

ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations (2022)

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, Seattle, Washington. Association for Computational Linguistics.

Eneko Agirre

Few-shot Information Extraction is Here: Pre-train, Prompt and Entail (2022)

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

E Agirre, M Apidianaki, I Vulić

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2022)

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. Association for Computational Linguistics, Dublin, Ireland

David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal

Direct Parsing to Sentiment Graphs (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages: 470–478

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)

Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.

Iker Garcia-Ferrero, Rodrigo Agerri, German Rigau

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)

Findings of the Association for Computational Linguistics: EMNLP 2022

Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

SemEval 2022 Task 10: Structured Sentiment Analysis (2022)

In SemEval 2022

Blanca Calvo Figueras, Montse Cuadros, Rodrigo Agerri

A Semantics-Aware Approach to Automated Claim Verification (2022)

In Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER), pages 37–48, Dublin, Ireland. Association for Computational Linguistics

Cristina Aceta, Johan Kildal, Izaskun Fernández, Aitor Soroa

Towards an optimal design of natural human interaction mechanisms for a service robot with ancillary way-finding capabilities in industrial environments (2021)

Production & Manufacturing Research, 9:1, 1-32

Ainhoa Serna, Aitor Soroa, Rodrigo Agerri

Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport (2021)

Sustainability 13, no. 4: 2397.

Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre

Inferring spatial relations from textual descriptions of images (2021)

Pattern Recognition, Volume 113, 107847. Pre-print: https://arxiv.org/abs/2102.00997

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2021)

In conjunction with NAACL. Association for Computational Linguistics

Elena Zotova, Rodrigo Agerri, German Rigau

Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)

Expert Systems with Applications, 170 (2021).

Joseba Fernandez de Landa, Rodrigo Agerri

Euskarazko on-line artikuluetan aipatutako izendun entitate nabarmenen identifikazioa denbora errealean (2021)

Ekaia

Jon Alkorta

Hacia el análisis de sentimientos en euskera (2021)

J. Alkorta. (2021). Hacia el análisis de sentimientos en euskera. Procesamiento del Lenguaje Natural, 66, 201-204.

Joseba Fernandez de Landa, Iker García, Ander Salaberria, Jon Ander Campos

Twitterreko Euskal Komunitatearen Eduki Azterketa Pandemia Garaian (2021)

IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura

Ander Barrena, Aitor Soroa, Eneko Agirre

Towards Zero-Shot Cross-Lingual Named Entity Disambiguation (2021)

Expert Systems With Applications ESWA 2021

Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre

Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction (2021)

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Rodrigo Agerri, Roberto Centeno, María Espinosa, Joseba Fernández de Landa, Álvaro Rodrigo

VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)

Procesamiento del Lenguaje Natural, 67, pp 173-181

Iker García-Ferrero, Rodrigo Agerri, German Rigau

Benchmarking Meta-embeddings: What Works and What Does Not (2021)

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021

Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Multilingual Counter Narrative Type Classification (2021)

Proceedings of Argument Mining 2021

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: European Clinical Case Corpus (2021)

Proceedings of the Annual Conference of the Spanish Association for Natural Language Processing: Projects and Demonstrations (SEPLN-PD 2021). Pages 17-20. ISSN: 1613-0073. URL: http://ceur-ws.org/Vol-2968/paper5.pdf

Eneko Agirre

Cross-Lingual Word Embeddings (Book Review) (2020)

Computational Linguistics 46 (1), 245-248. (https://doi.org/10.1162/COLI_r_00372)

Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune and Eneko Agirre

Evaluating Multimodal Representations on Visual Semantic Textual Similarity (2020)

Proceedings of the Twenty-third European Conference on Artificial Intelligence, ECAI 2020, June 8-12, 2020, Santiago de Compostela, Spain

Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar

Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction (2020)

Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Word Sense Disambiguation by Reasoning (2020)

Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340

Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of LREC. Also available at arxiv https://arxiv.org/pdf/2004.00033.pdf

Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza

EusTimeML: A mark-up language for temporal information in Basque (2020)

Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2020)

International Joint Conference on Artificial Intelligence (IJCAI 2020)

Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)

Language Resources and Evaluation Conference (LREC 2020)

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning (2020)

Frontiers in Artificial Intelligence and Applications. Giuseppe De Giacomo, Alejandro Catala, Bistra Dilkina, Michela Milano, Senén Barro, Alberto Bugarín, Jérôme Lang (eds.). Volume 325: ECAI 2020. Pages 585 - 592. IOS Press Ebooks

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana Garcia-Serrano, Mohamed Ben Aouicha, Eneko Agirre, David Sánchez

A large reproducible benchmark of ontology-based methods and word embeddings for word similarity (2020)

Information Systems. Online first.

Iker de la Iglesia, Mikel Martinez-Puente, Alexander Platas, Iria San Miguel, Aitziber Atutxa, Koldo Gojenola

MEDIA team at the CLEF-2020 Multilingual Information Extraction Task (2020)

Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum Thessaloniki, Greece, September 22-25, 2020.

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2020)

In conjunction with EMNLP. Association for Computational Linguistics

Rodrigo Agerri, German Rigau

Projecting Heterogeneous Annotations for Named Entity Recognition (2020)

In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020). Winner of the CAPITEL@IberLEF task on Spanish NER.

María Espinosa, Rodrigo Agerri, Roberto Centeno, Alvaro Rodrigo

DeepReading@SardiStance: Combining Textual, Social and Emotional Features (2020)

Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). Winners of the SardiStance@Evalita 2020 shared task.

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2019)

Artificial Intelligence, 268 (2019) 85-95

Iñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre

Word n-gram attention models for sentence similarity and inference (2019)

Expert Systems with Applications. Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054.

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity (2019)

Data in Brief, Volume 26.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art (2019)

Engineering Applications of Artificial Intelligence. Volume 85, October 2019, Pages 645-665.

Andrea Amelio Ravelli, Oier Lopez de Lacalle, Eneko Agirre

A comparison of representation models in a non-conventional semantic similarity scenario (2019)

Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy.

Rodrigo Agerri

Doris Martin at SemEval-2019 Task 4: Hyperpartisan News Detection with Generic Semi-supervised Features (2019)

SemEval@NAACL-HLT 2019: 944-948. https://www.aclweb.org/anthology/S19-2161.pdf

Joseba Fernandez de Landa, Rodrigo Agerri, Iñaki Alegria

Euskaldun gazte eta helduen harremanak Twitterren (2019)

III. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Gizarte Zientziak eta Zuzenbidea. 2, pp. 83 - 90

Javier Álvez, Montserrat Hermo, Paqui Lucio, German Rigau

Automatic white-box testing of first-order logic ontologies (2019)

Journal of Logic and Computation, Volume 29, Issue 5, September 2019, Pages 723–751

Javier Álvez, Paqui Lucio, German Rigau

A Framework for the Evaluation of SUMO-Based Ontologies Using WordNet (2019)

IEEE Access, 7, 36075-36093. 2019

Mark Stevenson, Eneko Agirre

Word Sense Disambiguation (2018)

The Oxford Handbook of Computational Linguistics (2nd edition). Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28

Josu Goikoetxea, Aitor Soroa and Eneko Agirre

Knowledge-Based Systems (KNOSYS). Volume 150, 15 June 2018, Pages 218-230. ISSN: 0950-7051. DOI https://doi.org/10.1016/j.knosys.2018.03.017 Preprint at https://arxiv.org/pdf/1804.08316.pdf

Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

Building Named Entity Recognition Taggers via Parallel Corpora (2018)

In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.

Ander Barrena, Aitor Soroa, Eneko Agirre

Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)

The SIGNLL Conference on Computational Natural Language Learning CONLL 2018

All HiTZ publications


Demo of the NewsReader NLP pipeline

Paste in any English text to see which entities, events, and other annotations are added automatically. The result is represented in the NAF format.

Demo of the NewsReader NLP pipeline

Paste in any Spanish text to see which entities and other annotations are added automatically. The result is represented in the NAF format.

Eihera

Basque named entities recognizer/classifier

Eustagger

Basque lemmatizer and morphosyntactic analyzer

Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology: TECNALIA
(2024 - 2028)
Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology: MULTIVERSE
(2025 - 2028)
Provision of services by the CHAIR in the field of Artificial Intelligence and Language Technology: ELHUYAR
(2025 - 2028)
Scientific advisory services for the design and construction of systems that extract information from unstructured text

(2025 - 2026)
(2024 - 2025)
Scientific advisory services for the design and construction of systems that extract information from unstructured text

(2023 - 2024)
Generative artificial intelligence web seminar (webinar).
Online course for Gipuzkoa Provincial Council employees
(2024 - 2024)
Data Privacy in Artificial Intelligence for Health Applications: A QA system to extract specific information from medical reports that can be used for better decision making
(2020 - 2021)
Pre-training cross-lingual language models
(2020 - 2020)
(2019 - 2020)

All HiTZ projects.

ECHOLOT — European Cultural Heritage Optimised Linked Open Tools

(2026 - 2028)
Humanizing AI with language technology (HumanAIze)
(2025 - 2028)
Grant DeepThought (PID2024-159202OB-C21) funded by MICIU/AEI /10.13039/501100011033 and by ERDF, EU
(2025 - 2028)
Project CRITICS (PCI2025-167239-2) funded by MICIU/AEI /10.13039/501100011033 and co-funded by the European Union
(2025 - 2028)
DeepKnowledge (PID2021-127777OB-C21) project funded by MCIN/AEI/10.13039/501100011033 and by ERDF, EU
(2022 - 2026)
Project CNS2023-144375 funded by MTDFP/ and by European Union Next GenerationEU/ PRTR.
(2024 - 2026)
The HiTZ Chair of Artificial Intelligence and Language Technology has an ambitious program to strengthen leadership in this technology and place the country at the technological forefront.
(2024 - 2026)
ICL4LANG: In-context learning as a new paradigm for researching scalable, high-precision language technologies adapted to the industrial needs of the Basque Country

(2023 - 2025)
CLARIAH-EUS-gArA

(2024 - 2025)
Disargue (TED2021-130810B-C21) funded by MCIN/AEI /10.13039/501100011033 and by European Union NextGenerationEU/ PRTR
(2022 - 2025)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2024 - 2025)
Antidote (PCI2020-120717-2) funded by MCIN/AEI /10.13039/501100011033 and by European Union NextGenerationEU/PRTR
(2021 - 2024)
DeepR3 (TED2021-130295B-C31) funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR.
(2022 - 2024)
Use of computational resources in the EuroHPC SuperComputer to scale up the experiments and build very large models for European languages with few resources
(2023 - 2024)
Better Extraction from Text Towards Enhanced Retrieval
(2019 - 2023)
Tools for the analysis of parliamentary discourses: polarization, subjectivity and affectivity in the post-truth era
(2020 - 2022)
DeepReading: Mining, Understanding, and Reasoning with Multilingual Content.
(2019 - 2021)
Deep learning, Big Data and knowledge for multilingual text processing.
(2019 - 2021)
New generation of neural artificial intelligence models to transform language technologies in the Basque Country's industry.
(2020 - 2021)
Automated surveillance of key questions on COVID-19 in scientific publications
(2020 - 2021)
Learning to Interact with Humans by Lifelong Interaction with Humans
(2017 - 2020)
CROSSTEXT: Automatic Generation of Multilingual Semantic Processors
Automatic generation of multilingual semantic taggers
(2017 - 2019)
TUNER: Automatic domain adaptation for semantic processing.
(2016 - 2018)
MUSTER: Multimodal processing of Spatial and TEmporal expRessions: Toward Understanding Space and Time in Language Enhanced by Vision.
(2016 - 2018)
Openminted: Sharing IXA pipes in the OpenMinTeD platform.
(2018 - 2018)

All HiTZ projects

EUSLEM

EUSLEM: lemmatizer for Basque

UKB

Word sense disambiguation and similarity.

KYBOT

Knowledge Yielding Robot

EIEC
Basque Named Entity Recognition corpus.
EDIEC
Basque corpus annotated for Named Entity Disambiguation.
MCR: Multilingual Central Repository
Multilingual lexical database with wordnets for several European languages, including Basque.
EPEC-EuSemcor
Corpus tagged with Basque WordNet senses.

Iker García-Ferrero

Cross-Lingual Transfer for Low-Resource Natural Language Processing (2025)

Iñigo Alonso

Improving Fidelity and Table Representation in Table Understanding and Table-to-Text Generation (2025)

Neil De La Fuente, Oscar Sainz, Iker García-Ferrero, Eneko Agirre

GUIDEX: Guided Synthetic Data Generation for Zero-Shot Information Extraction (2025)

Findings of the Association for Computational Linguistics: ACL 2025

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models (2025)

Maitane Urruela, Sergio Martín, Iker De la Iglesia, Ander Barrena

Medical Argument Mining: Exploitation of Scarce Data Using NLI Systems (2025)

Vol. 75 (2025): Procesamiento del Lenguaje Natural, Revista nº 75, septiembre de 2025

Adrian Cuadron Cortes, Aimar Sagasti, Maitane Urruela, Iker De La Iglesia, Ane García Domingo-aldama, Aitziber Atutxa Salazar, Josu Goikoetxea, Ander Barrena

ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality (2025)

Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks), pages 1–10, Vienna, Austria. Association for Computational Linguistics.

Elisa Sanchez-Bayona, Rodrigo Agerri

Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding (2025)

In Findings of the Association for Computational Linguistics: ACL 2025, pages 17462–17477, Vienna, Austria. Association for Computational Linguistics.

Olia Toporkov, Alan Akbik, Rodrigo Agerri

Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data (2025)

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18219–18232, Suzhou, China. Association for Computational Linguistics.

Maxime Masson, Rodrigo Agerri, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose

Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain (2025)

Knowledge-Based Systems (2025): 114001 (Elsevier).

Iker De la Iglesia, Adrián Sánchez-Freire, Oier Urquijo-Durán, Ander Barrena, Aitziber Atutxa

EriBERTa Private Surpasses her Public Alter Ego: Enhancing a Bilingual Pretrained Encoder with Limited Private Medical Data (2025)

Procesamiento del Lenguaje Natural, Revista nº 75, septiembre de 2025, p. 283-296

Mar Rodríguez, Olatz Perez-de-Viñaspre, Naiara Perez

A Two-Stage Multilingual Job Title Matching System: Combining Expert Knowledge and LLM-based Ranking (2025)

Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025), Vol-4038, pp. 4479-4493

Olia Toporkov, Rodrigo Agerri

On the Role of Morphological Information for Contextual Lemmatization (2024)

Computational Linguistics (MIT Press).

Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction (2024)

The Twelfth International Conference on Learning Representations

Mikel Zubillaga, Oscar Sainz, Ainara Estarrona, Oier Lopez de Lacalle, Eneko Agirre

Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis (2024)

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Turin, Italy

IKER-GAITU: research on language technology for Basque and other low-resource languages (2024)

Eneko Agirre, Olatz Arbelaitz, Olatz Arregi, Gorka Azkune, Arantza Casillas, Inma Hernaez, Mikel Iruskieta, Elena Lazkano, Eva Navas, German Rigau, Roberto Santana, Aitor Soroa and Rabih Zbib

ENIA Chair in Artificial Intelligence and Language Technology (2024)

Giulia Pensa, Begoña Altuna, and Itziar Gonzalez-Dios.

A Multi-layered Approach to Physical Commonsense Understanding: Creation and Evaluation of an Italian Dataset (2024)

Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri

Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques (2024)

Proceedings of the 2024 Main Conference of the Association for Computational Linguistics (ACL 2024). August 11th to 16th, 2024. Bangkok, Thailand

Ahmed Elhady, Khaled Elsayed, Eneko Agirre, and Mikel Artetxe

Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training (2024)

Aitor García-Pablos, Naiara Perez, Montse Cuadros, Jaione Bengoetxea

EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque (2024)

Procesamiento del Lenguaje Natural, Revista nº 73, septiembre de 2024, pp. 125-137

Iakes Goenaga, Aitziber Atutxa, Koldo Gojenola, Maite Oronoz, Rodrigo Agerri

Explanatory argument extraction of correct answers in resident medical exams (2024)

Artificial Intelligence in Medicine Volume 157, November 2024, 102985

RENDIMIENTO DE LAS EXPRESIONES REGULARES EN EL ANÁLISIS DE INFORMES DE ALTA PRESENTES EN LA HISTORIA CLÍNICA ELECTRÓNICA: EXPRIMIENDO LOS DATOS SECUNDARIOS (2024)

Revista Española de Cardiología. Rev Esp Cardiol. 2024;77 (Supl 1): 33

RENDIMIENTO DE SISTEMAS DE CHAT ALIMENTADOS CON ARTÍCULOS DE INVESTIGACIÓN EN UN ENTORNO CLÍNICO ESPECÍFICO: LA ENFERMEDAD VALVULAR CARDIACA (2024)

Revista Española de Cardiología. Rev Esp Cardiol. 2024;77 (Supl 1): 1161

Iñigo Alonso, Eneko Agirre, Mirella Lapata

PixT3: Pixel-based Table-To-Text Generation (2024)

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) https://aclanthology.org/2024.acl-long.364

Maxime Masson, Philippe Roose, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Rodrigo Agerri

ProxMetrics: modular proxemic similarity toolkit to generate domain-adaptable indicators from social media (2024)

Social Network Analysis and Mining, 14(1), pp.1-23

Anar Yeginbergen, Rodrigo Agerri

Crosslingual Argument Mining in the Medical Domain (2024)

Procesamiento del Lenguaje Natural, Nº. 73, págs. 296-312.

Rodrigo Agerri, Eneko Agirre, Gorka Azkune, Roberto Centeno, Anselmo Peñas, German Rigau, Álvaro Rodrigo, Aitor Soroa

DeepKnowledge: Deep Multilingual Language Model Technology for Language Understanding. (2024)

In SEPLN-CEDI-PD 2024: Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations, June 19-20, 2024, A Coruña, Spain.

Maxime Masson, Christian Sallaberry, Marie-Noelle Bessagnet, Annig Le Parc Lacayrelle, Philippe Roose, Rodrigo Agerri

TextBI: An Interactive Dashboard for Visualizing Multidimensional NLP Annotations in Social Media Data. (2024)

In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024)

Olia Toporkov, Rodrigo Agerri

Evaluating Shortest Edit Script Methods for Contextual Lemmatization (2024)

In Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).

Oscar Sainz

Ikasketa-adibide urriko Informazio-Erauzketa (2024)

Sainz, O. (2024). Ikasketa-adibide urriko Informazio-Erauzketa [Doctoral thesis, The University of the Basque Country].

Itziar Gonzalez-Dios, Javier Alvez, and German Rigau

Exploiting Metonymy from Available Knowledge Resources. (2023)

20th International Conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, Revised Selected Papers, Part I. Lecture Notes in Computer Science book series (LNCS, volume 13451), pp 34-43

Ander Salaberria, Gorka Azkune, Oier Lopez de Lacalle, Aitor Soroa, Eneko Agirre

Image captioning for effective use of language models in knowledge-based visual question answering (2023)

Expert Systems with Applications, 2023, vol. 212, p. 118669. Preprint: https://arxiv.org/abs/2109.08029

Murali Kondragunta, Olatz Perez-de-Viñaspre, Maite Oronoz

Improving and Simplifying Template-Based Named Entity Recognition (2023)

Rodrigo Agerri, Eneko Agirre

Lessons learned from the evaluation of Spanish Language Models (2023)

Procesamiento del Lenguaje Natural (70), pp 157-170

Gorka Urbizu, Iñaki San Vicente, Xabier Saralegi, Rodrigo Agerri, Aitor Soroa

Scaling Laws for BERT in Low-Resource Settings (2023)

Findings of the Association for Computational Linguistics: ACL 2023

Nayla Escribano, German Rigau, Rodrigo Agerri

A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)

Bonan Min, Hayley Ross, Elior Sulem, Amir Pouran Ben Veyseh, Thien Huu Nguyen, Oscar Sainz, Eneko Agirre, Ilana Heintz, Dan Roth

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)

ACM Computing Surveys. 27 June 2023

Jeremy Barnes, Samia Touileb, Petter Mæhlum, Pierre Lison

Identifying Token-Level Dialectal Features in Social Media (2023)

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

Irene Baucells de la Peña, Blanca Calvo Figueras, Marta Villegas, Oier Lopez de Lacalle

Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)

Procesamiento del Lenguaje Natural. v. 71, p. 165-177, sep. 2023

Iker García, Rodrigo Agerri, German Rigau

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Findings of the Association for Computational Linguistics: EMNLP 2023

Iñigo Alonso, Eneko Agirre

Automatic Logical Forms improve fidelity in Table-to-Text generation (2023)

Expert Systems with Applications, Volume 238, Part D, 15 March 2024, 121869 https://arxiv.org/abs/2310.17279

Begoña Altuna, Rodrigo Agerri, Lidia Salas-Espejo, José Javier Saiz, Roberto Zanoli, Manuela Speranza, Bernardo Magnini, Alberto Lavelli, Goutham Karunakaran

Overview of TESTLINK at IberLEF 2023: Linking Results to Clinical Laboratory Tests and Measurements (2023)

Procesamiento del Lenguaje Natural, Revista nº 71, 313-320, septiembre de 2023.

Begoña Altuna, Goutham Karunakaran, Alberto Lavelli, Bernardo Magnini, Manuela Speranza, Roberto Zanoli

CLinkaRT at EVALITA 2023: Overview of the Task on Linking a Lab Result to its Test Event in the Clinical Domain (2023)

Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023), Parma 2023.

Roberto Centeno, Rodrigo Agerri

Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)

Joseba Fernandez de Landa, Rodrigo Agerri

HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. (2023)

Maddalen Lopez de Lacalle

Predicate Matrix: an interoperable lexical knowledge base for predicates (2023)

Arantxa Otegi, Iñaki San Vicente, Xabier Saralegi, Anselmo Peñas, Borja Lozano, Eneko Agirre

Information retrieval and question answering: A case study on COVID-19 scientific literature (2022)

Knowledge-Based Systems, Volume 240.

Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, Eneko Agirre

Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning (2022)

In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, Washington. Association for Computational Linguistics.

Oscar Sainz, Haoling Qiu, Oier Lopez de Lacalle, Eneko Agirre, Bonan Min

ZS4IE: A toolkit for Zero-Shot Information Extraction with simple Verbalizations (2022)

Eneko Agirre

Few-shot Information Extraction is Here: Pre-train, Prompt and Entail (2022)

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2022)

David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, and Erik Velldal

Direct Parsing to Sentiment Graphs (2022)

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages: 470–478

Nayla Escribano, Jon Ander González, Julen Orbegozo-Terradillos, Ainara Larrondo-Ureta, Simón Peña-Fernández, Olatz Perez-de-Viñaspre, Rodrigo Agerri

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)

Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3382–3390, Marseille, France. European Language Resources Association.

Iker Garcia-Ferrero, Rodrigo Agerri, German Rigau

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)

Findings of the Association for Computational Linguistics: EMNLP 2022

Jeremy Barnes, Laura Oberlaender, Enrica Troiano, Andrey Kutuzov, Jan Buchmann, Rodrigo Agerri, Lilja Øvrelid, Erik Velldal

SemEval 2022 Task 10: Structured Sentiment Analysis (2022)

In SemEval 2022

Blanca Calvo Figueras, Montse Cuadros, Rodrigo Agerri

A Semantics-Aware Approach to Automated Claim Verification (2022)

In Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER), pages 37–48, Dublin, Ireland. Association for Computational Linguistics

Cristina Aceta, Johan Kildal, Izaskun Fernández, Aitor Soroa

Towards an optimal design of natural human interaction mechanisms for a service robot with ancillary way-finding capabilities in industrial environments (2021)

Production & Manufacturing Research, 9:1, 1-32

Ainhoa Serna, Aitor Soroa, Rodrigo Agerri

Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport (2021)

Sustainability 13, no. 4: 2397.

Aitzol Elu, Gorka Azkune, Oier Lopez de Lacalle, Ignacio Arganda-Carreras, Aitor Soroa, Eneko Agirre

Inferring spatial relations from textual descriptions of images (2021)

Pattern Recognition, Volume 113, 107847. Pre-print: https://arxiv.org/abs/2102.00997

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2021)

In conjunction with NAACL. Association for Computational Linguistics

Elena Zotova, Rodrigo Agerri, German Rigau

Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)

Expert Systems with Applications, 170 (2021).

Joseba Fernandez de Landa, Rodrigo Agerri

Euskarazko on-line artikuluetan aipatutako izendun entitate nabarmenen identifikazioa denbora errealean (2021)

Ekaia

Jon Alkorta

Hacia el análisis de sentimientos en euskera (2021)

J. Alkorta. (2021). Hacia el análisis de sentimientos en euskera. Procesamiento del Lenguaje Natural, 66, 201-204.

Joseba Fernandez de Landa, Iker García, Ander Salaberria, Jon Ander Campos

Twitterreko Euskal Komunitatearen Eduki Azterketa Pandemia Garaian (2021)

IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura

Ander Barrena, Aitor Soroa, Eneko Agirre

Towards Zero-Shot Cross-Lingual Named Entity Disambiguation (2021)

Expert Systems With Applications ESWA 2021

Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, Eneko Agirre

Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction (2021)

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Rodrigo Agerri, Roberto Centeno, María Espinosa, Joseba Fernández de Landa, Álvaro Rodrigo

VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)

Procesamiento del Lenguaje Natural, 67, pp 173-181

Iker García-Ferrero, Rodrigo Agerri, German Rigau

Benchmarking Meta-embeddings: What Works and What Does Not (2021)

Findings of the Association for Computational Linguistics: EMNLP 2021

Yi-Ling Chung, Marco Guerini, Rodrigo Agerri

Multilingual Counter Narrative Type Classification (2021)

Proceedings of Argument Mining 2021

Bernardo Magnini, Begoña Altuna, Alberto Lavelli, Manuela Speranza, Roberto Zanoli

The E3C Project: European Clinical Case Corpus (2021)

Eneko Agirre

Cross-Lingual Word Embeddings (Book Review) (2020)

Computational Linguistics 46 (1), 245-248. (https://doi.org/10.1162/COLI_r_00372)

Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune and Eneko Agirre

Evaluating Multimodal Representations on Visual Semantic Textual Similarity (2020)

Proceedings of the Twenty-third European Conference on Artificial Intelligence, ECAI 2020, June 8-12, 2020, Santiago de Compostela, Spain

Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar

Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction (2020)

Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Word Sense Disambiguation by Reasoning (2020)

Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340

Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of LREC. Also available at arxiv https://arxiv.org/pdf/2004.00033.pdf

Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza

EusTimeML: A mark-up language for temporal information in Basque (2020)

Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2020)

International Joint Conference on Artificial Intelligence (IJCAI 2020)

Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)

Language Resources and Evaluation Conference (LREC 2020)

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning (2020)file2 (2020)

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana Garcia-Serrano, Mohamed Ben Aouicha, Eneko Agirre, David Sánchez

A large reproducible benchmark of ontology-based methods and word embeddings for word similarity (2020)

Information Systems. Online first.

Iker de la Iglesia, Mikel Martinez-Puente, Alexander Platas, Iria San Miguel, Aitziber Atutxa, Koldo Gojenola

MEDIA team at the CLEF-2020 MultilingualInformation Extraction Task (2020)

Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum Thessaloniki, Greece, September 22-25, 2020.

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)

Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2020)

In conjunction with EMNLP. Association for Computational Linguistics

Rodrigo Agerri, German Rigau

Projecting Heterogeneous Annotations for Named Entity Recognition (2020)

In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020). Winner of the

CAPITEL@IberLEF

task on Spanish NER.

María Espinosa, Rodrigo Agerri, Roberto Centeno, Alvaro Rodrigo

DeepReading@SardiStance:Combining Textual, Social and Emotional Features. (2020)

Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020). Winners of the

SardiStance@Evalita

2020 shared task

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2019)

Artificial Intelligence, 268 (2019) 85-95

lñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre

Word n-gram attention models for sentence similarity and inference (2019)

Expert Systems with Applications. Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054.

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity (2019)

Data in Brief, Volume 26.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art (2019)

Engineering Applications of Artificial Intelligence. Volume 85, October 2019, Pages 645-665.

Andrea Amelio Ravelli, Oier Lopez de Lacalle, Eneko Agirre

A comparison of representation models in a non-conventional semantic similarity scenario (2019)

Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy.

Rodrigo Agerri

Doris Martin at SemEval-2019 Task 4: Hyperpartisan News Detection with Generic Semi-supervised Features (2019)

SemEval@NAACL-HLT

2019: 944-948 https://www.aclweb.org/anthology/S19-2161.pdf

Joseba Fernandez de Landa, Rodrigo Agerri, Iñaki Alegria

Euskaldun gazte eta helduen harremanak Twitterren (2019)

III. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Gizarte Zientziak eta Zuzenbidea. 2, pp. 83 - 90

Javier Álvez, Montserrat Hermo, Paqui Lucio, German Rigau

Automatic white-box testing of first-order logic ontologies (2019)

Journal of Logic and Computation, Volume 29, Issue 5, September 2019, Pages 723–751

Alvez,J; Lucio,P; Rigau,G

A Framework for the Evaluation of SUMO-Based Ontologies Using WordNet (2019)

IEEE Access, 7, 36075-36093. 2019

Mark Stevenson, Eneko Agirre

Word Sense Disambiguation (2018)

The Oxford Handbook of Computational Linguistics 2nd edition (2 ed.) Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28

Josu Goikoetxea, Aitor Soroa eta Eneko Agirre

Knowledge-Based Systems (KNOSYS). Volume 150, 15 June 2018, Pages 218-230. ISSN: 0950-7051. DOI https://doi.org/10.1016/j.knosys.2018.03.017 Preprint at https://arxiv.org/pdf/1804.08316.pdf

Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

Building Named Entity Recognition Taggers via Parallel Corpora (2018)

In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.

Ander Barrena, Aitor Soroa, Eneko Agirre

Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)

The SIGNLL Conference on Computational Natural Language Learning (CoNLL 2018)


Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models (2025)

Medical Argument Mining: Exploitation of Scarce Data Using NLI Systems (2025)

ArgHiTZ at ArchEHR-QA 2025: A Two-Step Divide and Conquer Approach to Patient Question Answering for Top Factuality (2025)

Metaphor and Large Language Models: When Surface Features Matter More than Deep Understanding (2025)

Optimal strategies to perform multilingual analysis of social content for a novel dataset in the tourism domain (2025)

EriBERTa Private Surpasses her Public Alter Ego: Enhancing a Bilingual Pretrained Encoder with Limited Private Medical Data (2025)

A Two-Stage Multilingual Job Title Matching System: Combining Expert Knowledge and LLM-based Ranking (2025)

On the Role of Morphological Information for Contextual Lemmatization (2024)

Improving Factuality in Clinical Abstractive Multi-Document Summarization by Guided Continued Pre-training (2024)

PixT3: Pixel-based Table To Text generation (2024)

Explanatory argument extraction of correct answers in resident medical exams (2024)

RENDIMIENTO DE LAS EXPRESIONES REGULARES EN EL ANÁLISIS DE INFORMES DE ALTA PRESENTES EN LA HISTORIA CLÍNICA ELECTRÓNICA: EXPRIMIENDO LOS DATOS SECUNDARIOS (2024)

RENDIMIENTO DE SISTEMAS DE CHAT ALIMENTADOS CON ARTÍCULOS DE INVESTIGACIÓN EN UN ENTORNO CLÍNICO ESPECÍFICO: LA ENFERMEDAD VALVULAR CARDIACA (2024)

ProxMetrics: modular proxemic similarity toolkit to generate domain-adaptable indicators from social media (2024)

Crosslingual Argument Mining in the Medical Domain (2024)

DeepKnowledge: Deep Multilingual Language Model Technology for Language Understanding. (2024)

TextBI: An Interactive Dashboard for Visualizing Multidimensional NLP Annotations in Social Media Data. (2024)

Evaluating Shortest Edit Script Methods for Contextual Lemmatization (2024)

Image captioning for effective use of language models in knowledge-based visual question answering (2023)

Improving and Simplifying Template-Based Named Entity Recognition (2023)

Lessons learned from the evaluation of Spanish Language Models (2023)

Scaling Laws for BERT in Low-Resource Settings (2023)

A modular approach for multilingual timex detection and normalization using deep learning and grammar-based methods (2023)

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey (2023)

Identifying Token-Level Dialectal Features in Social Media (2023)

Entailment-based Task Transfer for Catalan Text Classification in Small Data Regimes (2023)

T-Projection: High Quality Annotation Projection for Sequence Labeling Tasks (2023)

Overview of NLP-MisInfo 2023: Workshop on NLP applied to Misinformation (2023)

HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine (2023)

HiTZ-IXA at PoliticES 2023: Document and Sentence Level Text Representations for Demographic Characteristics and Political Ideology Detection. (2023)

Predicate Matrix: an interoperable lexical knowledge base for predicates (2023)

Information retrieval and question answering: A case study on COVID-19 scientific literature (2022)

Few-shot Information Extraction is Here: Pre-train, Prompt and Entail (2022)

Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2022)

BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions (2022)

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings (2022)

SemEval 2022 Task 10: Structured Sentiment Analysis (2022)

A Semantics-Aware Approach to Automated Claim Verification (2022)

Towards an optimal design of natural human interaction mechanisms for a service robot with ancillary way-finding capabilities in industrial environments (2021)

Applying Deep Learning Techniques for Sentiment Analysis to Assess Sustainable Transport (2021)

Proceedings of Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2021)

Semi-automatic generation of multilingual datasets for stance detection in Twitter (2021)

Euskarazko on-line artikuluetan aipatutako izendun entitate nabarmenen identifikazioa denbora errealean (2021)

Towards Zero-Shot Cross-Lingual Named Entity Disambiguation (2021)

VaxxStance@IberLEF 2021: Overview of the Task on Going Beyond Text in Cross-Lingual Stance Detection (2021)

Multilingual Counter Narrative Type Classification (2021)

Cross-Lingual Word Embeddings (Book Review) (2020)

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures (2020)

Projecting Heterogeneous Annotations for Named Entity Recognition (2020)

DeepReading@SardiStance:Combining Textual, Social and Emotional Features. (2020)

Word n-gram attention models for sentence similarity and inference (2019)

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

