research projects

Grant DeepThought (PID2024-159202OB-C21) funded by MICIU/AEI/10.13039/501100011033 and by ERDF, EU
(2025 - 2028)
DeepThought introduces a scalable approach for adapting Large Language Models to low-resource languages through an innovative joint pretraining and alignment strategy. The project will develop empirically validated methodologies for extending open-source models like Llama-3+ and Qwen+ to languages such as Basque and Spanish, leveraging synthetic datasets and reasoning verbalizations from existing LLMs and Language Reasoning Models. Key objectives include improving zero-shot and few-shot performance through RAG techniques, enhancing test-time computation and reasoning capabilities, creating new evaluation benchmarks focusing on truthfulness and safety, developing LLM-as-a-Judge metrics, and building multimodal applications across domains like eLearning and eHealth. While initially focused on Spanish and Basque, the project aims to democratize LLM technology for Europe's low-resource languages, ensuring these communities can fully participate in AI advances.
Webpage: http://deepthought.hitz.eus
Organization: Ministerio de Ciencia, Innovación y Universidades (MICIU)
Main researchers: Rodrigo Agerri, German Rigau
Participants:
Jon Ander Elorriaga, Mikel Larrañaga, Xabier Saralegi, Muitze Zulaika, Rodrigo Agerri, Itziar Aldabe, Izaskun Aldezabal, Olatz Ansa, Maxux Aranzabe, Xabier Arregi, Olatz Arregi, Jeremy Barnes, Blanca Calvo, Julen Etxaniz, Izaskun Etxeberria, Itziar Gonzalez-Dios, Maite Heredia, Mikel Iruskieta, Iñigo López, German Rigau, Elisa Sanchez, Aitor Soroa, Anar Yeginbergen, Irune Zubiaga


