Machine Translation
We began researching Machine Translation in 2000, following the paradigms that have emerged in the field: first rule-based (RBMT), then statistical (SMT), and currently neural (NMT). We have focused mainly on translation into and out of Basque because, besides its commercial interest in our country, it poses a major challenge for several reasons: the complexity of Basque morphology, the free order of sentence constituents, and the scarcity...
Demos
Modela (2018)
Neural MT for Basque (with other partners, from Modela project)
NMT itzultzailea (2018)
Own NMT for Basque (from TADEEP project)
Matxin
Machine translation from Spanish to Basque
Contracts
Patents
SignON - Sign Language Translation Mobile Application and Open Communications Framework
(2021 - 2023)
Trustworthy AI - Integrating Learning, Optimisation and Reasoning
(2020 - 2023)
European Language Equality
(2021 - 2022)
DOMINO: Neural Machine Translation, in DOMaIn, and NO supervised
(2019 - 2021)
Strategic network for the promotion of language technology infrastructures in e-humanities and social sciences
(2020 - 2021)
Building Neural Machine Translation methods and systems to improve coherence at paragraph and document level
(2020 - 2021)
UnsupNMT: Unsupervised Neural Machine Translation: a new paradigm based only on monolingual text
(2018 - 2020)
MODENA: Advanced neural modeling for high-quality translation.
(2018 - 2019)
MultiNMT: Customer-oriented multidirectional neural machine translation.
(2019 - 2019)
TADEEP: Deep Machine Translation
(2016 - 2018)
Ixa Group. 'A' level research group (Basque Government)
(2016 - 2018)
MODELA: Statistical Modeling and Deep Learning for High Quality Machine Translation
(2016 - 2017)
QTLeap: Quality Translation by Deep Language Engineering Approaches
(2013 - 2016)
Patents
Publications
Uxoa Iñurrieta
Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)
Procesamiento del Lenguaje Natural, 64, pp. 123-126.
Nora Aranberri
Can translationese features help users select an MT system for post-editing? (2020)
Procesamiento del Lenguaje Natural, 64, pp. 93-100.
Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way
Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 3898–3908.
Mikel Artetxe, Sebastian Ruder, Dani Yogatama
On the cross-lingual transferability of monolingual representations (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre
A Call for More Rigor in Unsupervised Cross-lingual Learning (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Nora Aranberri
With or without you? Effects of using machine translation to write flash fiction in the foreign language (2020)
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, p. 165–174, Lisboa, Portugal, November 2020.
Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar
Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining (2020)
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 255-262.
Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification (2020)
PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767
Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre
Do all roads lead to Rome? Understanding the role of initialization in iterative back-translation (2020)
Knowledge-Based Systems, Volume 206 (online first). Pre-print https://arxiv.org/abs/2002.12867
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Translation Artifacts in Cross-lingual Transfer Learning (2020)
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7674–7684.
Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz
Ixamed's submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation (2020)
Proceedings of the Fifth Conference on Machine Translation, pp. 873–878.
Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Inigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez-de-Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann and Lana Yeganova
Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages (2020)
Fifth Conference on Machine Translation (WMT20). Shared Task: Biomedical Translation Task
Alberto Poncelas, Kepa Sarasola, Meghan Dowling, Andy Way, Gorka Labaka, Iñaki Alegria
Adapting NMT to caption translation in Wikimedia Commons for low-resource languages (2019)
Procesamiento del Lenguaje Natural, 63, September 2019, pp. 33-40.
Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre
Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4990-4995.
Xabier Soto, Olatz Perez de Viñaspre, Gorka Labaka, Maite Oronoz
Neural Machine Translation of clinical texts between long distance languages (2019)
JAMIA (Journal of the American Medical Informatics Association), Volume 26, Issue 12, December 2019, Pages 1478–1487, https://doi.org/10.1093/jamia/ocz110
Xabier Soto, Olatz Perez de Viñaspre, Maite Oronoz, Gorka Labaka
Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish (2019)
Proceedings of the Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation
Mikel Artetxe, Holger Schwenk
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3197-3203.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
An Effective Approach to Unsupervised Machine Translation (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 194-203.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Bilingual Lexicon Induction through Unsupervised Machine Translation (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5002-5007.
Mikel Artetxe, Holger Schwenk
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (2019)
Transactions of the Association for Computational Linguistics 7 (2019): 597-610.
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text (2019)
Procesamiento del Lenguaje Natural 63 (2019): 151-154.
Ona de Gibert, Nora Aranberri
Estrategia multidimensional para la selección de candidatos de traducción automática para posedición [A multidimensional strategy for selecting machine translation candidates for post-editing] (2019)
Linguamática, 11(2), 3-16.
Pablo Gamallo, Susana Sotelo, José Ramom Pichel, Mikel Artetxe
Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora (2019)
Computational Linguistics. First online. DOI: 10.1162/COLI_a_00353. ISSN: 0891-2017.
Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Gorka Labaka, Iñaki Alegria, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Maite Martin and Eusebi Calonge
Neural Machine Translation of Basque (2018)
EAMT 2018. Alicante.
Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Iñaki Alegria, Gorka Labaka, Arantxa Otegi, Kepa Sarasola, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Eusebi Calonge, Maite Martin
QUALES: Estimación Automática de Calidad de Traducción Mediante Aprendizaje Automático Supervisado y No-Supervisado [QUALES: Automatic translation quality estimation using supervised and unsupervised machine learning] (2018)
Procesamiento del Lenguaje Natural, vol. 61, pp. 143-146. ISSN: 1135-5948
Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola
Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… [Language technology in the “Big Data” era: search engines, translators…] (2017)
Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak
Nora Aranberri, Gorka Labaka
Euskarazko Itzulpen Automatikoa - IXA Taldea [Basque Machine Translation - IXA Group] (2017)
Senez, 48 (2017)
Mikel Artetxe, Gorka Labaka, Eneko Agirre
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance (2016)
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2289–2294. Austin, Texas. ISBN: 978-1-945626-25-8.
Aingeru Mayor, Iñaki Alegria, Arantza Díaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Kepa Sarasola
Matxin, an open-source rule-based machine translation system for Basque (2011)
Machine Translation Journal: Volume 25, Issue 1 (2011), pp. 53-82. ISSN: 0922-6567. DOI: 10.1007/s10590-011-9092-y. http://link.springer.com/content/pdf/10.1007%2Fs10590-011-9092-y.pdf
Gorka Labaka, Nicolas Stroppa, Andy Way, Kepa Sarasola
Comparing Rule-Based and Data-Driven Approaches to Spanish-to-Basque Machine Translation (2007)
MT-Summit XI, Copenhagen. ISBN: 978-87-90708-16-0; pp. 297-304.