DOMINO: Neural Machine Translation, in DOMaIn, and NO supervised
(2019 - 2021)
Machine translation (MT) has been one of the most prominent applications of artificial intelligence since the very beginning of the field. Beyond its intrinsic interest, given the difficulty and completeness of the problem, machine translation is of enormous practical value. Although in 2018 high-quality machine translation remains a challenge for most language pairs, the field has developed impressively in recent years. The neural machine translation (NMT) paradigm, combined with deep learning techniques, has achieved results that seemed unthinkable three to four years ago.
On the other hand, companies and private users have become familiar with the advantages and limitations of this technology. While companies focus on increasing productivity by combining translation memories, MT tools and post-editing environments, private users make intensive use of it despite its limitations. The demand for MT is increasing.
Building on the research group's previous work, the results of the TADEEP project (MINECO) and our participation in the MODELA project (Basque Government), we propose to investigate techniques that improve the state of the art of deep learning and neural MT systems, focusing on three very important aspects:
- Improving the quality of NMT output and obtaining reliable evaluations. NMT systems currently display several shortcomings, especially with regard to the fidelity of the generated text, which must be studied and solved: untranslated segments and problems with terminology, named entities, quantities and adjectives.
- New contributions to unsupervised machine translation (especially useful for low-resource languages). Among the results of the TADEEP project, we can underline the high impact this line of research has achieved, with publications in the most important forums in the area (ACL, EMNLP, AAAI, ICLR). Further research along this line is one of the key objectives of this project and is expected to lead to high-impact publications.
- Adaptation of MT to specific domains and transfer to the business environment, as well as the application of the NMT paradigm to other seq2seq problems (grammatical correction). This is the most applied part of the project, which seeks to address the real needs of nearby businesses and social contexts.
The IXA group of the UPV/EHU has the know-how and experience necessary to undertake this project: in addition to experts in different aspects of MT, we have experts in morphology, syntax, semantics and machine learning. Building on this know-how, the collaboration with the Elhuyar Foundation adds a number of important features by providing resources, proximity to the market and expertise in evaluation. Additionally, the specific participation of the University of Santiago allows us to widen the scope of the research lines on unsupervised learning and the linguistic motivation of the results.
Regarding the relevance of MT within the R+D+i sphere, it should be stressed that this project is directly related to Challenge 7, Digital Economy, Society and Culture (section VI, Advanced Technologies for Natural Language Processing), of the State Plan for Scientific and Technical Research and Innovation 2017-2020. This interest is part of the National Plan for the Promotion of Language Technologies, one of the pillars of the Digital Agenda.
Organization: Ministerio de Ciencia, Innovación y Universidades.
Main researchers: Kepa Sarasola, Eneko Agirre
Eneko Agirre, Iñaki Alegria, Nora Aranberri, Mikel Artetxe, Kike Fernandez, Uxoa Iñurrieta, Gorka Labaka, Mikel Lersundi, Maite Oronoz, Olatz Perez de Viñaspre, Kepa Sarasola, Xabier Soto, Ruben Urizar