Text Analysis

Natural language analysis tools are software modules that perform linguistic analysis of text at different levels. They are essential components of any Natural Language Processing (NLP) software that analyzes text, and text mining software is typically built by combining basic linguistic modules into complex pipelines.
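As a rough illustration of how basic modules can be chained into a pipeline, here is a minimal Python sketch. It is not the HiTZ implementation: the `Document` class, the two toy modules, and `run_pipeline` are hypothetical names invented for this example, and the "analysis" they perform is deliberately simplistic.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container that every module reads from and writes to."""
    text: str
    tokens: List[str] = field(default_factory=list)
    annotations: Dict[str, List[str]] = field(default_factory=dict)


# A module is any function that takes a Document and returns it enriched.
Module = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    # Toy tokenizer: runs of word characters, or single punctuation marks.
    doc.tokens = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc


def tag_capitalized(doc: Document) -> Document:
    # Toy stand-in for a later analysis level: it depends on the tokens
    # produced by the previous module.
    doc.annotations["capitalized"] = [t for t in doc.tokens if t[:1].isupper()]
    return doc


def run_pipeline(doc: Document, modules: List[Module]) -> Document:
    # Apply each module in order; each one adds a new layer of annotation.
    for module in modules:
        doc = module(doc)
    return doc


result = run_pipeline(
    Document("HiTZ builds tools for Basque."),
    [tokenize, tag_capitalized],
)
print(result.tokens)
print(result.annotations["capitalized"])
```

The point of the design is that each module only needs to agree on the shared document representation, so modules can be swapped or reordered without touching the others.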

The HiTZ center has a long tradition of building analysis tools for many languages, which range from basic linguistic processors such as tokenizers, Part-...


Demos

Demo of the English NLP pipeline

Paste in any English text to see which entities, events, and other annotations are added automatically. The result is returned in the NAF format.

Demo of the Spanish NLP pipeline

Paste in any Spanish text to see which entities and other annotations are added automatically. The result is returned in the NAF format.

Eustagger

Basque lemmatizer and morphosyntactic analyzer

Xuxen

Online Basque spelling corrector
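The English and Spanish demos above return their annotations in NAF, an XML-based format. The sketch below shows one way such output might be read with Python's standard library. The `NAF_SAMPLE` string is a hand-written, heavily simplified fragment made up for this example (real pipeline output carries more layers and attributes), and `extract_entities` is a hypothetical helper, not part of any HiTZ tool.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment for illustration only (an assumption,
# not actual demo output): word forms, terms pointing at word forms, and
# entities pointing at terms.
NAF_SAMPLE = """<NAF>
  <text>
    <wf id="w1" sent="1">Reykjavik</wf>
    <wf id="w2" sent="1">is</wf>
    <wf id="w3" sent="1">cold</wf>
  </text>
  <terms>
    <term id="t1" lemma="Reykjavik"><span><target id="w1"/></span></term>
    <term id="t2" lemma="be"><span><target id="w2"/></span></term>
    <term id="t3" lemma="cold"><span><target id="w3"/></span></term>
  </terms>
  <entities>
    <entity id="e1" type="LOCATION">
      <references><span><target id="t1"/></span></references>
    </entity>
  </entities>
</NAF>"""


def extract_entities(naf_xml: str):
    """Return (entity type, surface text) pairs from a NAF-like document."""
    root = ET.fromstring(naf_xml)
    # Map word-form ids to their surface text.
    words = {wf.get("id"): wf.text for wf in root.iter("wf")}
    # Map term ids to the word-form ids they span.
    term_words = {
        term.get("id"): [t.get("id") for t in term.iter("target")]
        for term in root.iter("term")
    }
    entities = []
    for entity in root.iter("entity"):
        # Entities reference terms, which in turn reference word forms.
        term_ids = [t.get("id") for t in entity.iter("target")]
        tokens = [words[w] for tid in term_ids for w in term_words[tid]]
        entities.append((entity.get("type"), " ".join(tokens)))
    return entities


print(extract_entities(NAF_SAMPLE))
```

The layered layout, where each layer points back to the ids of the layer below, is what lets independent modules add annotations without rewriting the original text.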


Patents

MALTIXA

Resources

Publications

Y Yaghoobzadeh, K Kann, TJ Hazen, E Agirre, H Schütze

Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings (2019)

Proceedings of ACL.

Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, Eneko Agirre

Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation (2018)

Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), pages 282–291. Brussels, Belgium, October 31 - November 1, 2018. Best paper award

Zuhaitz Beloki, Xabier Artola, Aitor Soroa

A scalable architecture for data-intensive natural language processing (2017)

Natural Language Engineering, 1-23. doi:10.1017/S1351324917000092.

Rodrigo Agerri, Xabier Artola, Zuhaitz Beloki, German Rigau, Aitor Soroa

Big data for Natural Language Processing: A streaming approach (2015)

Knowledge-Based Systems, Vol. 79, pages 36-42. http://dx.doi.org/10.1016/j.knosys.2014.11.007.

Xabier Artola, Zuhaitz Beloki, Aitor Soroa

A stream computing approach towards scalable NLP (2014)

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland. ISBN: 978-2-9517408-8-4

Rodrigo Agerri, Josu Bermudez, German Rigau

IXA pipeline: Efficient and Ready to Use Multilingual NLP tools. (2014)

LREC 2014: 3823-3828. ISBN 978-2-9517408-8-4
