Webinar series

2024-2025


Registration
Mirella Lapata (The University of Edinburgh)
TBA (Thursday, June 5, 2025 - 15:00 CET)
Summary:

.


Bio:

.



Registration
André F. T. Martins ()
TBA (Thursday, May 8, 2025 - 15:00 CET)
Summary:

.


Bio:

.



Registration
Emanuele Bugliarello (Google DeepMind)
TBA (Thursday, April 3, 2025 - 15:00 CET)
Summary:

.


Bio:

.



Registration
Christian Herff (Maastricht University)
TBA (Saturday, March 8, 2025 - 15:00 CET)
Summary:

.


Bio:

.



Registration
Sebastian Ruder (Cohere)
TBA (Thursday, February 6, 2025 - 15:00 CET)
Summary:

.


Bio:

.



Registration
Ekaterina Shutova (University of Amsterdam)
Cross-lingual information sharing in multilingual language models (Thursday, January 30, 2025 - 15:00 CET)
Summary:

Multilingual language models (MLMs), such as XLM-R or BLOOM, are pretrained on data covering many languages and share their parameters across all languages. This modeling approach has several powerful advantages, such as allowing similar languages to exert positive influence on each other, and enabling cross-lingual task transfer (i.e., fine-tuning on some source language(s), then using the model on different target languages). The success of such transfer, however, depends on the model's ability to effectively share information between different languages in its parameter space. Yet, the cross-lingual information sharing mechanisms within MLMs are still not fully understood. In this talk, I will present our recent research that investigates this question from three different perspectives: encoding of typological relationships between languages within MLMs, language-wise modularity of MLMs and the influence of training examples in specific languages on predictions made in others.
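As a rough, self-contained illustration of the shared multilingual space such models induce (my own sketch, not material from the talk), one can compare mean-pooled XLM-R sentence representations across languages; the model name and example sentences are arbitrary choices, and the Hugging Face transformers library and PyTorch are assumed to be installed.

# Illustrative sketch: inspecting the shared multilingual space of XLM-R by
# comparing mean-pooled sentence representations across languages.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

def embed(sentence):
    # Mean-pool the final hidden states over non-padding tokens.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

english = embed("The cat is sleeping on the sofa.")
dutch = embed("De kat slaapt op de bank.")               # translation of the English sentence
unrelated = embed("De verkiezingen zijn volgende week.")  # unrelated Dutch sentence

cos = torch.nn.functional.cosine_similarity
print("translation pair:", cos(english, dutch).item())
print("unrelated pair:  ", cos(english, unrelated).item())

If cross-lingual parameter sharing works as intended, the translation pair should score noticeably higher than the unrelated pair.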


Bio:

Ekaterina Shutova is an Associate Professor at the ILLC, University of Amsterdam, where she leads the Amsterdam Natural Language Understanding Lab and the Natural Language Processing & Digital Humanities research unit. She received her PhD from the University of Cambridge, and then worked as a research scientist at the University of California, Berkeley. Ekaterina’s current research focuses on few-shot learning for language interpretation tasks, multilingual NLP, generalisability and robustness of NLP models and interpretability in deep learning. Her prominent service roles include Program Chair of ACL 2025, Senior Action Editor of ACL Rolling Review, Action Editor of Computational Linguistics and Demonstrations chair at EMNLP 2022. She is also an ELLIS scholar.



Registration
Javier de la Rosa (Artificial Intelligence Lab, National Library of Norway)
TBA (Thursday, December 12, 2024 - 15:00 CET)
Summary:

.


Bio:

.



Registration
Elena Sokolova (Amazon Text-to-Speech Group)
How we do research in Speech at Amazon (Thursday, November 7, 2024 - 15:00 CET)
No recording is available for this webinar.
Summary:

In this talk we will present how Speech technology has developed over the past 20 years. We will take a deep dive into the research that we do at Amazon in our Text-to-Speech lab, describe the challenges that we face and how we solve them at scale. We will also give an overview of the internship opportunities we have in our department for those of you who want to join our team in 2025.


Bio:

Elena is a Machine Learning team manager at Amazon, where she leads novel research in the field of speech technology. Over the past five years, she has overseen the deployment of machine learning projects into production and collaborated with her team to publish cutting-edge research on text-to-speech technology. Before joining Amazon, Elena completed her PhD at Radboud University Nijmegen in the Netherlands and gained industry experience as a Senior Machine Learning Scientist at Booking.com.


2023-2024


Registration
Marco Baroni (Universitat Pompeu Fabra)
Unnatural Language Processing: On the Puzzling Out-of-Distribution Behavior of Language Models (Thursday, June 6, 2024 - 15:00 CET)
Summary:

Modern language models (LMs) respond with uncanny fluency when prompted using a natural language, such as English. However, they can also produce predictable, semantically meaningful output when prompted with low-likelihood "gibberish" strings, a phenomenon exploited for developing effective information extraction prompts (Shin et al. 2020) and bypassing security checks in adversarial attacks (Zou et al. 2023). Moreover, the same "unnatural" prompts often trigger the same behavior across LMs (Rakotonirina et al. 2023, Zou et al. 2023), hinting at a shared "universal" but unnatural LM code. In my talk, I will use unnatural prompts as a tool to gain insights into how LMs process language-like input. I will in particular discuss recent and ongoing work on three fronts: transferable unnatural prompts, as a window into LM invariances (Rakotonirina et al. 2023); mechanistic interpretability exploration of the activation pathways triggered by natural and unnatural prompts (Kervadec et al. 2023); and first insights into the lexical nature of unnatural prompts. Although a comprehensive understanding of how and why LMs respond to unnatural language remains elusive, I aim to present a set of intriguing facts that I hope will inspire others to explore this phenomenon.
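As a toy illustration of what a "transferable" unnatural prompt means in practice (my own sketch, not code from the cited papers), one can feed the same made-up, low-likelihood string to two different causal language models and compare their greedy continuations; the models and the prompt below are arbitrary, and the Hugging Face transformers library is assumed.

# Illustrative sketch: does an arbitrary "unnatural" prompt elicit similar
# continuations from two different language models?
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "cf thermodynamics !! describe photosynthesis =>"  # made-up low-likelihood prompt

for name in ["gpt2", "distilgpt2"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    continuation = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:])
    print(f"{name}: {continuation!r}")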

 


Bio:

Marco Baroni received a PhD in Linguistics from the University of California, Los Angeles. After various experiences in research and industry, in 2019 he became an ICREA research professor, affiliated with the Linguistics Department of Pompeu Fabra University in Barcelona. Marco's work in the areas of multimodal and compositional distributed semantics has received widespread recognition, including a Google Research Award, an ERC Grant, the IJCAI-JAIR Best Paper Prize and the ACL test-of-time award. Marco was recently awarded another ERC grant to conduct research on improving communication between artificial neural networks, taking inspiration from human language and other animal communication systems.



Registration
Smaranda Muresan (Columbia University)
Human-centric NLP: From Argumentation to Creativity (Thursday, March 7, 2024 - 15:00 CET)
Summary:

Large language models (LLMs) constitute a paradigm shift in Natural Language Processing (NLP) and its applications across all domains. Models such as ChatGPT seem to possess human-like abilities --- reasoning about problems, passing bar exams, writing stories. But do they? In trying to answer this question, I will discuss three main desiderata for building human-centric NLP systems: knowledge-aware models, human-AI collaboration frameworks, and theoretically-grounded evaluation protocols. In this talk, I will use argumentation and creativity as two case studies. I will cover knowledge-aware models for implicit premise generation, human-AI collaboration frameworks for high-quality dataset creation (e.g., visual metaphors) and for helping humans solve tasks (e.g., writing short stories), and, last but not least, a novel evaluation protocol for assessing the creative capabilities of LLMs in both producing and assessing creative text.


Bio:

Smaranda Muresan is a Research Scientist at the Data Science Institute at Columbia University, a Visiting Associate Professor at Barnard College and an Amazon Scholar. Her research focuses on human-centric Natural Language Processing for social good and responsible computing. She develops theory-guided and knowledge-aware computational models for understanding and generating language in context (e.g., visual, social, multilingual, multicultural) with applications to computational social science, education, and public health. Research topics that she has worked on over the years include: argument mining and generation, fact-checking and misinformation detection, figurative language understanding and generation (e.g., sarcasm, metaphor, idioms), and multilingual language processing for low-resource and endangered languages. Recently, her research interests include explainable models and human-AI collaboration frameworks for high-quality dataset creation. She received best paper awards at SIGDIAL 2017 and ACL 2018 (short paper). She has served as a board member for the North American Chapter of the Association for Computational Linguistics.



Registration
Heng Ji (University of Illinois)
SmartBook: an AI Prophetess for Disaster Reporting and Forecasting (Friday, February 16, 2024 - 15:00 CET)
Summary:

History repeats itself, sometimes in a bad way. If we don’t learn lessons from history, we might suffer similar tragedies, which are often preventable. For example, many experts now agree that some schools were closed for too long during COVID-19 and that abruptly removing millions of children from American classrooms has had harmful effects on their emotional and intellectual health. Many also wish we had invested in vaccines earlier, prepared more personal protective equipment and medical facilities, provided online consultation services for people who suffered from anxiety and depression, and created better online education platforms for students. Similarly, genocides throughout history (from those in World War II to the recent one in Rwanda in 1994) have all shared early warning signs (e.g., organization of hate groups, militias, and armies and polarization of the population) forming patterns that follow discernible progressions. Preventing natural or man-made disasters requires being aware of these patterns and taking pre-emptive action to address and reduce them, or ideally, eliminate them. Emerging events, such as the COVID pandemic and the Ukraine Crisis, require a time-sensitive, comprehensive understanding of the situation to allow for appropriate decision-making and effective action response. Automated generation of situation reports can significantly reduce the time, effort, and cost for domain experts when preparing their official human-curated reports. However, AI research toward this goal has been very limited, and no successful trials have yet been conducted to automate such report generation and “what-if” disaster forecasting. Pre-existing natural language processing and information retrieval techniques are insufficient to identify, locate, and summarize important information, and lack detailed, structured, and strategic awareness. We propose SmartBook, a novel framework targeting situation report generation, a task that cannot be solved by ChatGPT. SmartBook consumes large volumes of news data to produce a structured situation report in which multiple hypotheses (claims) are summarized and grounded with rich links to factual evidence through claim detection, fact checking, misinformation detection and factual error correction. Furthermore, SmartBook can also serve as a novel news event simulator, or an intelligent prophetess. Given “what-if” conditions and dimensions elicited from a domain expert user concerning a disaster scenario, SmartBook will induce schemas from historical events, and automatically generate a complex event graph along with a timeline of news articles that describe new simulated events, based on a new Λ-shaped attention mask that can generate text of unbounded length. By effectively simulating disaster scenarios in both event graph and natural language format, we expect SmartBook to greatly assist humanitarian workers and policymakers in exercising reality checks (what would the next disaster look like under these given conditions?), and thus in better preventing and responding to future disasters.


Bio:

Heng Ji is a professor in the Computer Science Department, and an affiliated faculty member of the Electrical and Computer Engineering Department and the Coordinated Science Laboratory, at the University of Illinois Urbana-Champaign. She is an Amazon Scholar. She is the Founding Director of the Amazon-Illinois Center on AI for Interactive Conversational Experiences (AICE). She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially Multimedia Multilingual Information Extraction, Knowledge-enhanced Large Language Models, Knowledge-driven Generation and Conversational AI. She was selected as a Young Scientist to attend the 6th World Laureates Association Forum, and selected to participate in DARPA AI Forward in 2023. She was selected as a "Young Scientist" and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. She was named as part of Women Leaders of Conversational AI (Class of 2023) by Project Voice. The awards she has received include the "AI's 10 to Watch" Award by IEEE Intelligent Systems in 2013, an NSF CAREER award in 2009, the PACLIC 2012 Best Paper Runner-up, "Best of ICDM 2013" and "Best of SDM 2013" paper awards, an ACL 2018 Best Demo Paper nomination, the ACL 2020 Best Demo Paper Award, the NAACL 2021 Best Demo Paper Award, Google Research Awards in 2009 and 2014, IBM Watson Faculty Awards in 2012 and 2014, and Bosch Research Awards in 2014-2018. She was invited by the Secretary of the U.S. Air Force and AFRL to join the Air Force Data Analytics Expert Panel to inform the Air Force Strategy 2030, and invited to speak at the Federal Information Integrity R&D Interagency Working Group (IIRD IWG) briefing in 2023. She is the lead of many multi-institution projects and tasks, including the U.S. ARL projects on information fusion and knowledge networks construction, the DARPA ECOLE MIRACLE team, the DARPA KAIROS RESIN team and the DARPA DEFT Tinker Bell team. She coordinated the NIST TAC Knowledge Base Population task from 2010 to 2021. She was an associate editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing, and served as Program Committee Co-Chair of many conferences including NAACL-HLT 2018 and AACL-IJCNLP 2022. She was elected secretary of the North American Chapter of the Association for Computational Linguistics (NAACL) for 2020-2023. Her research has been widely supported by U.S. government agencies (DARPA, NSF, DoE, ARL, IARPA, AFRL, DHS) and industry (Amazon, Google, Facebook, Bosch, IBM, Disney).

 



Registration
Emily M. Bender (University of Washington)
Meaning making with artificial interlocutors and risks of language technology (Thursday, November 2, 2023 - 16:00 CET)
Summary:

Humans make sense of language in context, bringing to bear their own understanding of the world including their model of their interlocutor's understanding of the world. In this talk, I will explore various potential risks that arise when we as humans bring this sense-making capacity to interactions with artificial interlocutors. That is, I will ask what happens in conversations where one party has no (or extremely limited) access to meaning and all of the interpretative work rests with the other, and briefly explore what this entails for the design of language technology.


Bio:

Emily M. Bender is a Professor of Linguistics and an Adjunct Professor in the School of Computer Science and the Information School at the University of Washington, where she has been on the faculty since 2003. Her research interests include multilingual grammar engineering, computational semantics, and the societal impacts of language technology. In 2022 she was elected as a Fellow of the American Association for the Advancement of Science (AAAS).


2022-2023

Pascale Fung (The Hong Kong University of Science and Technology)
Safer Generative ConvAI (Thursday, June 1, 2023 - 15:00 CET)
Summary:

Generative models for Conversational AI are less than a decade old, but they hold great promise for human-machine interactions. Machine responses based on generative models can seem quite fluent and human-like, empathetic and funny, knowledgeable and professional. However, behind their confident voice, generative ConvAI systems can also hallucinate misinformation, give biased and harmful views, and are still not "safe" enough for many real-life applications. The expressive power of generative ConvAI models and their undesirable behavior are two sides of the same coin. How can we harness the fluency, diversity, and engagingness of generative ConvAI models while mitigating the downside? In this talk, I will present some of our team’s recent work on making generative ConvAI safer by mitigating hallucinations, misinformation, and toxicity.


Bio:

Pascale Fung is a Chair Professor at the Department of Electronic & Computer Engineering at The Hong Kong University of Science & Technology (HKUST), and a visiting professor at the Central Academy of Fine Arts in Beijing. She is an elected Fellow of the Association for the Advancement of Artificial Intelligence (AAAI) for her "significant contributions to the field of conversational AI and to the development of ethical AI principles and algorithms", an elected Fellow of the Association for Computational Linguistics (ACL) for her “significant contributions towards statistical NLP, comparable corpora, and building intelligent systems that can understand and empathize with humans”. She is a Fellow of the Institute of Electrical and Electronic Engineers (IEEE) for her “contributions to human-machine interactions” and an elected Fellow of the International Speech Communication Association for “fundamental contributions to the interdisciplinary area of spoken language human-machine interactions”. She is the Director of the HKUST Centre for AI Research (CAiRE). She was the founding chair of the Women Faculty Association at HKUST. She is an expert on the Global Future Council, a think tank for the World Economic Forum. She represents HKUST on Partnership on AI to Benefit People and Society. She is on the Board of Governors of the IEEE Signal Processing Society. She is a member of the IEEE Working Group to develop an IEEE standard - Recommended Practice for Organizational Governance of Artificial Intelligence. Her research team has won several best and outstanding paper awards at ACL and NeurIPS workshops.


Martin Cooke (Ikerbasque – Basque Foundation for Science)
Who needs big data? Listeners' adaptation to extreme forms of variability in speech (Thursday, May 4, 2023 - 15:00 CET)
Summary:

No theory of speech perception can be considered complete without an explanation of how listeners are able to extract meaning from severely degraded forms of speech. Starting with a brief overview of a century of research which has seen the development of many types of distorted speech, followed by some anecdotal evidence that automatic speech recognisers still have some way to go to match listeners' performance in this area, I will describe the outcome of one recent [1] and several ongoing studies into the detailed time course of a listener's response to distorted speech. These studies variously consider the rapidity of adaptation, whether adaptation can only proceed if words are recognised, the degree to which the response to one form of distortion is conditioned on prior experience with other forms, and the nature of adaptation in a language other than one's own native tongue. Taken together, findings from these experiments suggest that listeners are capable of continuous and extremely rapid adaptation to novel forms of speech that differ greatly from the type of input that makes up the vast bulk of their listening experience. It is an open question as to whether big-data-based automatic speech recognition can offer a similar degree of flexibility. [1] Cooke, M, Scharenborg, O and Meyer, B (2022). The time course of adaptation to distorted speech. J. Acoust. Soc. Am. 151, 2636-2646. 10.1121/10.0010235
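For readers unfamiliar with how degraded stimuli are typically constructed, here is a minimal sketch (my own illustration, not material from the talk) of one of the simplest distortions, mixing a signal with noise at a chosen signal-to-noise ratio; the sine wave merely stands in for a recorded speech signal.

# Illustrative sketch: mix a signal with noise at a target SNR (in dB).
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    # Scale the noise so that 10*log10(P_speech / P_scaled_noise) equals snr_db.
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # stand-in for a speech signal
noise = rng.standard_normal(16000)
degraded = mix_at_snr(speech, noise, snr_db=0.0)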


Bio:

Martin Cooke is an Ikerbasque Research Professor. After starting his career at the UK National Physical Laboratory, he worked at the University of Sheffield for 26 years before taking up his current position. His research has focused on analysing the computational auditory scene, devising algorithms for robust automatic speech recognition and investigating human speech perception. His interests also include the effects of noise on talkers as well as listeners, and second language listening in noise.


Isabelle Augenstein (University of Copenhagen)
Beyond Fact Checking — Modelling Information Change in Scientific Communication (Thursday, March 2, 2023 - 15:00 CET)
Summary:

Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges, which should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. In this talk, I will present some first steps towards addressing these problems, discussing our research on exaggeration detection, scientific fact checking, and on modelling information change in scientific communication more broadly.


Bio:

Isabelle Augenstein is a Professor at the University of Copenhagen, Department of Computer Science, where she heads the Copenhagen Natural Language Understanding research group as well as the Natural Language Processing section. Her main research interests are fact checking, low-resource learning, and explainability. Prior to starting a faculty position, she was a postdoctoral researcher at University College London, and before that a PhD student at the University of Sheffield. In October 2022, Isabelle Augenstein became Denmark’s youngest ever female full professor. She currently holds a prestigious ERC Starting Grant on 'Explainable and Robust Automatic Fact Checking', as well as the Danish equivalent of that, a DFF Sapere Aude Research Leader fellowship on 'Learning to Explain Attitudes on Social Media’. She is a member of the Young Royal Danish Academy of Sciences and Letters, and Vice President-Elect of SIGDAT, which organises the EMNLP conference series.


Thomas Hueber (CNRS/GIPSA-lab)
Computational model of speech learning, a focus on the acoustic-articulatory mapping (Thursday, February 2, 2023 - 15:00 CET)
Summary:

Speech production is a complex motor process involving several physiological phenomena, such as the neural, nervous and muscular activities that drive our respiratory, laryngeal and articulatory movements. Modeling speech production, in particular the relationship between articulatory gestures (tongue, lips, jaw, velum) and acoustic realizations of speech, is a challenging, and still evolving, research question. From an applicative point of view, such models could be embedded into assistive devices able to restore oral communication when part of the speech production chain is damaged (articulatory synthesis, silent speech interface). They could also help rehabilitate speech sound disorders using a therapy based on biofeedback (and articulatory inversion). From a more fundamental research perspective, such models can also be used to question the cognitive mechanisms underlying speech learning, perception and motor control. In this talk, I will present three recent studies conducted in our group to address some of these fundamental questions. In the first one, we quantified the benefit of relying on lip movement when learning speech representations in a self-supervised manner using predictive coding techniques. In the second one, we integrated articulatory priors into the latent space of a variational auto-encoder, with potential application to speech enhancement. In the third one, I will describe a first attempt toward a computational model of speech learning, based on deep learning, which can be used to understand how a child learns the acoustic-to-articulatory inverse mapping in a self-supervised manner.


Bio:

Thomas Hueber is a senior research scientist at CNRS (« Directeur de recherche ») working at GIPSA-lab in Grenoble, France. He is head of the CRISSP research team (cognitive robotics, interactive systems and speech processing). He received a Ph.D. in Computer Science from Pierre and Marie Curie University (Paris) in 2009. His research activities focus on automatic speech processing, with a particular interest in (1) the capture, analysis and modeling of the articulatory gestures and electrophysiological signals involved in speech production, (2) the development of speech technologies that exploit these different signals, for speech recognition and synthesis, for people with a spoken communication disorder, and (3) the study, through modeling and simulation, of the cognitive mechanisms underlying speech perception and production. He received the 6th Christian Benoît Award (ISCA/AFCP/ACB) in 2011 and the ISCA Award for the best paper published in Speech Communication in 2015. In 2017, he co-edited a special issue on biosignal-based speech processing in IEEE/ACM Transactions on Audio, Speech, and Language Processing. He is also an associate editor of the EURASIP Journal on Audio, Speech, and Music Processing.


Maarit Koponen (University of Eastern Finland)
Machine translation as a tool for multilingual information: different users and use scenarios (Thursday, December 1, 2022 - 15:00 CET)
Summary:

Recent advances in machine translation quality have improved its usefulness as a tool to satisfy the demand for multilingual information and communication. Machine translation is nowadays a common part of professional translation workflows, but it is not a tool exclusive to translators. Users of machine translation can be found, for example, in public service institutions and newsrooms looking to produce and disseminate information in multiple languages. At the same time, machine translation can also offer a way for people to access information that may not otherwise be available in their language. Effective and responsible use of machine translation, however, requires a clear understanding of the potential risks as well as potential benefits. In this talk, I discuss how machine translation is used for producing and accessing information and how various situational factors affect its use in different scenarios.


Bio:

Dr Maarit Koponen currently works as Professor of Translation Studies at the University of Eastern Finland. She has previously worked as a post-doctoral researcher at the University of Helsinki and as a lecturer at the University of Turku after receiving her PhD in Language Technology at the University of Helsinki in 2016. Her research focuses on translation technology, particularly machine translation, and the effect of technology on translation both in professional and non-professional settings. Starting in October 2022, Koponen leads a work package focusing on linguistic barriers to information accessibility and technological solutions as part of the research project DECA (Democratic epistemic capacities in the age of algorithms), funded by the Academy of Finland Strategic Research Council. She chairs Working Group 7 “Language work, language professionals” of the EU COST Action “Language in the Human-Machine Era” (LITHME). She has also worked as a professional translator for several years.


Vered Shwartz (The University of British Columbia-Vancouver)
Incorporating Commonsense Reasoning into NLP Models (Thursday, November 3, 2022 - 15:30 CET)
Summary:

NLP models are primarily supervised, and are by design trained on a sample of the situations they may encounter in practice. The ability of models to generalize to and address unknown situations reasonably is limited, but may be improved by endowing models with commonsense knowledge and reasoning skills. In this talk, I will present several lines of work in which commonsense is used for improving the performance of NLP tasks: for completing missing knowledge in underspecified language, interpreting figurative language, and resolving context-sensitive event coreference. Finally, I will discuss open problems and future directions in building NLP models with commonsense reasoning abilities.


Bio:

Vered Shwartz is an Assistant Professor of Computer Science at the University of British Columbia and a faculty member at the Vector Institute for Artificial Intelligence. Her research interests include commonsense reasoning, computational semantics and pragmatics, and multiword expressions. Previously, Vered was a postdoctoral researcher at the Allen Institute for AI (AI2) and the University of Washington, and received her PhD in Computer Science from Bar-Ilan University.


Xiang Ren (University of Southern California - USC)
Commonsense Reasoning in the Wild (Thursday, October 6, 2022 - 17:00 CET)
Summary:

Current NLP systems impress us by achieving close-to-human performance on benchmarks of answering commonsense questions or writing interesting stories. However, most of the progress is evaluated using static, closed-ended datasets created for individual tasks. To deploy commonsense reasoning services in the wild, we look to develop and evaluate systems that can generate answers in an open-ended way, perform robust logical reasoning, and generalize across diverse task formats, domains, and datasets. In this talk I will share our effort on introducing new formulations of commonsense reasoning challenges and novel evaluation protocols, towards broadening the scope in approaching machine common sense. We hope that such a shift of evaluation paradigm would encourage more research on externalizing the model reasoning process and improving model robustness and cross-task generalization.


Bio:

Xiang Ren is an assistant professor and Viterbi Early Career Chair at the USC Computer Science Department, a Research Team Leader at USC ISI, and the director of the Intelligence and Knowledge Discovery (INK) Lab at USC. Previously, he spent time as a research scholar at Stanford University and received his Ph.D. in Computer Science from the University of Illinois Urbana-Champaign. Ren's research seeks to build generalizable natural language processing (NLP) systems which can handle a wide variety of language tasks and situations. He works on new algorithms and datasets to make NLP systems cheaper to develop and maintain, arm machine models with common sense, and improve models’ transparency and reliability to build user trust. His research work has received several best paper awards in top NLP and AI conference venues. Ren has been awarded an NSF CAREER Award, multiple faculty research awards from Google, Facebook, Amazon, JP Morgan and Sony, and the 2018 ACM SIGKDD Doctoral Dissertation Award. He was named to Forbes' 30 Under 30 Asia in 2019.

 
 

2021-2022

Mikel Artetxe (FAIR (Meta AI))
Is scale all you need? (Friday, June 24, 2022 - 10:00 CET)
Summary:

Every once in a while, a new language model with a gazillion parameters makes a big splash on Twitter, smashing the previous SOTA on some benchmarks or showing some impressive emerging capabilities. While some may argue that scaling will eventually solve NLP, others are skeptical about the scientific value of this trend. In this talk, I will argue that scaling is not just engineering, but also comes with exciting research questions. I will present some of our recent work on the topic, and discuss our efforts to make large language models more accessible to the community.


Bio:

Mikel Artetxe is a Research Scientist at FAIR (Meta AI). His primary area of research is multilingual NLP. Mikel was one of the pioneers of unsupervised machine translation, and has done extensive work on cross-lingual representation learning. More recently, he has also been working on natural language generation, few-shot learning, and large-scale language models. Prior to joining FAIR, Mikel did his PhD at the IXA group at the University of the Basque Country, and interned at DeepMind, FAIR and Google.


Sakriani Sakti (Japan Advanced Institute of Science and Technology)
Semi-supervised Learning for Low-resource Multilingual and Multimodal Speech Processing with Machine Speech Chain (Thursday, May 5, 2022 - 15:00 CET)
Summary:

The development of advanced spoken language technologies based on automatic speech recognition (ASR) and text-to-speech synthesis (TTS) has enabled computers to either learn how to listen or speak. Many applications and services are now available but still support fewer than 100 languages. Nearly 7000 living languages, spoken by 350 million people, remain uncovered. This is because such systems are commonly constructed using machine learning trained in a supervised fashion, which requires a large amount of paired speech and corresponding transcriptions. In this talk, we will introduce a semi-supervised learning mechanism based on a machine speech chain framework. First, we describe the primary machine speech chain architecture that learns not only to listen or speak but also to listen while speaking. The framework enables ASR and TTS to teach each other given unpaired data. After that, we describe the use of the machine speech chain for code-switching and cross-lingual ASR and TTS of several languages, including low-resourced ethnic languages. Finally, we describe the recent multimodal machine chain, which mimics overall human communication by listening while speaking and visualizing. With the support of image captioning and production models, the framework enables ASR and TTS to improve their performance using an image-only dataset.
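The core training idea of the machine speech chain can be sketched as two reconstruction loops over unpaired data, in addition to ordinary supervised losses on paired data. The toy loop below is my own simplification for illustration only: real systems use sequence-to-sequence ASR and TTS models over discrete text, whereas here both modalities are dummy continuous vectors so the example stays short and runnable (PyTorch assumed).

# Toy illustration of the speech-chain training loop (not the actual architecture).
import torch
import torch.nn as nn

SPEECH_DIM, TEXT_DIM = 80, 32
asr = nn.Linear(SPEECH_DIM, TEXT_DIM)   # stand-in for a speech-to-text model
tts = nn.Linear(TEXT_DIM, SPEECH_DIM)   # stand-in for a text-to-speech model
optimizer = torch.optim.Adam(list(asr.parameters()) + list(tts.parameters()), lr=1e-3)
mse = nn.MSELoss()

paired_speech, paired_text = torch.randn(8, SPEECH_DIM), torch.randn(8, TEXT_DIM)
unpaired_speech = torch.randn(8, SPEECH_DIM)
unpaired_text = torch.randn(8, TEXT_DIM)

for step in range(100):
    # Ordinary supervised losses on paired data.
    supervised = mse(asr(paired_speech), paired_text) + mse(tts(paired_text), paired_speech)
    # Chain on unpaired speech: listen, "speak" it back, compare with the input.
    chain_speech = mse(tts(asr(unpaired_speech)), unpaired_speech)
    # Chain on unpaired text: speak, "listen" to it, compare with the input.
    chain_text = mse(asr(tts(unpaired_text)), unpaired_text)
    loss = supervised + chain_speech + chain_text
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()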


Bio:

Sakriani Sakti is currently an associate professor at the Japan Advanced Institute of Science and Technology (JAIST), Japan, adjunct associate professor at the Nara Institute of Science and Technology (NAIST), Japan, visiting research scientist at the RIKEN Center for Advanced Intelligence Project (RIKEN AIP), Japan, and adjunct professor at the University of Indonesia. She received the DAAD-Siemens Program Asia 21st Century Award in 2000 to study Communication Technology at the University of Ulm, Germany, and received her MSc degree in 2002. During her thesis work, she worked with the Speech Understanding Department, DaimlerChrysler Research Center, Ulm, Germany. She then worked as a researcher at ATR Spoken Language Communication (SLC) Laboratories, Japan, in 2003-2009, and at NICT SLC Groups, Japan, in 2006-2011, which established multilingual speech recognition for speech-to-speech translation. While working with ATR and NICT, she continued her studies (2005-2008) with the Dialog Systems Group, University of Ulm, Germany, and received her Ph.D. degree in 2008. She was actively involved in international collaboration activities such as the Asia-Pacific Telecommunity Project (2003-2007) and various speech-to-speech translation research projects, including A-STAR and U-STAR (2006-2011). From 2011 to 2017, she was an assistant professor at the Augmented Human Communication Laboratory, NAIST, Japan. She also served as a visiting scientific researcher at INRIA Paris-Rocquencourt, France, in 2015-2016, under the JSPS Strategic Young Researcher Overseas Visits Program for Accelerating Brain Circulation. From 2018 to 2021, she was a research associate professor at NAIST and a research scientist at RIKEN AIP, Japan. She is a member of JNS, SFN, ASJ, ISCA, IEICE, and IEEE. Furthermore, she is currently a committee member of the IEEE SLTC (2021-2023) and an associate editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020-2023). She was a board member of Spoken Language Technologies for Under-resourced Languages (SLTU) and the general chair of SLTU 2016. She was also the general chair of the "Digital Revolution for Under-resourced Languages (DigRevURL)" workshop, held as an Interspeech special session in 2017, and of DigRevURL Asia in 2019, and was on the organizing committee of the Zero Resource Speech Challenge 2019 and 2020. She was also involved in creating the joint ELRA and ISCA Special Interest Group on Under-resourced Languages (SIGUL) and has served on the SIGUL Board since 2018. In collaboration with UNESCO and ELRA, she was also on the organizing committee of the international conference "Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide". Her research interests lie in deep learning and graphical model frameworks, statistical pattern recognition, zero-resourced speech technology, multilingual speech recognition and synthesis, spoken language translation, social-affective dialog systems, and cognitive communication.


Dan Roth (University of Pennsylvania)
It’s Time to Reason (Thursday, April 7, 2022 - 15:00 CET)
Summary:

The fundamental issue underlying natural language understanding is that of semantics – there is a need to move toward understanding natural language at an appropriate level of abstraction in order to support natural language understanding and communication with computers. Machine Learning has become ubiquitous in our attempt to induce semantic representations of natural language and support decisions that depend on it; however, while we have made significant progress over the last few years, it has focused on classification tasks for which we have large amounts of annotated data. Supporting high-level decisions that depend on natural language understanding is still beyond our capabilities, partly because most of these tasks are very sparse and generating supervision signals for them does not scale. I will discuss some of the challenges underlying reasoning – making natural language understanding decisions that depend on multiple, interdependent models – and exemplify this mostly using the domain of Reasoning about Time, as it is expressed in natural language.


Bio:

Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, lead of NLP Science at Amazon AWS AI, and a Fellow of the AAAS, the ACM, AAAI, and the ACL. In 2017, Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.” Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely. Roth was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR) and a program chair of AAAI, ACL, and CoNLL. Roth has been involved in several startups; most recently he was a co-founder and chief scientist of NexLP, a startup that leverages the latest advances in Natural Language Processing (NLP), Cognitive Analytics, and Machine Learning in the legal and compliance domains. NexLP was acquired by Reveal in 2020. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.


Desmond Elliott (University of Copenhagen)
Visually Grounded Reasoning across Languages and Cultures (Thursday, March 3, 2022 - 15:00 CET)
Summary:

The design of widespread vision-and-language datasets and pre-trained encoders directly adopts, or draws inspiration from, the concepts and images of ImageNet. While one can hardly overestimate how much this benchmark contributed to progress in computer vision, it is mostly derived from lexical databases and image queries in English, resulting in source material with a North American or Western European bias. Therefore, we devise a new protocol to construct an ImageNet-style hierarchy representative of more languages and cultures. In particular, we let the selection of both concepts and images be entirely driven by native speakers, rather than scraping them automatically. Specifically, we focus on a typologically diverse set of languages, namely, Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish. On top of the concepts and images obtained through this new protocol, we create a multilingual dataset for Multicultural Reasoning over Vision and Language (MaRVL) by eliciting statements from native speaker annotators about pairs of images. The task consists of discriminating whether each grounded statement is true or false. We establish a series of baselines using state-of-the-art models and find that their cross-lingual transfer performance lags dramatically behind supervised performance in English. These results invite us to reassess the robustness and accuracy of current state-of-the-art models beyond a narrow domain, but also open up new exciting challenges for the development of truly multilingual and multicultural systems.


Bio:

Desmond is an Assistant Professor at the University of Copenhagen. His primary research interests are multimodal and multilingual machine learning and he was involved in the creation of the Multi30K, How2, and MaRVL datasets. His work received an Area Chair Favourite paper at COLING 2018 and the Best Long Paper Award at EMNLP 2021. He co-organised the Multimodal Machine Translation Shared Task from 2016–2018, the 2018 Frederick Jelinek Memorial Workshop on Grounded Sequence-to-Sequence Learning, the How2 Challenge Workshop at ICML 2019, and the Workshop on Multilingual Multimodal Learning at ACL 2022.


Roger Moore (The University of Sheffield)
Talking with Robots: Are We Nearly There Yet? (Thursday, February 3, 2022 - 15:00 CET)
Summary:

Recent years have seen considerable progress in the deployment of 'intelligent' communicative agents such as Apple's Siri and Amazon’s Alexa. However, effective speech-based human-robot dialogue is less well developed; not only do the fields of robotics and spoken language technology present their own special problems, but their combination raises an additional set of issues. In particular, there appears to be a large gap between the formulaic behaviour that typifies contemporary spoken language dialogue systems and the rich and flexible nature of human-human conversation. As a consequence, we still seem to be some distance away from creating Autonomous Social Agents such as robots that are truly capable of conversing effectively with their human counterparts in real world situations. This talk will address these issues and will argue that we need to go far beyond our current capabilities and understanding if we are to move from developing robots that simply talk and listen to evolving intelligent communicative machines that are capable of entering into effective cooperative relationships with human beings.


Bio:

Prof. Moore has over 40 years’ experience in Speech Technology R&D and, although an engineer by training, much of his research has been based on insights from human speech perception and production. As Head of the UK Government's Speech Research Unit from 1985 to 1999, he was responsible for the development of the Aurix range of speech technology products and the subsequent formation of 20/20 Speech Ltd. Since 2004 he has been Professor of Spoken Language Processing at the University of Sheffield, and also holds Visiting Chairs at Bristol Robotics Laboratory and University College London Psychology & Language Sciences. He was President of the European/International Speech Communication Association from 1997 to 2001, General Chair for INTERSPEECH 2009 and ISCA Distinguished Lecturer during 2014-15. In 2017 he organised the first international workshop on 'Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR)'. Prof. Moore is the current Editor-in-Chief of Computer Speech & Language. In 2016 he was awarded the LREC Antonio Zampolli Prize for "Outstanding Contributions to the Advancement of Language Resources & Language Technology Evaluation within Human Language Technologies", and in 2020 he was given the International Speech Communication Association Special Service Medal for "service in the establishment, leadership and international growth of ISCA".


Odette Scharenborg (Delft University of Technology)
Speech Representations and Processing in Deep Neural Networks (Thursday, January 13, 2022 - 15:00 CET)
Summary:

Speech recognition is the mapping of a continuous, highly variable speech signal onto discrete, abstract representations. The question of how speech is represented and processed in the human brain and in automatic speech recognition (ASR) systems, although crucial in both the field of human speech processing and the field of automatic speech processing, has historically been investigated in the two fields separately. This webinar will discuss how comparisons between humans and deep neural network (DNN)-based ASR systems, and cross-fertilization of the two research fields, can provide valuable insights into the way humans process speech and improve ASR technology. Specifically, it will present the results of several experiments, carried out on both human listeners and DNN-based ASR systems, on the representation of speech and on lexically-guided perceptual learning, i.e., the ability to adapt a sound category on the basis of new incoming information, resulting in improved processing of subsequent information. It will explain how listeners adapt to the speech of new speakers, and will present the results of a lexically-guided perceptual learning study carried out on a DNN-based ASR system, similar to the human experiments. In order to investigate the speech representations and adaptation processes in the DNN-based ASR systems, activations in the hidden layers of the DNN were visualized. These visualizations revealed that DNNs use speech representations that are similar to those used by human listeners, without being explicitly taught to do so, and showed an adaptation of the phoneme categories similar to what is assumed to happen in the human brain.
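The kind of analysis described above, visualizing hidden-layer activations, can be sketched in a few lines: register a forward hook on a layer, collect the activations for a batch of inputs, and project them to two dimensions. The network below is a dummy MLP and the features are random, purely to illustrate the mechanics (PyTorch and scikit-learn assumed); in the studies above the same idea is applied to DNN-based ASR systems and phoneme-labelled frames.

# Illustrative sketch: capture hidden-layer activations with a forward hook
# and project them to 2-D for visualization.
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
captured = []

def hook(module, inputs, output):
    captured.append(output.detach())

model[2].register_forward_hook(hook)   # capture activations of the second hidden layer

features = torch.randn(200, 40)        # stand-in for acoustic feature frames
with torch.no_grad():
    model(features)

activations = torch.cat(captured).numpy()
projected = PCA(n_components=2).fit_transform(activations)  # 2-D view, e.g. coloured per phoneme label
print(projected.shape)  # (200, 2)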


Bio:

Odette Scharenborg is an Associate Professor and Delft Technology Fellow at Delft University of Technology, working on automatic speech processing. She has an interdisciplinary background in automatic speech recognition and psycholinguistics, and uses knowledge of how humans process speech to develop inclusive automatic speech recognition systems that are able to recognise speech from everyone, irrespective of how they speak or the language they speak. Since 2017, she has been on the Board of the International Speech Communication Association, where she currently serves as Vice-President. Since 2018, she has been on the IEEE Speech and Language Processing Technical Committee, and she is a Senior Associate Editor of IEEE Signal Processing Letters.


Sam Bowman (New York University)
When Combating Hype, Proceed with Caution (Thursday, December 2, 2021 - 15:00 CET)
Summary:

Researchers in NLP increasingly frame and discuss research results in ways that serve to deemphasize the field's successes, at least in part in an effort to combat the field's widespread hype. Though well-meaning, this often yields misleading or even false claims about the limits of our best technology. This is a problem, and it may be more serious than it looks: It harms our credibility in ways that can make it harder to mitigate present-day harms from NLP deployments, like those involving discriminatory systems for content moderation or resume screening. It also limits our ability to prepare for the potentially enormous impacts of more distant future advances. This talk urges researchers to be careful about these claims and suggests some research directions and communication strategies that will make it easier to avoid or rebut them.


Bio:

Sam Bowman has been on the faculty at NYU since 2016, when he completed his PhD with Chris Manning and Chris Potts at Stanford. At NYU, he is a member of the Center for Data Science, the Department of Linguistics, and the Courant Institute's Department of Computer Science. His research focuses on data, evaluation techniques, and modeling techniques for sentence and paragraph understanding in natural language processing, and on applications of machine learning to scientific questions in linguistic syntax and semantics. He is the senior organizer behind the GLUE and SuperGLUE benchmark competitions, and he has received a 2015 EMNLP Best Resource Paper Award, a 2019 *SEM Best Paper Award, a 2017 Google Faculty Research Award, and a 2021 NSF CAREER award.


Hinrich Schuetze (University of Munich)
Humans Learn From Task Descriptions and So Should Our Models (Thursday, November 4, 2021 - 15:00 CET)
Summary:

Task descriptions are ubiquitous in human learning. They are usually accompanied by a few examples, but there is little human learning that is based on examples only. In contrast, the typical learning setup for NLP tasks lacks task descriptions and is supervised with 100s or 1000s and often many more examples. This webinar will introduce Pattern-Exploiting Training (PET), an approach to learning that mimics human learning in that it leverages task descriptions in few-shot settings. PET is built on top of a pretrained language model that "understands" the task description, especially after fine-tuning, resulting in excellent performance compared to other few-shot methods. In particular, a model trained with PET outperforms GPT-3 even though it has 99.9% fewer parameters. The idea of task descriptions can also be applied to reducing bias in text generated by language models. Instructing a model to reveal and reduce its biases is remarkably effective as will be demonstrated in an evaluation on several benchmarks. This may contribute in the future to a fairer and more inclusive NLP.
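A minimal sketch of the pattern-verbalizer idea behind PET (my toy example, not the original implementation): a task description is phrased as a cloze pattern, and a pretrained masked language model scores label-specific verbalizer tokens at the mask position. The model, pattern and verbalizers below are illustrative choices; the Hugging Face transformers library is assumed.

# Illustrative sketch of cloze-style scoring with a pattern and verbalizers.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

review = "The plot was predictable and the acting was wooden."
pattern = f"{review} It was {tokenizer.mask_token}."        # cloze-style task description
verbalizers = {"positive": "great", "negative": "terrible"}  # each should be a single vocabulary token

inputs = tokenizer(pattern, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
mask_position = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

for label, word in verbalizers.items():
    token_id = tokenizer.convert_tokens_to_ids(word)
    print(label, logits[0, mask_position, token_id].item())

In PET proper, such patterns are additionally used to fine-tune the model on a handful of labelled examples and to annotate unlabelled data, but the scoring step above is the basic building block.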


Bio:

.


2020-2021

Heidi Christensen (University of Sheffield, UK)
Automated processing of pathological speech (Thursday, June 3, 2021 - 15:00 CET)
Summary:

As speech technologies mature and become ever more pervasive, the opportunities for real impact on people's lives increase. This talk will outline the major challenges faced by researchers in porting mainstream speech technology to the domain of healthcare applications; in particular, the need for personalised systems and the challenge of working in an inherently sparse data domain. Three areas in the automatic processing of pathological speech will be covered: i) detection, ii) therapy/treatment and iii) facilitating communication. The talk will give an overview of recent state-of-the-art results and specific experiences from current projects at the University of Sheffield (UK)'s Speech and Hearing (SPandH) & Healthcare lab.


Bio:

.


Jose Luis Alba Castro - Carmen García Mateo (University of Vigo)
Automatic Spanish Sign-Language Recognition: On-going Work & Challenges Ahead (Thursday, May 6, 2021 - 15:00 CET)
Summary:

In this talk we will quickly review the general approaches followed by the research community to solve the Sign Language Recognition (SLR) problem in the pre-deep learning era, and then review, also briefly, the latest architectures using DNNs. These data-hungry models pose a very important problem in this specific task due to the scarcity of labeled data. In the last 5 years there has been a great deal of effort put into compiling labeled datasets for Word-Level SLR and Continuous SLR, but we are still very far from the amount of data readily available for other speech-based tasks. Acquiring SLR data poses the double challenge of needing donors, who are scarce, and SL interpreters to help with the logistics, curation and labeling of the dataset. The GTM group at the atlanTTic Center at the University of Vigo started this research line three years ago. We will show the current state of the project and of the dataset we are acquiring with the help of Galician deaf associations and SL interpreters. We will also show the different approaches we are following for understanding both the manual and facial components of sign language, and the latest results on Word-Level SLR.


Bio:

.


Iryna Gurevych (Technische Universität Darmstadt)
Let's Argue - Understanding and Generating Natural Language Arguments (Thursday, March 4, 2021 - 15:00 CET)
Summary:

People love to argue. In recent years, Artificial Intelligence has achieved great advances in modelling natural language argumentation. While analysing and creating arguments is a highly complex (and enjoyable!) task at which even humans are not good, let alone perfect, we describe our natural language processing (NLP) research on identifying arguments, their stance and aspects, aggregating arguments into topically coherent clusters, and finally, even generating new arguments, given their desired topic, aspect and stance. The talk will tell the story of how the ArgumenText project was conceptualized into a set of novel NLP tasks and highlight their main research outcomes. Argument mining has a tremendous number of possible applications, of which the talk will discuss a few selected ones.
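To make the aggregation step concrete, here is a small sketch (my own illustration, not the ArgumenText system) that groups argument sentences into topically coherent clusters using TF-IDF features and k-means; in practice much stronger sentence representations and far larger collections would be used (scikit-learn assumed).

# Illustrative sketch: cluster argument sentences by topic.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

arguments = [
    "Nuclear power plants emit very little CO2 during operation.",
    "Storing nuclear waste safely remains an unsolved problem.",
    "School uniforms reduce peer pressure about clothing.",
    "Uniforms limit students' freedom of expression.",
]

vectors = TfidfVectorizer().fit_transform(arguments)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for argument, label in zip(arguments, labels):
    print(label, argument)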


Bio:

.


Ricardo Baeza-Yates (Northeastern University)
Biases on Social Media (Thursday, February 11, 2021 - 15:00 CET)
Summary:

Is social media data representative? If not, what are its biases? Can we mitigate those biases and make the data representative? Does all this depend on the language? Can word embeddings help? We will partially answer all of these questions with concrete use cases.


Bio:

.



Kyunghyun Cho (NYU)
Unreasonably Shallow Deep Learning (Friday, January 29, 2021 - 17:30 CET)
Summary:

The talk will be about some gotchas in Deep Learning.


Bio:

.


Eduard Hovy (CMU)
The Birth of a New NLP Centre: Making the Most of a Newborn Technology (Sunday, November 29, 2020 - 17:30 CET)
Summary:

Natural Language Processing (NLP) is at a very exciting time in its history. In the last 5 years a new technology has revolutionized the way we do our work. Even without special adaptation it tends to work better than almost every prior method, and yet we still don't really know how it works! So this is also a dangerous time: how can you trust a system that might (and sometimes does) do very strange things for which you can find no explanation or correction? In such a situation it is not a bad idea to look at the history of NLP, what NLP is at its core, and how the new technology fits into the NLP landscape. And, most importantly, where NLP is going (with or without this new technology) and how we can best prepare for it. The HiTZ Centre has a wonderful opportunity to help shape a future in which NLP will be as ubiquitous and as useful as the cellphone.


Bio:

.