Foundations of Speech Processing and Deep Learning
Neural Networks for Speech Processing
Deep Learning Training Paradigms
LABORATORY: Speech Representations
Recent progress in speech processing has been largely driven by deep learning and, more recently, by self-supervised and pre-trained models that learn rich representations from large amounts of audio data. These advances have significantly improved the performance and robustness of systems across a wide range of tasks, from automatic speech recognition (or speech-to-text) to speaker recognition and diarization (multi-talker scenarios). Modern approaches rely on neural architectures and transferable representations (embeddings) that can be adapted to different scenarios, languages, and acoustic conditions.
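Before learned embeddings, the standard input to speech systems was (and for many neural models still is) a hand-crafted time-frequency representation such as the log-mel spectrogram. The sketch below, using only NumPy, illustrates the classic pipeline (framing, windowing, FFT, triangular mel filterbank, log compression); all parameter values (16 kHz sampling rate, 25 ms frames, 10 ms hop, 40 mel bands) are common defaults chosen here for illustration, not values prescribed by the course.

```python
import numpy as np

def hz_to_mel(f):
    # O'Shaughnessy mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(wave, sr=16000, n_fft=400, hop=160, n_mels=40):
    """Frame the waveform, window, FFT, apply a mel filterbank, take logs."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2  # (frames, n_fft//2 + 1)

    # Triangular mel filterbank spanning 0 Hz .. Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    return np.log(power @ fbank.T + 1e-10)  # (frames, n_mels)

# One second of a synthetic 440 Hz tone stands in for real audio.
sr = 16000
t = np.arange(sr) / sr
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(features.shape)  # one 40-dimensional feature vector per 10 ms frame
```

Self-supervised models such as wav2vec 2.0 replace this fixed pipeline with learned feature extractors, but the log-mel spectrogram remains a useful baseline and a common target for comparison.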
This course is designed to provide a structured and practical introduction to these developments, guiding participants from the fundamentals of speech representations to the design of complete speech processing systems. It will progressively cover core tasks such as speech-to-text, speaker recognition, and speaker diarization, highlighting the role of embedding-based methods, end-to-end modeling, and evaluation methodologies. Emphasis will be placed on understanding how different components interact within real-world pipelines, as well as on the use of modern tools and pre-trained models to build task-specific systems. Throughout the course, participants will gain hands-on experience implementing and analyzing speech processing solutions, enabling them to apply these techniques in research or industrial settings.
The course is part of the NLP master's programme hosted by the Ixa NLP research group at the HiTZ research center of the University of the Basque Country (UPV/EHU).