Afsaneh Asaei

Model-based Sparse Component Analysis for Multiparty Distant Speech Recognition

Supervisor(s) and Committee member(s): Hervé Bourlard (supervisor)

URL: https://publidiap.idiap.ch/index.php/publications/show/2486

This thesis addresses multi-microphone distant speech recognition in multiparty meetings, and in particular the fundamental problem of recognizing overlapping speech in reverberant rooms. It is motivated by the excellent performance of human hearing on this problem, which may result from the sparsity of the auditory representation. Our work therefore exploits sparse component analysis in the speech recognition front-end to extract the components of the desired speaker from the competing interferences (other speakers) prior to recognition. More specifically, speech recovery and recognition are achieved by sparse reconstruction of the (high-dimensional) spatio-spectral information embedded in the acoustic scene from (low-dimensional) compressive recordings provided by a few microphones. This approach exploits the natural parsimonious structure of the data, which pertains both to the geometry of the problem and to the information representation space.
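To make "sparse reconstruction from compressive recordings" concrete, here is a generic toy sketch: a few measurements y = A x of a vector x that is sparse over a grid (for example, a grid of candidate source locations) are inverted with Orthogonal Matching Pursuit (OMP). OMP is a standard greedy sparse solver swapped in here for brevity; it is not the thesis's model-based recovery algorithm, and the function name `omp` and the small matrix `A` below are hypothetical illustrations rather than an acoustic model.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal Matching Pursuit (generic sketch, not the thesis's method).

    Greedily selects k columns of A so that y is approximated by A @ x
    with x having at most k nonzero entries.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = A.shape[1]
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        # Pick the column most correlated with what is still unexplained.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all selected columns jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x


# Toy example (hypothetical): 3 measurements, 4 grid cells, 2 active cells.
A = np.array([[1.0, 0.0, 0.0, 1 / np.sqrt(2)],
              [0.0, 1.0, 0.0, 1 / np.sqrt(2)],
              [0.0, 0.0, 1.0, 0.0]])
y = np.array([1.0, 0.0, 2.0])
x_hat = omp(A, y, 2)  # recovers x = [1, 0, 2, 0] up to floating-point error
print(x_hat)
```

The point of the toy is the dimensionality gap: four unknowns are recovered from three measurements only because x is assumed to be sparse, which is the kind of prior the abstract relies on when a few microphones observe a high-dimensional acoustic scene.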

Our contributions are organized around four building blocks. First, a structured sparse spatio-temporal representation of the concurrent sources is constructed. Second, the compressive acoustic measurements are characterized. Third, a framework that simultaneously identifies the locations of the sources and their spectral components is derived using a model-based sparse recovery approach. Finally, the acoustic multipath and sparsity models are incorporated into beamforming for effective multichannel signal acquisition. The work is evaluated on real recorded data, and the results provide compelling evidence of the effectiveness of structured sparsity models for multiparty speech recognition. Overall, this work establishes a new perspective on the analysis of multichannel recordings as the compressive acquisition and recovery of the information embedded in the acoustic scene.
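The acoustic multipath model mentioned above (and the "Image Method" keyword) can be illustrated with a minimal sketch of first-order image sources in a rectangular ("shoebox") room. It assumes rigid walls, ignores wall absorption and higher-order reflections, and uses an assumed speed of sound of 343 m/s; the function names are hypothetical, and this is not the multipath model developed in the thesis.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def first_order_images(src, room):
    """Direct source plus its six first-order mirror images.

    The room is a box with walls at 0 and room[d] along each axis d;
    each image is the source mirrored across one wall (rigid-wall assumption).
    """
    src = np.asarray(src, dtype=float)
    images = [src.copy()]  # index 0: direct path
    for d in range(3):
        for wall in (0.0, float(room[d])):
            img = src.copy()
            img[d] = 2.0 * wall - src[d]  # mirror across the plane coord_d = wall
            images.append(img)
    return np.array(images)


def multipath_delays(src, mic, room, c=SPEED_OF_SOUND):
    """Propagation delays (s) from the direct source and each image to a microphone."""
    images = first_order_images(src, room)
    dists = np.linalg.norm(images - np.asarray(mic, dtype=float), axis=1)
    return dists / c


# Hypothetical 5 m x 4 m x 3 m room, source at (1, 1, 1), microphone at (1, 1, 2).
delays = multipath_delays((1.0, 1.0, 1.0), (1.0, 1.0, 2.0), (5.0, 4.0, 3.0))
print(delays)  # first entry is the direct path: 1 m / 343 m/s
```

Each image behaves like an additional virtual source at a known position, so a reverberant recording can be viewed as a sparse combination of many delayed copies; this is the intuition behind treating multipath as structure rather than noise.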

Keywords: Model-based Sparse Component Analysis, Compressive Acoustic Measurements, Reverberant Enclosure, Structured Sparse Coding, Image Method, Distant Multiparty Speech Recognition, Overlapping Speech, Sparse Microphone Array, Multipath Sparse Beamforming

Idiap Research Institute
