Abstract: Recently, audio-visual speech recognition has attracted increasing attention. However, most existing works only focused on scenarios with two speakers. In this work, we study the effect of ...
Abstract: Audio-Visual Speech Recognition (AVSR) combines lip-based video with audio and can improve performance in noise, but most methods are trained only on English data. One limitation is the lack ...