Abstract: Audio-visual target speaker extraction (AV-TSE) aims to extract a specific person's speech from an audio mixture given auxiliary visual cues. Previous methods usually search for the ...
In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in ...
Abstract: Audio-visual zero-shot learning (ZSL) leverages both video and audio information for model training, aiming to classify new video categories that were not seen during training. However, ...
So, you’re looking for a new headset for your gaming setup in 2025? It can be a bit much trying to figure out what’s actually good and what’s just hype. We’ve been checking out a bunch of gaming ...
Have you ever wondered if your go-to tools might be holding you back? For millions of developers, Visual Studio Code (VS Code) is the undisputed champion of code editors, celebrated for its ...
This paper will be presented as an oral paper at the ICASSP Audio for Multimedia and Multimodal Processing Session on 6/6/2023 at 10:50:00 (Eastern European Summer Time). Please cite our paper if you ...