Abstract: Audio-visual target speaker extraction (AV-TSE) aims to extract the specific person's speech from the audio mixture given auxiliary visual cues. Previous methods usually search for the ...
Abstract: This paper introduces the first audio-visual dataset for traffic anomaly detection called MAVAD, taken from real-world scenes, with a diverse range of illumination conditions. In addition, a ...
In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in ...
Your brand is likely suffering from a critical sensory deficit. You spend thousands on visual identity—logos, colour palettes, typography, web design—yet you remain completely silent. In a digital ...