Abstract: Audio large language models (LLMs) are considered experts at recognizing sound objects, yet their performance relative to LLMs in other sensory modalities, such as visual or audio-visual ...