Abstract: Visual Dialog is a typical AI-agent task on images, in which the agent interprets information from heterogeneous modalities and provides the correct answer. In this area, most approaches are ...
Abstract: Visual behavior depends on both bottom-up mechanisms, where gaze is driven by the visual conspicuity of the stimuli, and top-down mechanisms, which guide attention towards relevant areas based ...