Abstract: The knowledge-based visual question answering (KB-VQA) task involves using external knowledge about the image to assist reasoning. Building on the impressive performance of multimodal large ...