Abstract: Scene Knowledge-guided Visual Grounding (SK-VG) aims to locate the specific object in an image that is referred to by an open-ended query, utilizing textual scene knowledge for guidance.