Abstract: The consistency between the semantic information provided by the multi-modal reference and the tracked object is crucial for visual-language (VL) tracking. However, existing VL tracking ...