Abstract: Pre-trained visual language models (VLMs) excel in various visual tasks due to the extensive knowledge they gain from large datasets of image-text pairs. The ability of VLMs to scale ...
This project investigates token quality from a noisy-label perspective and propose a generic token cleaning pipeline for SFT tasks. Our method filters out uninformative tokens while preserving those ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results