Abstract: Pre-trained visual language models (VLMs) excel in various visual tasks due to the extensive knowledge they gain from large datasets of image-text pairs. The ability of VLMs to scale ...
This project investigates token quality from a noisy-label perspective and proposes a generic token cleaning pipeline for SFT tasks. Our method filters out uninformative tokens while preserving those ...
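The token-cleaning idea above can be illustrated with a minimal sketch. This is a hypothetical example, not the paper's actual pipeline: it assumes per-token quality scores have already been computed by some scoring rule (which the abstract does not specify), and simply masks low-scoring tokens out of the SFT loss.

```python
def token_loss_mask(scores, threshold):
    """Build a 0/1 loss mask from per-token quality scores.

    Tokens scoring below `threshold` are treated as uninformative
    and excluded from the SFT loss; the rest are kept.
    (`scores` and `threshold` are assumed inputs for illustration.)
    """
    return [1.0 if s >= threshold else 0.0 for s in scores]


def masked_sft_loss(per_token_losses, mask):
    """Average the per-token losses over the kept tokens only."""
    kept = [loss * m for loss, m in zip(per_token_losses, mask)]
    n_kept = sum(mask)
    return sum(kept) / n_kept if n_kept > 0 else 0.0
```

In this sketch the mask is binary, so a cleaned token contributes nothing to the gradient; a soft (weighted) variant would replace the 0/1 mask with the scores themselves.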