Abstract: Adapting Vision Transformers (ViTs) for medical imaging is constrained by the scarcity of data and high-quality annotations, hindering effective training and robust generalization. Visual ...