Abstract: High-precision semantic segmentation of multimodal remote sensing imagery is critical for applications such as urban planning and land monitoring. However, existing methods often suffer from ...
Abstract: Pre-trained vision-language models (VLMs) and language models (LMs) have recently garnered significant attention due to their remarkable ability to represent textual concepts, opening up new ...