Abstract: Recent advancements in large generative models, particularly diffusion-based methods, have significantly enhanced the capabilities of image editing. However, achieving precise control over ...
Abstract: Visual Question Answering (VQA) is a task that requires models to comprehend both questions and images. An increasing number of works are leveraging the strong reasoning capabilities of ...