Abstract: Document Image Translation (DIT) aims to translate documents in images from one language to another. It is a multi-modal task that involves the cooperation of text, visual layout, and ...
Abstract: Fusing features from multi-modal data can effectively improve pattern-recognition accuracy on the primary modality and, through multi-modal collaboration, address the issue of missing data. To some ...