Document Object Model

Maritime Small Object Detection Algorithm in Drone Aerial Images Based on Improved YOLOv8

Abstract: Combining unmanned aerial vehicles (UAVs) with deep learning algorithms offers an efficient, safe and inexpensive alternative to maritime search and rescue (mSAR) missions. Maritime UAV ...

Document Intelligence as Core Financial Infrastructure

Document intelligence is no longer a feature; it is infrastructure. In payments, lending, and digital banking, documents ...

IEEE

SamPose: Generalizable Model-Free 6D Object Pose Estimation via Single-View Prompt

Abstract: Object pose estimation in open-world scenarios is a critical challenge in robotics, virtual reality, and autonomous driving. In this letter, we introduce SamPose, a novel framework designed ...

Microsoft

DocReward: A Document Reward Model for Structuring and Stylizing

Recent advances in agentic workflows have enabled the automation of tasks such as professional document generation. However, they primarily focus on textual quality, neglecting visual structure and ...

GitHub

DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

[2024/12] Code release: Inference, Diffusion sampling, Pretrained model. [2024/10] DifFUSER is presented at ECCV 2024. [2024/07] DifFUSER is accepted by ECCV 2024. This repository contains the official ...

Gizmodo

Anthropic Accidentally Gives the World a Peek Into Its Model’s ‘Soul’

Artificial intelligence models don’t have souls, but one of them does apparently have a “soul” document. A person named Richard Weiss was able to get Anthropic’s latest large language model, Claude ...

GitHub

Moshi: a speech-text foundation model for real time dialogue

Finally, the code for the web UI client used in the Moshi demo is provided in the client/ directory. If you want to fine tune Moshi, head out to kyutai-labs/moshi ...
