Vision Transformer Code

LightFormer: Vision Transformer for Lightweight Based on Cascade Depthwise Convolution and Mixed Attention

Abstract: In recent years, the complementary advantages of convolutional neural networks (CNNs) and Transformers have been utilized to achieve significant results in image classification tasks.

IEEE

BinaryViT: Binary Vision Transformer for Hyperspectral Image Classification

Abstract: Vision transformers have demonstrated remarkable performance in hyperspectral image classification tasks. However, their complex computational mechanisms and excessive parameterization ...

GitHub

Open Vision Agents by Stream

Multi-modal AI agents that watch, listen, and understand video. Vision Agents give you the building blocks to create intelligent, low-latency video experiences powered by your models, your ...

GitHub

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation

MemoryVLA is a Cognition-Memory-Action framework for robotic manipulation inspired by human memory systems. It builds a hippocampal-like perceptual-cognitive memory to capture the temporal ...
