---
license: mit
library_name: transformers
pipeline_tag: image-feature-extraction
---
# OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
OmniStream is a unified streaming visual backbone that effectively perceives, reconstructs, and acts from diverse visual inputs. By incorporating causal spatiotemporal attention and 3D rotary positional embeddings (3D-RoPE), the model supports efficient, frame-by-frame online processing of video streams via a persistent KV-cache.
- Paper: [OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams](https://arxiv.org/abs/2603.12265)
- Project Page: https://go2heart.github.io/omnistream/
- Repository: https://github.com/Go2Heart/OmniStream
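The 3D-RoPE mentioned above extends standard 1D rotary embeddings to video tokens by giving each token a (time, height, width) coordinate. One common way to realize this is to split each head's feature dimension into three groups and rotate each group by a 1D RoPE over its own axis. The NumPy sketch below illustrates that scheme; it is an illustrative reconstruction under that assumption, not OmniStream's exact implementation (function names and the equal three-way split are ours):

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotary angles for one axis: positions (N,) x frequencies (dim/2,)."""
    freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, freqs)  # (N, dim/2)

def apply_rope(x, angles):
    """Rotate consecutive feature pairs of x (N, dim) by the given angles."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def rope_3d(x, t, h, w):
    """3D-RoPE sketch: split the head dim into three axis groups
    (time, height, width) and rotate each by its own 1D RoPE."""
    d = x.shape[-1] // 3  # assumes head dim divisible by 3, each part even
    return np.concatenate([
        apply_rope(x[:, :d],     rope_angles(t, d)),
        apply_rope(x[:, d:2*d],  rope_angles(h, d)),
        apply_rope(x[:, 2*d:],   rope_angles(w, d)),
    ], axis=-1)
```

Because each pair is a pure rotation, token norms are preserved and query–key dot products depend only on relative (t, h, w) offsets, which is what makes this embedding compatible with streaming.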
## Sample Usage
The following code snippet demonstrates how to use OmniStream for feature extraction. Note that this requires the `model.py` file from the official repository to be present in your environment.
```python
from model import OmnistreamMultiFrameTransformer
from transformers import AutoImageProcessor
import torch
import numpy as np

# Load processor and model
processor = AutoImageProcessor.from_pretrained("StreamFormer/OmniStream")
model = OmnistreamMultiFrameTransformer.from_pretrained("StreamFormer/OmniStream").to("cuda")
model.eval()

# Prepare dummy input: 16 frames of 512x512 RGB images, shape (Time, Height, Width, Channels)
fake_pixel = np.random.randn(16, 512, 512, 3)
fake_input = processor(images=fake_pixel, return_tensors="pt").to("cuda")

# Add a batch dimension: (Batch, Time, Channels, Height, Width)
fake_input["pixel_values"] = fake_input["pixel_values"].unsqueeze(0).float()

with torch.no_grad():
    output = model(**fake_input, return_dict=True)

print(output.keys())
print(output["last_hidden_state"].shape)  # last layer's hidden states
print(output["pooler_output"].shape)      # CLS token
print(output["patch_start_idx"])          # index of the first patch of each frame
```
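For intuition about the frame-by-frame online processing described above, the persistent KV-cache idea can be sketched as a minimal single-head attention loop: each new frame's tokens attend to all previously cached keys and values, so earlier frames never need to be re-encoded. This is a toy illustration, not the repository's API; the class and method names below are invented, and the real model uses multi-head causal spatiotemporal attention.

```python
import torch
import torch.nn.functional as F

class StreamingAttention:
    """Toy single-head attention with a persistent KV-cache.

    Each call to step() appends the new frame's keys/values to the
    cache, so per-frame cost depends only on the stream length so far,
    never on re-processing the full video.
    """
    def __init__(self, dim):
        self.dim = dim
        self.k_cache = torch.empty(0, dim)
        self.v_cache = torch.empty(0, dim)

    def step(self, q, k, v):
        # Persist the new frame's keys/values alongside earlier frames.
        self.k_cache = torch.cat([self.k_cache, k], dim=0)
        self.v_cache = torch.cat([self.v_cache, v], dim=0)
        # New queries attend causally to everything cached so far.
        attn = (q @ self.k_cache.T) / self.dim ** 0.5
        return F.softmax(attn, dim=-1) @ self.v_cache
```

Feeding frames one at a time through `step()` produces the same output for the newest frame as running full causal attention over the concatenated stream, which is the property that makes online streaming inference equivalent to offline encoding.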
## Citation
```bibtex
@article{yan2026omnistream,
  title={OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams},
  author={Yibin Yan and Jilan Xu and Shangzhe Di and Haoning Wu and Weidi Xie},
  journal={arXiv preprint arXiv:2603.12265},
  year={2026},
  url={https://arxiv.org/abs/2603.12265}
}
```