nielsr (HF Staff) committed on
Commit 41de125 · verified · 1 Parent(s): 6dea0bd

Improve model card: add metadata, links and sample usage

Hi! I'm Niels from the community science team at Hugging Face. I've opened this PR to improve the documentation for OmniStream.

The changes include:
- Adding metadata for better discoverability (`pipeline_tag`, `library_name`, and `license`).
- Adding links to the research paper, project page, and official GitHub repository.
- Including a sample usage snippet derived from your official README to help users get started with the model.

Files changed (1):
  1. README.md +57 -3
README.md CHANGED
@@ -1,3 +1,57 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ library_name: transformers
+ pipeline_tag: image-feature-extraction
+ ---
+
+ # OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
+
+ OmniStream is a unified streaming visual backbone that perceives, reconstructs, and acts on diverse visual inputs. Causal spatiotemporal attention and 3D rotary positional embeddings (3D-RoPE) let the model process video streams online, frame by frame, through a persistent KV-cache.
+
+ - **Paper:** [OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams](https://huggingface.co/papers/2603.12265)
+ - **Project Page:** [https://go2heart.github.io/omnistream/](https://go2heart.github.io/omnistream/)
+ - **Repository:** [https://github.com/Go2Heart/OmniStream](https://github.com/Go2Heart/OmniStream)
+
+ ## Sample Usage
+
+ The following snippet demonstrates feature extraction with OmniStream. Note that it requires the `model.py` file from the official repository to be present in your environment.
+
+ ```python
+ import numpy as np
+ import torch
+ from transformers import AutoImageProcessor
+
+ from model import OmnistreamMultiFrameTransformer
+
+ # Load processor and model
+ processor = AutoImageProcessor.from_pretrained("StreamFormer/OmniStream")
+ model = OmnistreamMultiFrameTransformer.from_pretrained("StreamFormer/OmniStream").to("cuda")
+ model.eval()
+
+ # Dummy input: 16 frames of 512x512 RGB images (Time, Height, Width, Channels)
+ fake_pixel = np.random.randn(16, 512, 512, 3)
+ fake_input = processor(images=fake_pixel, return_tensors="pt").to("cuda")
+
+ # Add a batch dimension: (Batch, Time, Channels, Height, Width)
+ fake_input["pixel_values"] = fake_input["pixel_values"].unsqueeze(0).float()
+
+ with torch.no_grad():
+     output = model(**fake_input, return_dict=True)
+
+ print(output.keys())
+ print(output["last_hidden_state"].shape)  # last layer's hidden states
+ print(output["pooler_output"].shape)      # CLS token
+ print(output["patch_start_idx"])          # index of the first patch of each frame
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{yan2026omnistream,
+   title={OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams},
+   author={Yibin Yan and Jilan Xu and Shangzhe Di and Haoning Wu and Weidi Xie},
+   journal={arXiv preprint arXiv:2603.12265},
+   year={2026},
+   url={https://arxiv.org/abs/2603.12265}
+ }
+ ```
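The model card above says OmniStream processes video streams frame by frame through a persistent KV-cache. As background only (this is not OmniStream's actual implementation, and all function names here are illustrative), a minimal NumPy sketch shows why that works: causal attention computed one query at a time against a growing key/value cache is mathematically identical to full-sequence causal attention.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v):
    """Full-sequence causal attention: token i attends to keys 0..i."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores[np.triu(np.ones((n, n), dtype=bool), k=1)] = -np.inf  # mask the future
    return softmax(scores) @ v

def streaming_attention(q, k, v):
    """Same result, computed token by token with a persistent KV cache."""
    d = q.shape[-1]
    k_cache, v_cache, out = [], [], []
    for qi, ki, vi in zip(q, k, v):
        k_cache.append(ki)                 # extend the cache with the new step
        v_cache.append(vi)
        K, V = np.stack(k_cache), np.stack(v_cache)
        w = softmax(qi @ K.T / np.sqrt(d)) # attend only to cached (past) keys
        out.append(w @ V)
    return np.stack(out)
```

Because each query only ever attends to earlier positions, the cache never needs to be revisited, which is what makes online, per-frame processing efficient.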