Improve model card: add metadata, links and sample usage
Hi! I'm Niels from the community science team at Hugging Face. I've opened this PR to improve the documentation for OmniStream.
The changes include:
- Adding metadata for better discoverability (`pipeline_tag`, `library_name`, and `license`).
- Adding links to the research paper, project page, and official GitHub repository.
- Including a sample usage snippet derived from your official README to help users get started with the model.
README.md (changed):

````diff
@@ -1,3 +1,57 @@
----
-license: mit
----
+---
+license: mit
+library_name: transformers
+pipeline_tag: image-feature-extraction
+---
+
+# OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
+
+OmniStream is a unified streaming visual backbone that perceives, reconstructs, and acts on diverse visual inputs. By combining causal spatiotemporal attention with 3D rotary positional embeddings (3D-RoPE), the model supports efficient frame-by-frame online processing of video streams via a persistent KV-cache.
+
+- **Paper:** [OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams](https://huggingface.co/papers/2603.12265)
+- **Project Page:** [https://go2heart.github.io/omnistream/](https://go2heart.github.io/omnistream/)
+- **Repository:** [https://github.com/Go2Heart/OmniStream](https://github.com/Go2Heart/OmniStream)
+
+## Sample Usage
+
+The following snippet demonstrates how to use OmniStream for feature extraction. Note that it requires the `model.py` file from the official repository to be present in your environment.
+
+```python
+from model import OmnistreamMultiFrameTransformer
+from transformers import AutoImageProcessor
+import torch
+import numpy as np
+
+# Load processor and model
+processor = AutoImageProcessor.from_pretrained("StreamFormer/OmniStream")
+model = OmnistreamMultiFrameTransformer.from_pretrained("StreamFormer/OmniStream").to("cuda")
+
+model.eval()
+
+# Prepare dummy input: 16 frames of 512x512 RGB images (frames, height, width, channels)
+fake_pixel = np.random.randn(16, 512, 512, 3)
+fake_input = processor(images=fake_pixel, return_tensors="pt").to("cuda")
+
+# Add a batch dimension: (batch, time, channels, height, width)
+fake_input["pixel_values"] = fake_input["pixel_values"].unsqueeze(0).float()
+
+with torch.no_grad():
+    output = model(**fake_input, return_dict=True)
+
+print(output.keys())
+print(output["last_hidden_state"].shape)  # last layer's hidden states
+print(output["pooler_output"].shape)      # CLS token
+print(output["patch_start_idx"])          # index of the first patch of each frame
+```
+
+## Citation
+
+```bibtex
+@article{yan2026omnistream,
+  title={OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams},
+  author={Yibin Yan and Jilan Xu and Shangzhe Di and Haoning Wu and Weidi Xie},
+  journal={arXiv preprint arXiv:2603.12265},
+  year={2026},
+  url={https://arxiv.org/abs/2603.12265}
+}
+```
````
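For reviewers unfamiliar with the streaming design the README describes, here is a minimal illustrative sketch of frame-by-frame causal attention over a persistent KV-cache. This is a toy single-head NumPy implementation under my own assumptions (hypothetical `stream_frames` helper, random projection weights), not OmniStream's actual architecture or API:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stream_frames(frames, d=8, seed=0):
    """Process frames one at a time with a persistent KV-cache.

    Each incoming frame attends over its own tokens plus every cached
    key/value from earlier frames, but never over future frames --
    the causal property that makes online, per-frame inference possible.
    """
    rng = np.random.default_rng(seed)
    wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
    k_cache, v_cache, outputs = [], [], []
    for x in frames:                      # x: (tokens_per_frame, d)
        q = x @ wq
        k_cache.append(x @ wk)            # cache only grows: this is the
        v_cache.append(x @ wv)            # "persistent KV-cache"
        k = np.concatenate(k_cache)       # (all_tokens_so_far, d)
        v = np.concatenate(v_cache)
        attn = softmax(q @ k.T / np.sqrt(d))
        outputs.append(attn @ v)          # (tokens_per_frame, d)
    return np.stack(outputs)              # (n_frames, tokens_per_frame, d)

frames = [np.random.default_rng(i).standard_normal((4, 8)) for i in range(3)]
out = stream_frames(frames)
print(out.shape)  # (3, 4, 8)
```

Because the cache persists across iterations, per-frame cost grows only with history length rather than reprocessing the whole stream; changing a later frame never affects an earlier frame's output.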