Video-Text-to-Text
Transformers
Safetensors
qwen2_5_omni
multimodal
video-understanding
audio-understanding
streaming
real-time
omni-modal
Instructions to use EurekaTian/ROMA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use EurekaTian/ROMA with Transformers:
```python
# Load model directly
from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("EurekaTian/ROMA")
model = AutoModel.from_pretrained("EurekaTian/ROMA")
```
- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md
CHANGED

```diff
@@ -26,4 +26,4 @@ ROMA introduces a "Speak Head" mechanism to decouple response timing from content
 - **Paper:** [ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding](https://arxiv.org/abs/250x.xxxxx)
 - **Project Page:** [Link](https://eureka-maggie.github.io/ROMA_show/)
-- **Repository:** [[Github (Coming Soon)](https://github.com/Eureka-Maggie/
+- **Repository:** [[Github (Coming Soon)](https://github.com/Eureka-Maggie/ROMA)]
```