charlesdj committed · verified
Commit 337b2a7 · Parent(s): daf22e7

Update README.md

Files changed (1):
1. README.md (+59 -8)
README.md CHANGED
@@ -11,20 +11,71 @@ datasets:
base_model:
- GAIR/Anole-7b-v0.1
license: mit
- pipeline_tag: any-to-any
---

# Omni-R1-Zero

- Omni-R1-Zero is trained without multimodal annotations. It bootstraps step-wise visualizations from text-only CoT seeds, then follows the SFT→RL recipe to learn interleaved multimodal reasoning.
+ [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?style=for-the-badge&logo=arxiv)](https://arxiv.org/abs/2601.09536)
+ [![Code](https://img.shields.io/badge/GitHub-Code-blue?style=for-the-badge&logo=github)](https://github.com/ModalityDance/Omni-R1)
+ [![Omni-Bench](https://img.shields.io/badge/Dataset-Omni--Bench-fcc21b?style=for-the-badge&logo=huggingface&logoColor=white)](https://huggingface.co/datasets/ModalityDance/Omni-Bench)

- <p align="center">
- <a href="https://arxiv.org/abs/2601.09536"><b>Paper</b>👁️</a> ·
- <a href="https://github.com/ModalityDance/Omni-R1"><b>Code</b>🐙</a> ·
- <a href="https://huggingface.co/datasets/ModalityDance/Omni-Bench"><b>Omni-Bench</b>🧪</a>
- </p>
+ ## Overview
+
+ **Omni-R1-Zero** is trained **without multimodal annotations**. It bootstraps **step-wise visualizations** from **text-only CoT seeds** (e.g., M3CoT), and then follows the same **SFT → RL** recipe as Omni-R1 to learn interleaved multimodal reasoning.
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import ChameleonProcessor, ChameleonForConditionalGeneration
+
+ # 1) Load the processor and model
+ model_id = "ModalityDance/Omni-R1-Zero"  # or a local checkpoint path
+ processor = ChameleonProcessor.from_pretrained(model_id)
+ model = ChameleonForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ model.eval()
+
+ # 2) Prepare a single input
+ prompt = "You are a helpful assistant.\nUser: Which of these would appear shinier when polished? A. Metal spoon B. Wooden spoon\nThink with images first, the image reasoning process and answer are enclosed within <reserved12856> <reserved12857> and <reserved12866> <reserved12867> XML tags, respectively.\nAssistant:"
+
+ inputs = processor(
+     prompt,
+     padding=False,
+     return_for_text_completion=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ # 3) Generate; "unrestricted" mode allows interleaved text and image tokens
+ outputs = model.generate(
+     **inputs,
+     max_length=4096,
+     do_sample=True,
+     temperature=1.0,
+     top_p=0.9,
+     pad_token_id=1,
+     multimodal_generation_mode="unrestricted",
+ )
+
+ # 4) Decode the full sequence, keeping the reserved delimiter tokens
+ text = processor.batch_decode(outputs, skip_special_tokens=False)[0]
+ print(text)
+ ```
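+
+ Note that `return_for_text_completion` and `multimodal_generation_mode` are not arguments of the stock `transformers` Chameleon classes; they appear to come from the patched `transformers` build used by the project (see the repository link below).
+
+ Because decoding keeps the special tokens, the reasoning and answer spans can be recovered with plain string matching. A minimal sketch (the tag pairing follows the prompt above; `extract_span` is an illustrative helper, not part of the released code):
+
+ ```python
+ import re
+
+ def extract_span(text: str, open_tag: str, close_tag: str) -> str | None:
+     """Return the content between the first open_tag...close_tag pair, if present."""
+     match = re.search(re.escape(open_tag) + r"(.*?)" + re.escape(close_tag), text, re.DOTALL)
+     return match.group(1).strip() if match else None
+
+ # Per the prompt: <reserved12856>/<reserved12857> delimit the image reasoning
+ # process, and <reserved12866>/<reserved12867> delimit the final answer.
+ reasoning = extract_span(text, "<reserved12856>", "<reserved12857>")
+ answer = extract_span(text, "<reserved12866>", "<reserved12867>")
+ print("reasoning span:", reasoning)
+ print("answer:", answer)
+ ```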
+
+ For full scripts (batch JSONL inference, interleaved decoding, and vLLM-based evaluation), please refer to the official GitHub repository:
+ https://github.com/ModalityDance/Omni-R1
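+
+ As a rough illustration of the batch JSONL flow, a minimal loop reusing the `processor` and `model` from the snippet above (the `questions.jsonl` path and `question` field are assumptions for illustration, not the repository's actual schema):
+
+ ```python
+ import json
+
+ TEMPLATE = (
+     "You are a helpful assistant.\nUser: {q}\n"
+     "Think with images first, the image reasoning process and answer are enclosed within "
+     "<reserved12856> <reserved12857> and <reserved12866> <reserved12867> XML tags, respectively.\n"
+     "Assistant:"
+ )
+
+ with open("questions.jsonl") as f:  # one {"question": ...} object per line (assumed)
+     for line in f:
+         q = json.loads(line)["question"]
+         inputs = processor(TEMPLATE.format(q=q), padding=False,
+                            return_for_text_completion=True, return_tensors="pt").to(model.device)
+         outputs = model.generate(**inputs, max_length=4096, do_sample=True, temperature=1.0,
+                                  top_p=0.9, pad_token_id=1, multimodal_generation_mode="unrestricted")
+         print(processor.batch_decode(outputs, skip_special_tokens=False)[0])
+ ```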
+
+ ## License
+
+ This project is licensed under the **MIT License**.
+ It also complies with the licenses of referenced third-party projects and dependencies, including the **Chameleon Research License**.

## Citation
+
```bibtex
@misc{cheng2026omnir1unifiedgenerativeparadigm,
title={Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning},
@@ -35,4 +86,4 @@ Omni-R1-Zero is trained without multimodal annotations. It bootstraps step-wise
primaryClass={cs.AI},
url={https://arxiv.org/abs/2601.09536},
}
- ```
+ ```