<p>
<strong>LongVILA-R1-7B</strong> supports both <u>multiple-choice</u> questions and <u>open-ended</u> questions. It can switch between thinking and non-thinking modes.<br>
<strong>LongVILA-R1-7B</strong> demonstrates strong performance in long video reasoning, achieving <strong>70.7%</strong> on VideoMME (w/ sub.) and surpassing Gemini-1.5-Pro across diverse reasoning tasks.<br>
<strong>LongVILA-R1-7B</strong> supports processing up to <strong>2,048</strong> video frames per video, with configurable FPS settings.<br>
<strong>Long-RL</strong> is a codebase that accelerates long video RL training by up to <strong>2.1×</strong> through its MR-SP system. It supports RL training on image, video, and omni inputs across VILA, Qwen/Qwen-VL, and diffusion models.
</p>

```python
from transformers import AutoModel

model_path = "Efficient-Large-Model/LongVILA-R1-7B"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map="auto")

# You can disable FPS control by setting it to 0 and customize the number of
# processed video frames (num_video_frames) as desired.
# model.config.num_video_frames, model.config.fps = 512, 0

use_thinking = True  # Switch between thinking and non-thinking modes
system_prompt_thinking = "You are a helpful assistant. The user asks a question, and then you solve it.\n\nPlease first think deeply about the question based on the given video, and then provide the final answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\n\nQuestion: {question}"
```
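For illustration, a minimal sketch of assembling the final prompt from the template above. The example question and the plain-question fallback for non-thinking mode are assumptions for the sketch, not behavior stated by this card, and the system prompt is shortened to a stand-in:

```python
# Sketch: pick the prompt text based on the mode flag.
# The template stand-in and the example question are hypothetical.
system_prompt_thinking = "Think first, then answer.\n\nQuestion: {question}"  # shortened stand-in
use_thinking = True
question = "What is the person in the video doing?"  # example question

if use_thinking:
    prompt = system_prompt_thinking.format(question=question)
else:
    prompt = question  # assumed: non-thinking mode passes the question through
```

The `{question}` placeholder in the real `system_prompt_thinking` string is filled the same way, via `str.format`.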