Yukang committed (verified)
Commit 7b163bc · 1 Parent(s): 8f2be92

Update README.md

Files changed (1):
  1. README.md +4 -0
README.md CHANGED
@@ -22,6 +22,7 @@ tags:
  <p>
  <strong>LongVILA-R1-7B</strong> supports both <u>multiple-choice</u> questions and <u>open-ended</u> questions. It can switch between thinking and non-thinking modes.<br>
  <strong>LongVILA-R1-7B</strong> demonstrates strong performance in long video reasoning, achieving <strong>70.7%</strong> on VideoMME (w/ sub.) and surpassing Gemini-1.5-Pro across diverse reasoning tasks.<br>
+ <strong>LongVILA-R1-7B</strong> supports processing up to <strong>2,048</strong> video frames per video, with configurable FPS settings.<br>
  <strong>Long-RL</strong> is a codebase that accelerates long video RL training by up to <strong>2.1×</strong> through its MR-SP system. It supports RL training on image, video, and omni inputs across VILA, Qwen/Qwen-VL, and diffusion models.
  </p>
@@ -48,6 +49,9 @@ from transformers import AutoModel
 model_path = "Efficient-Large-Model/LongVILA-R1-7B"
 model = AutoModel.from_pretrained(model_path, trust_remote_code=True, device_map="auto")
 
+# You can disable FPS control by setting it to 0, and customize the number of processed num_video_frames as desired.
+#model.config.num_video_frames, model.config.fps = 512, 0
+
 use_thinking = True # Switching between thinking and non-thinking modes
 system_prompt_thinking = "You are a helpful assistant. The user asks a question, and then you solves it.\n\nPlease first think deeply about the question based on the given video, and then provide the final answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\n\n Question: {question}"
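For reference, a minimal sketch of how the thinking/non-thinking switch shown in the diff might be applied when building the final prompt. The system prompt string is copied verbatim from the README; the `question` value is a made-up placeholder, and the pass-through behavior of the non-thinking path is an assumption, not confirmed by the commit.

```python
# Thinking-mode system prompt, copied verbatim from the README diff above
# (including its original wording).
system_prompt_thinking = (
    "You are a helpful assistant. The user asks a question, and then you solves it.\n\n"
    "Please first think deeply about the question based on the given video, and then "
    "provide the final answer. The reasoning process and answer are enclosed within "
    "<think> </think> and <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>.\n\n"
    " Question: {question}"
)

use_thinking = True  # Switching between thinking and non-thinking modes

# Hypothetical question, for illustration only.
question = "What event happens at the end of the video?"

# In thinking mode the question is wrapped in the system prompt; otherwise it is
# passed through unchanged (an assumed behavior for the non-thinking path).
prompt = system_prompt_thinking.format(question=question) if use_thinking else question
```

Setting `use_thinking = False` would then skip the `<think>`/`<answer>` scaffolding entirely, which matches the mode switch the README describes.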