luodian committed on
Commit beaeed0 · verified · 1 Parent(s): 4a0f69a

Update README.md

Files changed (1)
  1. README.md +19 -1
README.md CHANGED
@@ -7,7 +7,7 @@ license: apache-2.0
  MMSearch-R1-7B is a search-augmented LMM trained with end-to-end reinforcement learning, equipped with the ability to invoke multimodal search tools on demand. The model can dynamically decide whether to perform image or text search based on the question and integrate the retrieved external information into its reasoning process, enabling more accurate answers for knowledge-intensive VQA tasks. For more details on the training process and model evaluation, please refer to the [blog](https://www.lmms-lab.com/posts/mmsearch_r1/) or the [paper](https://arxiv.org/abs/2506.20670).
 
  ### Model Details
- - Model name: MMSearch-R1-7B
+ - Model name: MMSearch-R1-7B-0807
  - Architecture: Qwen2.5-VL-7B base model, fine-tuned with Reinforcement Learning (GRPO)
  - Model type: Multimodal Large Language Model with Search-Augmentation
  - Languages: English(primary), multilingual(partially)
@@ -15,6 +15,24 @@ MMSearch-R1-7B is a search-augmented LMM trained with end-to-end reinforcement l
  - Paper: [MMSearch-R1: Incentivizing LMMs to Search](https://arxiv.org/abs/2506.20670)
  - Code: [EvolvingLMMs-Lab/multimodal-search-r1](https://github.com/EvolvingLMMs-Lab/multimodal-search-r1)
 
+ ### Updated Model Performance
+
+ | Models | MMK12 | MathVerse (testmini) | MathVision (testmini) | MathVista (testmini) | MMMU (val) | AI2D | ChartQA | MME | RealworldQA | OCRBench | DocVQA | MMBench | MMStar | MiaBench |
+ |--------|-------|----------------------|----------------------|----------------------|------------|------|---------|-----|-------------|----------|--------|---------|--------|----------|
+ | Qwen2.5-VL-7B | 34.4 | 46.2 | 24.0 | 66.6 | 49.8 | 93.3 | 94.4 | 630.4/1685.2 | 68.5 | 85.2 | 94.6 | 82.9 | 62.6 | 81.7 |
+ | General STEM | 46.2 | 51.4 | 28.4 | 73.6 | 57.3 | 94.4 | 91.4 | 700.7/1662.1 | 67.5 | 83.7 | 92.1 | 83.8 | 65.5 | 76.0 |
+ | Reason -> Search | 43.2 | 51.7 | 25.0 | 71.8 | 57.9 | 94.0 | 93.6 | 652.5/1688.3 | 67.5 | 81.7 | 93.5 | 83.2 | 63.1 | 47.6 |
+ | General Search | 43.6 | 52.0 | 27.3 | 74.7 | 56.1 | 94.6 | 94.0 | 718.9/1775.3 | 65.5 | 77.8 | 89.4 | 84.0 | 60.4 | 44.4 |
+
+ ---
+
+ | Models | Infoseek | MMSearch | FVQA | SimpleVQA |
+ |--------|----------|----------|------|-----------|
+ | Qwen2.5-VL-7B | 20.1 | 12.8 | 20.3 | 38.4 |
+ | MMSearch | 55.1 | 53.8 | 58.4 | 57.4 |
+ | Reasoning -> Search | 58.5 | 57.1 | 57.9 | 57.7 |
+ | General Search | 52.0 | 54.9 | 52.8 | 57.0 |
+
  ### Training Details
  - Dataset: [FVQA-train](https://huggingface.co/datasets/lmms-lab/FVQA)
  - RL Framework: [veRL](https://github.com/volcengine/verl)
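
As a reading aid for the second table added in this commit (Infoseek, MMSearch, FVQA, SimpleVQA), the minimal sketch below computes each variant's gain over the Qwen2.5-VL-7B baseline and the top-scoring row per benchmark. The scores are copied verbatim from the diff; the names `BENCHMARKS`, `SCORES`, `gains_over_baseline`, and `best_per_benchmark` are illustrative assumptions and are not part of the repository or the README.

```python
# Scores copied from the second table in this commit's diff.
# All identifiers here are hypothetical helpers for reading the table.
BENCHMARKS = ["Infoseek", "MMSearch", "FVQA", "SimpleVQA"]

SCORES = {
    "Qwen2.5-VL-7B": [20.1, 12.8, 20.3, 38.4],
    "MMSearch": [55.1, 53.8, 58.4, 57.4],
    "Reasoning -> Search": [58.5, 57.1, 57.9, 57.7],
    "General Search": [52.0, 54.9, 52.8, 57.0],
}


def gains_over_baseline(scores, baseline="Qwen2.5-VL-7B"):
    """Return per-benchmark score differences of each row versus the baseline row."""
    base = scores[baseline]
    return {
        name: [round(value - ref, 1) for value, ref in zip(values, base)]
        for name, values in scores.items()
        if name != baseline
    }


def best_per_benchmark(scores):
    """Return the name of the highest-scoring row for each benchmark column."""
    return {
        bench: max(scores, key=lambda name: scores[name][i])
        for i, bench in enumerate(BENCHMARKS)
    }


if __name__ == "__main__":
    for name, deltas in gains_over_baseline(SCORES).items():
        print(name, dict(zip(BENCHMARKS, deltas)))
    print(best_per_benchmark(SCORES))
```

The differences are rounded to one decimal place to match the precision of the table and to avoid floating-point noise in the printed values.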