Yin-Xie committed Β· verified Β· Commit 5351272 Β· Parent(s): cc55440

Update README.md

Files changed (1): README.md (+3 -3)
README.md CHANGED
@@ -17,7 +17,7 @@ base_model:
 A family of fully open-source large multimodal models demonstrating **superior performance** across multiple multimodal benchmarks, **outperforming Qwen2.5-VL** in most evaluation tasks.
 
 2. **High-Quality Data at Scale**
-Meticulously curated **pre-training and SFT data** with rigorous filtering and quality control, achieving **superior data efficiency** with only **5B tokens** (1.2% of Qwen2.5-VL's training data).
+Meticulously curated **mid-training and SFT data** with rigorous filtering and quality control, achieving **superior data efficiency** with only **5B tokens** (1.2% of Qwen2.5-VL's training data).
 - Concept-balanced, highly diverse, high-quality caption data
 - Comprehensive instruction fine-tuning data covering a wide range of tasks
 
@@ -29,7 +29,7 @@ Complete end-to-end training framework designed for maximum efficiency:
 - Optimized codebase for cost-effective scaling
 
 4. **Fully Open Framework** for community access and reproducibility:
-  - βœ… High-quality pre-training & SFT data
+  - βœ… High-quality mid-training & SFT data
   - βœ… Complete training framework & code
   - βœ… Training recipes & configurations
   - βœ… Base & instruct model checkpoints
@@ -38,7 +38,7 @@ Complete end-to-end training framework designed for maximum efficiency:
 ## Dataset
 | Description | Link |
 |-------------|------|
-| Pretrain data for LLaVA-OneVision-1.5 | [πŸ€— Download (Uploading!)](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M) |
+| Mid-training data for LLaVA-OneVision-1.5 | [πŸ€— Download (Uploading!)](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M) |
 | SFT data for LLaVA-OneVision-1.5 | [πŸ€— Download (Uploading!)](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Insturct-26M) |
 
 ## Evaluation Results
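
Both datasets in the updated table live on the Hugging Face Hub, so once the uploads finish they should be loadable with the `datasets` library. Below is a minimal sketch; the repo id comes from the table above, while the `train` split name and the record schema are assumptions until the upload is complete.

```python
from datasets import load_dataset

# Stream rather than download: the mid-training set is ~85M samples.
# Repo id is taken from the table above; the "train" split name is an
# assumption until the dataset card documents the available splits.
mid_training = load_dataset(
    "lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M",
    split="train",
    streaming=True,
)

# Peek at the first few records to inspect the schema.
for sample in mid_training.take(3):
    print(sample.keys())
```

Streaming avoids materializing the full 85M-sample corpus on disk before the first batch is available, which matters at this scale even after the upload finishes.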