MiniCPM-V 2.6 🤗 🤖 | MiniCPM-Llama3-V 2.5 🤗 🤖 | MiniCPM-Llama3-V 2.5 Technical Report
Model | Size | Token Density+ | OpenCompass | MME | MMVet | OCRBench | MMMU val | MathVista mini | MMB1.1 test | AI2D | TextVQA val | DocVQA test | HallusionBench | Object HalBench |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Proprietary | ||||||||||||||
GPT-4o | - | 1088 | 69.9 | 2328.7 | 69.1 | 736 | 69.2 | 61.3 | 82.2 | 84.6 | - | 92.8 | 55.0 | 17.6 |
Claude 3.5 Sonnet | - | 750 | 67.9 | 1920.0 | 66.0 | 788 | 65.9 | 61.6 | 78.5 | 80.2 | - | 95.2 | 49.9 | 13.8 |
Gemini 1.5 Pro | - | - | 64.4 | 2110.6 | 64.0 | 754 | 60.6 | 57.7 | 73.9 | 79.1 | 73.5 | 86.5 | 45.6 | - |
GPT-4o mini | - | 1088 | 64.1 | 2003.4 | 66.9 | 785 | 60.0 | 52.4 | 76.0 | 77.8 | - | - | 46.1 | 12.4 |
GPT-4V | - | 1088 | 63.5 | 2070.2 | 67.5 | 656 | 61.7 | 54.7 | 79.8 | 78.6 | 78.0 | 87.2 | 43.9 | 14.2 |
Step-1V | - | - | 59.5 | 2206.4 | 63.3 | 625 | 49.9 | 44.8 | 78.0 | 79.2 | 71.6 | - | 48.4 | - |
Qwen-VL-Max | - | 784 | 58.3 | 2281.7 | 61.8 | 684 | 52.0 | 43.4 | 74.6 | 75.7 | 79.5 | 93.1 | 41.2 | 13.4 |
Open-source | ||||||||||||||
LLaVA-NeXT-Yi-34B | 34B | 157 | 55.0 | 2006.5 | 50.7 | 574 | 48.8 | 40.4 | 77.8 | 78.9 | 69.3 | - | 34.8 | 12.6 |
Mini-Gemini-HD-34B | 34B | 157 | - | 2141 | 59.3 | 518 | 48.0 | 43.3 | - | 80.5 | 74.1 | 78.9 | - | - |
Cambrian-34B | 34B | 1820 | 58.3 | 2049.9 | 53.2 | 591 | 50.4 | 50.3 | 77.8 | 79.5 | 76.7 | 75.5 | 41.6 | 14.7 |
GLM-4V-9B | 13B | 784 | 59.1 | 2018.8 | 58.0 | 776 | 46.9 | 51.1 | 67.9 | 71.2 | - | - | 45.0 | - |
InternVL2-8B | 8B | 706 | 64.1 | 2215.1 | 54.3 | 794 | 51.2 | 58.3 | 79.4 | 83.6 | 77.4 | 91.6 | 45.0 | 21.3 |
MiniCPM-Llama3-V 2.5 | 8B | 1882 | 58.8 | 2024.6 | 52.8 | 725 | 45.8 | 54.3 | 72.0 | 78.4 | 76.6 | 84.8 | 42.4 | 10.3 |
MiniCPM-V 2.6 | 8B | 2822 | 65.2 | 2348.4* | 60.0 | 852* | 49.8* | 60.6 | 78.0 | 82.1 | 80.1 | 90.8 | 48.1* | 8.2 |

\* Evaluated with chain-of-thought prompting (for MME, applied only to the Cognition set).
\+ Token Density: the average number of pixels encoded into each visual token at maximum resolution, i.e., # pixels at maximum resolution / # visual tokens; for proprietary models it is estimated from the image-encoding rules in the official API documentation.
Model | Size | Mantis Eval | BLINK val | Mathverse mv | Sciverse mv | MIRB |
---|---|---|---|---|---|---|
Proprietary | ||||||
GPT-4V | - | 62.7 | 54.6 | 60.3 | 66.9 | 53.1 |
Open-source | ||||||
LLaVA-NeXT-Interleave-14B | 14B | 66.4 | 52.6 | 32.7 | 30.2 | - |
Emu2-Chat | 37B | 37.8 | 36.2 | - | 27.2 | - |
CogVLM | 17B | 45.2 | 41.1 | - | - | - |
VPG-C | 7B | 52.4 | 43.1 | 24.3 | 23.1 | - |
VILA 8B | 8B | 51.2 | 39.3 | - | 36.5 | - |
InternLM-XComposer-2.5 | 8B | 53.1* | 48.9 | 32.1* | - | 42.5 |
InternVL2-8B | 8B | 59.0* | 50.9 | 30.5* | 34.4* | 56.9* |
MiniCPM-V 2.6 | 8B | 69.1 | 53.0 | 84.9 | 74.9 | 53.8 |

\* Evaluated using the officially released checkpoint.
Model | Size | Video-MME w/o subs | Video-MME w/ subs | Video-ChatGPT Correctness | Video-ChatGPT Detail | Video-ChatGPT Context | Video-ChatGPT Temporal | Video-ChatGPT Consistency |
---|---|---|---|---|---|---|---|---|
Proprietary | ||||||||
Claude 3.5 Sonnet | - | 60.0 | 62.9 | - | - | - | - | - |
GPT-4V | - | 59.9 | 63.3 | - | - | - | - | - |
Open-source | ||||||||
LLaVA-NeXT-7B | 7B | - | - | 3.39 | 3.29 | 3.92 | 2.60 | 3.12 |
LLaVA-NeXT-34B | 34B | - | - | 3.29 | 3.23 | 3.83 | 2.51 | 3.47 |
CogVLM2-Video | 12B | - | - | 3.49 | 3.46 | 3.23 | 2.98 | 3.64 |
LongVA | 7B | 52.4 | 54.3 | 3.05 | 3.09 | 3.77 | 2.44 | 3.64 |
InternVL2-8B | 8B | 54.0 | 56.9 | - | - | - | - | - |
InternLM-XComposer-2.5 | 8B | 55.8 | - | - | - | - | - | - |
LLaVA-NeXT-Video | 32B | 60.2 | 63.0 | 3.48 | 3.37 | 3.95 | 2.64 | 3.28 |
MiniCPM-V 2.6 | 8B | 60.9 | 63.6 | 3.59 | 3.28 | 3.93 | 2.73 | 3.62 |
Model | Size | Shot | TextVQA val | VizWiz test-dev | VQAv2 test-dev | OK-VQA val |
---|---|---|---|---|---|---|
Flamingo | 80B | 0* | 35.0 | 31.6 | 56.3 | 40.6 |
Flamingo | 80B | 4 | 36.5 | 39.6 | 63.1 | 57.4 |
Flamingo | 80B | 8 | 37.3 | 44.8 | 65.6 | 57.5 |
IDEFICS | 80B | 0* | 30.9 | 36.0 | 60.0 | 45.2 |
IDEFICS | 80B | 4 | 34.3 | 40.4 | 63.6 | 52.4 |
IDEFICS | 80B | 8 | 35.7 | 46.1 | 64.8 | 55.1 |
OmniCorpus | 7B | 0* | 43.0 | 49.8 | 63.2 | 45.5 |
OmniCorpus | 7B | 4 | 45.4 | 51.3 | 64.5 | 46.5 |
OmniCorpus | 7B | 8 | 45.6 | 52.2 | 64.7 | 46.6 |
Emu2 | 37B | 0 | 26.4 | 40.4 | 33.5 | 26.7 |
Emu2 | 37B | 4 | 48.2 | 54.6 | 67.0 | 53.2 |
Emu2 | 37B | 8 | 49.3 | 54.7 | 67.8 | 54.1 |
MM1 | 30B | 0 | 26.2 | 40.4 | 48.9 | 26.7 |
MM1 | 30B | 8 | 49.3 | 54.7 | 70.9 | 54.1 |
MiniCPM-V 2.6+ | 8B | 0 | 43.9 | 33.8 | 45.4 | 23.9 |
MiniCPM-V 2.6+ | 8B | 4 | 63.6 | 60.5 | 65.5 | 50.1 |
MiniCPM-V 2.6+ | 8B | 8 | 64.6 | 63.4 | 68.2 | 51.4 |

\* Zero image shots plus two additional text-only demonstrations, following the Flamingo setting.
\+ Evaluated from the pretraining checkpoint, without SFT.
Model | Size | OCRBench | TextVQA val | DocVQA test | OpenCompass | MME | MMB test (en) | MMB test (cn) | MMMU val | MathVista | LLaVA Bench | RealWorld QA | Object HalBench |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Proprietary | |||||||||||||
Gemini Pro | - | 680 | 74.6 | 88.1 | 62.9 | 2148.9 | 73.6 | 74.3 | 48.9 | 45.8 | 79.9 | 60.4 | - |
GPT-4V (2023.11.06) | - | 645 | 78.0 | 88.4 | 63.5 | 1771.5 | 77.0 | 74.4 | 53.8 | 47.8 | 93.1 | 63.0 | 86.4 |
Open-source | |||||||||||||
Mini-Gemini | 2.2B | - | 56.2 | 34.2* | - | 1653.0 | - | - | 31.7 | - | - | - | - |
Qwen-VL-Chat | 9.6B | 488 | 61.5 | 62.6 | 51.6 | 1860.0 | 61.8 | 56.3 | 37.0 | 33.8 | 67.7 | 49.3 | 56.2 |
DeepSeek-VL-7B | 7.3B | 435 | 64.7* | 47.0* | 54.6 | 1765.4 | 73.8 | 71.4 | 38.3 | 36.8 | 77.8 | 54.2 | - |
Yi-VL-34B | 34B | 290 | 43.4* | 16.9* | 52.2 | 2050.2 | 72.4 | 70.7 | 45.1 | 30.7 | 62.3 | 54.8 | 79.3 |
CogVLM-Chat | 17.4B | 590 | 70.4 | 33.3* | 54.2 | 1736.6 | 65.8 | 55.9 | 37.3 | 34.7 | 73.9 | 60.3 | 73.6 |
TextMonkey | 9.7B | 558 | 64.3 | 66.7 | - | - | - | - | - | - | - | - | - |
Idefics2 | 8.0B | - | 73.0 | 74.0 | 57.2 | 1847.6 | 75.7 | 68.6 | 45.2 | 52.2 | 49.1 | 60.7 | - |
Bunny-Llama-3-8B | 8.4B | - | - | - | 54.3 | 1920.3 | 77.0 | 73.9 | 41.3 | 31.5 | 61.2 | 58.8 | - |
LLaVA-NeXT Llama-3-8B | 8.4B | - | - | - | - | 1971.5 | - | - | 41.7 | - | 80.1 | 60.0 | - |
Phi-3-vision-128k-instruct | 4.2B | 639* | 70.9 | - | - | 1537.5* | - | - | 40.4 | 44.5 | 64.2* | 58.8* | - |
MiniCPM-V 1.0 | 2.8B | 366 | 60.6 | 38.2 | 47.5 | 1650.2 | 64.1 | 62.6 | 38.3 | 28.9 | 51.3 | 51.2 | 78.4 |
MiniCPM-V 2.0 | 2.8B | 605 | 74.1 | 71.9 | 54.5 | 1808.6 | 69.1 | 66.5 | 38.2 | 38.7 | 69.2 | 55.8 | 85.5 |
MiniCPM-Llama3-V 2.5 | 8.5B | 725 | 76.6 | 84.8 | 65.1 | 2024.6 | 77.2 | 74.2 | 45.8 | 54.3 | 86.7 | 63.5 | 89.7 |

\* Evaluated using the officially released checkpoint.