boxinw@nvidia.com committed
Commit 732e074 • 1 Parent(s): 9cf8c9e

Add text only results

Files changed (1): README.md (+24 -0)
README.md CHANGED
@@ -34,6 +34,8 @@ We provide the results from both the Huggingface codebase and the Megatron codebase.
 
 Results (as of September 17th, 2024) in the multimodal benchmarks are as follows:
 
+### Vision-language Benchmarks
+
 | Benchmark | MMMU (val / test) | MathVista | OCRBench | AI2D | ChartQA | DocVQA | TextVQA | RealWorldQA | VQAv2 |
 |------------------------------|-------------------|-----------|----------|------|---------|--------|---------|-------------|-------|
 | NVLM-D 1.0 72B (Huggingface) | 58.7 / 54.9 | 65.2 | 852 | 94.2 | 86.0 | 92.6 | 82.6 | 69.5 | 85.4 |
@@ -47,6 +49,28 @@ Results (as of September 17th, 2024) in the multimodal benchmarks are as follows:
 | Claude 3.5 Sonnet | 68.3 / - | 67.7 | 788 | 94.7 | 90.8 | 95.2 | - | - | - |
 | Gemini 1.5 Pro (Aug 2024) | 62.2 / - | 63.9 | 754 | 94.4 | 87.2 | 93.1 | 78.7 | 70.4 | 80.2 |
 
+### Text-only Benchmarks
+
+| Model | Backbone LLM | MMLU | GSM8K | MATH | HumanEval | Avg. Accuracy |
+|------------------------------|--------------|------|-------|------|-----------|----------------|
+| **Proprietary** | | | | | | |
+| GPT-4o | N/A | 88.7 | - | 76.6 | 90.2 | - |
+| Gemini 1.5 Pro (Aug 2024) | N/A | 85.9 | 90.8 | 67.7 | 84.1 | 82.1 |
+| Claude 3.5 Sonnet | N/A | 88.7 | 96.4 | 71.1 | 92.0 | 87.0 |
+| **Open LLM** | | | | | | |
+| (a) Nous-Hermes-2-Yi-34B | N/A | 75.5 | 78.6 | 21.8 | 43.3 | 54.8 |
+| (b) Qwen-72B-Instruct | N/A | 82.3 | 91.1 | 59.7 | 86.0 | 79.8 |
+| (c) Llama-3-70B-Instruct | N/A | 82.0 | 93.0 | 51.0 | 81.7 | 76.6 |
+| (d) Llama-3.1-70B-Instruct | N/A | 83.6 | 95.1 | 68.0 | 80.5 | 81.8 |
+| (e) Llama-3.1-405B-Instruct | N/A | 87.3 | 96.8 | 73.8 | 89.0 | 86.7 |
+| **Open Multimodal LLM** | | | | | | |
+| VILA-1.5 40B | (a) | 73.3 | 67.5 | 16.8 | 34.1 | 🥶 47.9 (-6.9) |
+| LLaVA-OneVision 72B | (b) | 80.6 | 89.9 | 49.2 | 74.4 | 🥶 73.5 (-6.3) |
+| InternVL-2-Llama3-76B | (c) | 78.5 | 87.1 | 42.5 | 71.3 | 🥶 69.9 (-6.7) |
+| *Llama 3-V 70B | (d) | 83.6 | 95.1 | 68.0 | 80.5 | 🙂 81.8 (0) |
+| *Llama 3-V 405B | (e) | 87.3 | 96.8 | 73.8 | 89.0 | 🙂 86.7 (0) |
+| NVLM-D 1.0 72B (Megatron) | (b) | 82.0 | 92.9 | 73.1 | 88.4 | 🥳 84.1 (+4.3) |
+| NVLM-D 1.0 72B (Huggingface) | (b) | 81.7 | 93.2 | 73.1 | 89.0 | 🥳 84.3 (+4.5) |
 
 
  ## How to use
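
A note on reading the added text-only table: the Avg. Accuracy column appears to be the unweighted mean of the four benchmark scores, and the parenthesized delta compares each multimodal model against the average of its own backbone LLM (rows (a)-(e)). Here is a minimal sketch of that arithmetic in Python; the helper name is illustrative and not part of this repo:

```python
# Sketch (assumption): "Avg. Accuracy" = mean of MMLU, GSM8K, MATH, and
# HumanEval; the parenthesized delta compares a multimodal model against
# its backbone LLM's own average.
def avg_accuracy(mmlu: float, gsm8k: float, math_score: float, humaneval: float) -> float:
    return (mmlu + gsm8k + math_score + humaneval) / 4

backbone_b = avg_accuracy(82.3, 91.1, 59.7, 86.0)  # (b) Qwen-72B-Instruct -> 79.775, shown as 79.8
nvlm_d_hf = avg_accuracy(81.7, 93.2, 73.1, 89.0)   # NVLM-D 1.0 72B (HF)   -> 84.25, shown as 84.3

# Delta against the backbone, matching the "+4.5" in the last row of the table.
print(f"NVLM-D 1.0 72B (Huggingface): {nvlm_d_hf:.2f} ({nvlm_d_hf - backbone_b:+.1f})")
```

On these inputs the delta works out to +4.5, matching the table; the 🥶/🙂/🥳 markers appear to flag whether a multimodal model degrades, matches, or improves on its backbone's text-only average.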