NeMo
nvidia
jiaqiz committed on
Commit
91d311f
1 Parent(s): 5402d2b

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -192,23 +192,23 @@ The training corpus for Nemotron-4-340B-Base consists of English and multilingua
 
  #### Overview
 
- *5-shot performance.* Language Understanding evaluated using [Massive Multitask Language Understanding](https://arxiv.org/abs/2009.03300):
+ *5-shot performance.* Language Understanding evaluated using Massive Multitask Language Understanding:
  | Average |
  | :------------- |
  | 81.1 |
 
- *Zero-shot performance.* Evaluated using select datasets from the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) with additions:
+ *Zero-shot performance.* Evaluated using select datasets from the LM Evaluation Harness with additions:
  | HellaSwag | Winogrande | BBH| ARC-Challenge |
  | :------------- | :------------- | :------------- | :------------- |
  | 90.53 | 89.50 | 85.44 | 94.28 |
 
- *Chain of Thought (CoT)*. Multilingual capabilities evaluated using [Multilingual Grade School Math](https://arxiv.org/abs/2210.03057):
+ *Chain of Thought (CoT)*. Multilingual capabilities evaluated using Multilingual Grade School Math:
 
  | ES Exact Match (%) | JA Exact Match (%) | TH Exact Match (%) |
  | :------------- | :------------- | :------------- |
  | 68.8 | 69.6 | 68.4 |
 
- *Code generation performance*. Evaluated using [HumanEval](https://github.com/openai/human-eval):
+ *Code generation performance*. Evaluated using HumanEval:
  | p@1, 0-Shot |
  | :------------- |
  | 57.3 |
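
Note on the `p@1` column: the README does not say how the HumanEval score was computed, so the following is only a sketch of the commonly used unbiased pass@k estimator, assuming `n` samples are generated per problem and `c` of them pass the unit tests. The function names and the sample data are hypothetical and are not taken from the NeMo or HumanEval code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem, given n samples of which c passed."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: four problems, one sample each, two of which pass.
example_results = [(1, 1), (1, 0), (1, 1), (1, 0)]
example_score = mean_pass_at_k(example_results, k=1)
```

With one sample per problem, pass@1 reduces to the fraction of problems solved, so a reported value such as 57.3 would correspond to that fraction expressed as a percentage under this assumption.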