nguyenbh committed on
Commit ca22eff
1 Parent(s): bbf1ecf

Update README

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -195,7 +195,7 @@ More specifically, we do not change prompts, pick different few-shot examples, c
 
 The number of k–shot examples is listed per-benchmark.
 
-|Benchmark|Phi-3-Medium-4K-Instruct<br>14b|Command R+<br>104B|Mixtral<br>8x22B|Llama-3-70B-Instruct<br>8b|GPT3.5-Turbo<br>version 1106|Gemini<br>Pro|GPT-4-Turbo<br>version 1106 (Chat)|
+|Benchmark|Phi-3-Medium-4K-Instruct<br>14b|Command R+<br>104B|Mixtral<br>8x22B|Llama-3-70B-Instruct|GPT3.5-Turbo<br>version 1106|Gemini<br>Pro|GPT-4-Turbo<br>version 1106 (Chat)|
 |---------|-----------------------|--------|-------------|-------------------|-------------------|----------|------------------------|
 |AGI Eval<br>5-shot|50.2|50.1|54.0|56.9|48.4|49.0|59.6|
 |MMLU<br>5-shot|78.0|73.8|76.2|80.2|71.4|66.7|84.0|
@@ -220,7 +220,7 @@ The number of k–shot examples is listed per-benchmark.
 
 We take a closer look at different categories across 80 public benchmark datasets at the table below:
 
-|Benchmark|Phi-3-Medium-4K-Instruct<br>14b|Command R+<br>104B|Mixtral<br>8x22B|Llama-3-70B-Instruct<br>8b|GPT3.5-Turbo<br>version 1106|Gemini<br>Pro|GPT-4-Turbo<br>version 1106 (Chat)|
+|Benchmark|Phi-3-Medium-4K-Instruct<br>14b|Command R+<br>104B|Mixtral<br>8x22B|Llama-3-70B-Instruct|GPT3.5-Turbo<br>version 1106|Gemini<br>Pro|GPT-4-Turbo<br>version 1106 (Chat)|
 |--------|------------------------|--------|-------------|-------------------|-------------------|----------|------------------------|
 |Popular aggregated benchmark|75.4|69.9|73.4|76.3|67.0|67.5|80.5|
 |Reasoning|84.1|79.3|81.5|86.7|78.3|80.4|89.3|