bleysg committed on
Commit
cdff12f
1 Parent(s): 4d93865

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -18,7 +18,7 @@ This model doesn't dramatically improve on the base model's general task perform
 
 # Evaluations
 
-We've only done very limited testing as yet. The [epoch 4.5 checkpoint](https://huggingface.co/Open-Orca/oo-phi-1_5/commit/aa05eb2596d6d11951695d2e327616188d768880) scores above 5 on MT-Bench (better than Alpaca-13B, worse than Llama2-7b-chat), while preliminary benchmarks suggest peak average performance was achieved roughly at epoch 4.
+We've only done limited testing as yet. The [epoch 3.5 checkpoint](https://huggingface.co/Open-Orca/oo-phi-1_5/commit/f7754d8b8b4c3e0748eaf47be4cf5aac1f80a401) scores above 5.1 on MT-Bench (better than Alpaca-13B, worse than Llama2-7b-chat), while preliminary benchmarks suggest peak average performance was achieved roughly at epoch 4.
 
 ## HuggingFaceH4 Open LLM Leaderboard Performance
 
@@ -29,6 +29,9 @@ The only significant improvement was with TruthfulQA.
 
 ## MT-bench Performance
 
+
+![MT-bench Score](https://huggingface.co/Open-Orca/oo-phi-1_5/resolve/main/Images/oo-phi-1_5-mtbench.png)
+
 | Epoch | Average | Turn 1 | Turn 2 |
 |:----------|:----------|:----------|:----------|
 | 3 | 4.85 | 5.69 | 4.01 |