lvkaokao committed on
Commit
fd81216
1 Parent(s): 14f9323

update metric from llm leaderboard.

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -11,13 +11,13 @@ Neural-chat-7b-v3 was trained between September and October, 2023.
 
 ## Evaluation
 
-We use the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master) to measure the metrics that are adopted by [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-
-| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ |
-| --- | --- | --- | --- | --- | --- |
-|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 62.4 | 59.58 | 83.31 | 64.16 | 42.15 |
-| **Ours** | **67.92** | 66.29 | 83.28 | 62.11 | 60.02 |
+We submit our model to [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), and the model performance has been **improved significantly** as we see from the average metric of 7 tasks from the leaderboard.
 
+| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
+| [Intel/neural-chat-7b-v3](https://huggingface.co/Intel/neural-chat-7b-v3) | **57.31** | 67.15 | 83.29 | 62.26 | 58.77 | 78.06 | 1.21 | 50.43 |
+| [Intel/neural-chat-7b-v3](https://huggingface.co/Intel/neural-chat-7b-v3) | **59.06** | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 | 43.84 |
 
 ## Training procedure
 
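The added README text refers to "the average metric of 7 tasks". As a sanity check, here is a minimal sketch that recomputes that average from the per-task scores in the new table. It assumes the Average column is a plain unweighted mean of the seven task columns; the commit does not say how the leaderboard computes it. The names `ROWS` and `recomputed_average` are hypothetical helpers, not part of any leaderboard tooling, and the "(first row)" / "(second row)" labels only distinguish the two rows that share a model name in the table.

```python
from statistics import fmean

# Scores copied from the table added in this commit, in column order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K, DROP.
# Each entry is (row label, reported average, per-task scores).
ROWS = [
    ("mistralai/Mistral-7B-v0.1", 50.32,
     [59.58, 83.31, 64.16, 42.15, 78.37, 18.12, 6.14]),
    ("Intel/neural-chat-7b-v3 (first row)", 57.31,
     [67.15, 83.29, 62.26, 58.77, 78.06, 1.21, 50.43]),
    ("Intel/neural-chat-7b-v3 (second row)", 59.06,
     [66.21, 83.64, 62.37, 59.65, 78.14, 19.56, 43.84]),
]


def recomputed_average(scores):
    """Unweighted mean of the task scores, rounded to two decimals.

    Assumption: the leaderboard average is a simple mean over tasks.
    """
    return round(fmean(scores), 2)


for label, reported, scores in ROWS:
    print(f"{label}: reported {reported}, recomputed {recomputed_average(scores)}")
```

Under this assumption the two Intel/neural-chat-7b-v3 rows reproduce their reported averages (57.31 and 59.06), while the Mistral-7B-v0.1 row recomputes to 50.26 against a reported 50.32. The commit does not explain that gap.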