Intel/neural-chat-7b-v3-1

lvkaokao committed on Nov 14, 2023

Commit fd81216 • 1 parent: 14f9323

update metric from llm leaderboard.

Files changed (1)

README.md +6 -6

README.md CHANGED

```diff
@@ -11,13 +11,13 @@ Neural-chat-7b-v3 was trained between September and October, 2023.
 ## Evaluation
-We use the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master) to measure the metrics that are adopted by [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ |
-| --- | --- | --- | --- | --- | --- |
-|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 62.4 | 59.58  | 83.31  | 64.16  | 42.15 |
-| **Ours** | **67.92** | 66.29 | 83.28 | 62.11  | 60.02 |
+We submit our model to [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), and the model performance has been **improved significantly** as we see from the average metric of 7 tasks from the leaderboard.
+| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 50.32 | 59.58  | 83.31  | 64.16  | 42.15 | 78.37 | 18.12 | 6.14 |
+| [Intel/neural-chat-7b-v3](https://huggingface.co/Intel/neural-chat-7b-v3) | **57.31** | 67.15 | 83.29 | 62.26  | 58.77 | 78.06 | 1.21 | 50.43 |
+| [Intel/neural-chat-7b-v3](https://huggingface.co/Intel/neural-chat-7b-v3) | **59.06** | 66.21 | 83.64 | 62.37  | 59.65 | 78.14 | 19.56 | 43.84 |
 ## Training procedure
```
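The updated README describes the "Average" column as the average metric of the 7 leaderboard tasks. As a quick consistency check, the sketch below recomputes each row's average from the per-task scores in both the removed 4-task table and the added 7-task table. It assumes the average is an unweighted arithmetic mean and uses a tolerance of 0.01 to allow for two-decimal rounding; the row labels are hypothetical descriptive names, since the added table labels both neural-chat rows `Intel/neural-chat-7b-v3`. Worked by hand, the neural-chat rows reproduce within rounding, while the Mistral-7B-v0.1 rows come out to about 62.30 (listed 62.4) and 50.26 (listed 50.32); the sketch only surfaces that difference and does not explain it.

```python
# Per-task scores copied from the removed and added README tables (percent).
# Assumption: "Average" is the unweighted arithmetic mean of the listed tasks.

OLD_TASKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA"]
NEW_TASKS = OLD_TASKS + ["Winogrande", "GSM8K", "DROP"]

# (hypothetical label, listed average, per-task scores in task-column order)
old_rows = [
    ("Mistral-7B-v0.1 (old table)", 62.4, [59.58, 83.31, 64.16, 42.15]),
    ("Ours (old table)", 67.92, [66.29, 83.28, 62.11, 60.02]),
]
new_rows = [
    ("Mistral-7B-v0.1 (new table)", 50.32,
     [59.58, 83.31, 64.16, 42.15, 78.37, 18.12, 6.14]),
    ("neural-chat row listed 57.31", 57.31,
     [67.15, 83.29, 62.26, 58.77, 78.06, 1.21, 50.43]),
    ("neural-chat row listed 59.06", 59.06,
     [66.21, 83.64, 62.37, 59.65, 78.14, 19.56, 43.84]),
]


def mean(xs):
    """Unweighted arithmetic mean."""
    return sum(xs) / len(xs)


def check(rows, tasks, tol=0.01):
    """Return (label, listed, recomputed, matches) for each table row."""
    results = []
    for label, listed, scores in rows:
        if len(scores) != len(tasks):
            raise ValueError(f"{label}: expected {len(tasks)} scores, got {len(scores)}")
        recomputed = mean(scores)
        results.append((label, listed, recomputed, abs(recomputed - listed) <= tol))
    return results


if __name__ == "__main__":
    for title, rows, tasks in [("4-task table", old_rows, OLD_TASKS),
                               ("7-task table", new_rows, NEW_TASKS)]:
        print(title)
        for label, listed, recomputed, ok in check(rows, tasks):
            status = "match" if ok else "MISMATCH"
            print(f"  {label}: listed {listed}, recomputed {recomputed:.2f} ({status})")
```

Keeping the listed value next to the recomputed one makes it easy to spot which published averages do not follow from the rounded per-task numbers.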