DavidGF committed
Commit
0a243a0
1 Parent(s): 87cf835

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -79,7 +79,7 @@ Detailed information on how the new training strategy works and the advantages i
 
 
 ### Prompt Template:
-We trained on vicuna prompt template. Please add the following stopping string to your client: </s>,</p> (we did not add the special tokens to the training config)
+We trained on the Vicuna prompt template. Please add the following stopping strings to your client: '</s>', '</p>' (we did not add the special tokens to the training config)
 ```
 You are a helpful AI Assistant.
 
@@ -91,17 +91,17 @@ ASSISTANT:
 ## Evaluation
 
 **Open LLM Leaderboard:**
-* benchmarks were done with the newest version of lm-evaluation-harness on a batch-size of 1:
+
 
 | Metric                | Value     |
 |-----------------------|-----------|
-| Avg.                  | **68.92** |
+| Avg.                  | **67.83** |
 | ARC (25-shot)         | 59.98     |
-| HellaSwag (10-shot)   | 82.28     |
-| MMLU (5-shot)         | 63.53     |
-| TruthfulQA (0-shot)   | 61.2      |
-| Winogrande (5-shot)   | 80.27     |
-| GSM8K (5-shot)        | 66.26     |
+| HellaSwag (10-shot)   | 81.91     |
+| MMLU (5-shot)         | 63.76     |
+| TruthfulQA (0-shot)   | 61        |
+| Winogrande (5-shot)   | 76.64     |
+| GSM8K (5-shot)        | 63.68     |
 
 Despite the fact that we achieved great results on the Open LLM Leaderboard benchmarks, the model subjectively does not feel as smart as comparable Mistral finetunes. Most of its answers are coherent, but we observed that the model sometimes gives lazy or odd answers.
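
Because '</s>' and '</p>' were not registered as special tokens, the README asks clients to stop on them as raw text. Below is a minimal sketch of doing that with Hugging Face transformers; the model id is a placeholder and the exact USER/ASSISTANT turn layout is an assumption, since the diff only shows part of the template.

```python
# Hedged sketch (not part of the commit): applying the Vicuna-style
# template and cutting generation at the two stop strings client-side.
# MODEL_ID and the exact USER/ASSISTANT layout are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-model"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = (
    "You are a helpful AI Assistant.\n\n"
    "USER: Name three rivers in Europe.\n"
    "ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
completion = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:])

# '</s>' and '</p>' were not trained as special tokens, so they can show
# up as literal text; truncate at whichever stop string appears first.
for stop in ("</s>", "</p>"):
    completion = completion.split(stop, 1)[0]
print(completion.strip())
```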
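
The earlier revision noted that the scores came from the newest lm-evaluation-harness at batch size 1. As a rough pointer, a single leaderboard-style task could be run through the harness's Python API along these lines; this is a sketch assuming a v0.4-style lm_eval install, with a placeholder model id, and the leaderboard itself pins its own harness revision and few-shot settings.

```python
# Hedged sketch of one leaderboard-style run with the
# lm-evaluation-harness Python API (v0.4-style). Each benchmark in the
# table above uses its own few-shot count.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/your-model",  # placeholder id
    tasks=["arc_challenge"],  # ARC is reported 25-shot above
    num_fewshot=25,
    batch_size=1,             # matches the batch size noted in the README
)
print(results["results"]["arc_challenge"])
```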