TildeSIA committed · Commit d91ef50 (verified) · 1 Parent(s): 79af113

Update README.md

Files changed (1)
  1. README.md +72 -2
README.md CHANGED
@@ -112,7 +112,7 @@ Results
112
 
113
  **What did we do?** We used the standard implementation of the [belebele](https://github.com/eleutherai/lm-evaluation-harness/tree/main/lm_eval/tasks/belebele) task from the LLM Evaluation Harness. We set tokenisers to ```use_fast=False```. We report **5-shot** accuracy.
114
 
115
- | 5-shot | Gemma 2 27b | ALIA 40b | EuroLLM Prev. 22b | TildeOpen 1.1 30b |
116
  |----------|:-------------:|:----------:|:------------:|:-------------------:|
117
  | Bulgarian | 79.8% | 78.8% | **85.3%** | 84.7% |
118
  | Czech | 81.4% | 78.3% | 85.3% | **85.8%** |
@@ -148,7 +148,7 @@ Results
148
  **What did we do?**
149
  We used the standard implementation of the [MultiBLiMP](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/multiblimp) task from the LLM Evaluation Harness. We set tokenisers to ```use_fast=False```. We report **0-shot** accuracy.
150
 
151
- | Language | Gemma 2 27b | ALIA 40b | EuroLLM Prev. 22b | TildeOpen 1.1 30b
152
  |----------|-------------|----------|---------------------|-------------|
153
  | Bulgarian | 95.4% | 98.8% | 97.7% | **99.6%** |
154
  | Czech | 98.6% | **98.9%** | 98.5% | 98.5% |
@@ -175,3 +175,73 @@ We used the standard implementation of the [MultiBLiMP](https://github.com/Eleut
175
  | Turkish | 97.6% | **98.7%** | 97.9% | 96.4% |
176
  | Ukrainian | 95.6% | 98.0% | 97.3% | **99.2%** |
177
  | **Average** | 95.7% | 96.7% | 96.4% | **99.0%** |
 
112
 
113
  **What did we do?** We used the standard implementation of the [belebele](https://github.com/eleutherai/lm-evaluation-harness/tree/main/lm_eval/tasks/belebele) task from the LLM Evaluation Harness. We set tokenisers to ```use_fast=False```. We report **5-shot** accuracy.
114
 
115
+ | 5-shot | **Gemma 2 27b** | **ALIA 40b** | **EuroLLM Prev. 22b** | **TildeOpen 1.1 30b** |
116
  |----------|:-------------:|:----------:|:------------:|:-------------------:|
117
  | Bulgarian | 79.8% | 78.8% | **85.3%** | 84.7% |
118
  | Czech | 81.4% | 78.3% | 85.3% | **85.8%** |
 
148
  **What did we do?**
149
  We used the standard implementation of the [MultiBLiMP](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/multiblimp) task from the LLM Evaluation Harness. We set tokenisers to ```use_fast=False```. We report **0-shot** accuracy.
150
 
151
+ | Language | **Gemma 2 27b** | **ALIA 40b** | **EuroLLM Prev. 22b** | **TildeOpen 1.1 30b** |
152
  |----------|-------------|----------|---------------------|-------------|
153
  | Bulgarian | 95.4% | 98.8% | 97.7% | **99.6%** |
154
  | Czech | 98.6% | **98.9%** | 98.5% | 98.5% |
 
175
  | Turkish | 97.6% | **98.7%** | 97.9% | 96.4% |
176
  | Ukrainian | 95.6% | 98.0% | 97.3% | **99.2%** |
177
  | **Average** | 95.7% | 96.7% | 96.4% | **99.0%** |
178
+
179
+ ## Knowledge tests
180
+
181
+ ### ARC Benchmark Results
182
+
183
+ | 5-shot | | **ARC Easy**| | | **ARC Hard**| |
184
+ |----------|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
185
+ | **Language** | **ALIA 40b** | **EuroLLM Prev. 22b** | **TildeOpen 1.1 30b** | **ALIA 40b** | **EuroLLM Prev. 22b** | **TildeOpen 1.1 30b** |
186
+ | Danish | 79.9% | **80.1%** | 79.6% | 53.4% | 52.6% | **53.7%** |
187
+ | German | 79.6% | **79.9%** | 78.0% | 53.4% | **53.6%** | 51.7% |
188
+ | Spanish | **82.9%** | 81.7% | 79.4% | **57.3%** | 56.1% | 52.4% |
189
+ | French | **81.7%** | 81.1% | 78.6% | **56.0%** | 54.5% | 52.8% |
190
+ | Italian | 80.5% | **81.6%** | 78.5% | **56.4%** | 54.8% | 54.1% |
191
+ | Dutch | **80.1%** | 80.0% | 78.8% | **54.0%** | 53.8% | 52.2% |
192
+ | Portuguese | **81.7%** | 81.1% | 79.0% | **56.9%** | 55.5% | 54.1% |
193
+ | Swedish | 80.3% | **80.5%** | 78.7% | 53.8% | 53.1% | **54.1%** |
194
+ | **AVG WEST** | **80.8%** | **80.8%** | 78.8% | **55.2%** | 54.2% | 53.1% |
195
+ | | | | | | | |
196
+ | Bulgarian | **79.8%** | 79.2% | 79.5% | **53.8%** | 51.8% | 52.8% |
197
+ | Czech | **79.5%** | **79.5%** | 78.8% | 51.5% | 52.3% | **53.9%** |
198
+ | Estonian | 72.4% | 73.0% | **73.1%** | 49.6% | 49.8% | **52.0%** |
199
+ | Finnish | 73.8% | **74.2%** | 73.3% | 48.7% | 51.1% | **52.1%** |
200
+ | Hungarian | 74.0% | 73.9% | **74.9%** | 49.3% | 49.0% | **49.6%** |
201
+ | Lithuanian | 76.4% | 76.1% | **77.9%** | 50.3% | 51.6% | **53.0%** |
202
+ | Latvian | 76.2% | **76.4%** | 75.9% | 50.7% | 49.8% | **50.9%** |
203
+ | Polish | **79.2%** | 78.2% | 78.0% | **54.5%** | 53.3% | 52.7% |
204
+ | Romanian | **79.6%** | 78.8% | 78.8% | **55.5%** | 53.7% | 54.5% |
205
+ | Slovak | 78.8% | 79.2% | **79.6%** | 52.5% | 53.0% | **54.7%** |
206
+ | Slovenian | **78.3%** | 78.5% | **78.3%** | **53.4%** | 52.2% | 52.7% |
207
+ | **AVG EAST** | **77.1%** | 77.0% | **77.1%** | 51.8% | 51.6% | **52.6%** |
208
+
209
+ ### MMLU Benchmark Results
210
+
211
+ | 0-shot | **ALIA 40b** | **EuroLLM Prev. 22b** | **TildeOpen 1.1 30b** |
212
+ |----------|:-----------------:|:---------------------:|:-------------------:|
213
+ | Bulgarian | 48.3% | 52.0% | **56.3%** |
214
+ | Czech | 49.1% | 51.7% | **56.4%** |
215
+ | Danish | 50.2% | 51.1% | **56.6%** |
216
+ | German | 51.0% | 51.8% | **56.2%** |
217
+ | Greek | 50.7% | 50.6% | **50.9%** |
218
+ | Spanish | 53.3% | 53.4% | **56.3%** |
219
+ | Estonian | 48.7% | 49.2% | **55.3%** |
220
+ | Finnish | 47.4% | 48.9% | **55.4%** |
221
+ | French | 53.1% | 53.8% | **56.4%** |
222
+ | Hungarian | 49.9% | 44.4% | **55.2%** |
223
+ | Italian | 52.3% | 53.7% | **57.2%** |
224
+ | Lithuanian | 47.3% | 49.4% | **54.7%** |
225
+ | Latvian | 46.9% | 48.0% | **54.0%** |
226
+ | Dutch | 50.8% | 53.0% | **56.5%** |
227
+ | Polish | 50.6% | 49.6% | **55.6%** |
228
+ | Portuguese | 52.4% | 53.7% | **56.4%** |
229
+ | Romanian | 51.0% | 52.1% | **56.2%** |
230
+ | Slovak | 49.0% | 52.2% | **56.3%** |
231
+ | Slovenian | 48.2% | 50.7% | **55.3%** |
232
+ | Swedish | 49.6% | 51.2% | **56.1%** |
233
+ | **Average** | 50.0% | 51.0% | **55.7%** |
234
+
235
+ ### National Exams Results
236
+ | 5-shot | **ALIA 40b** | **EuroLLM Prev. 22b** | **TildeOpen 1.1 30b** |
237
+ |----------|----------|-------------------|-------------------|
238
+ | Bulgarian | 62.4% | 66.8% | **67.8%** |
239
+ | Croatian | 70.8% | **72.5%** | 71.9% |
240
+ | Hungarian | 48.9% | **51.9%** | 48.9% |
241
+ | Italian | **65.5%** | 64.6% | 65.0% |
242
+ | Macedonian | 74.2% | 72.0% | **80.2%** |
243
+ | Polish | 61.2% | 61.4% | **63.5%** |
244
+ | Portuguese | **61.4%** | 60.9% | 59.2% |
245
+ | Albanian | 55.6% | 55.0% | **75.6%** |
246
+ | Serbian | 64.7% | 57.3% | **66.9%** |
247
+ | **Average** | 62.7% | 62.5% | **66.6%** |
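
The README does not say how the **Average** rows are computed. As a sanity check, here is a minimal sketch that recomputes the National Exams average, assuming it is an unweighted mean over the nine languages listed. The names `MODELS`, `NATIONAL_EXAMS`, and `macro_average` are illustrative and do not come from the README or the evaluation harness.

```python
# Per-language 5-shot accuracies (%) copied from the National Exams table.
# Tuple order follows the table columns: ALIA 40b, EuroLLM Prev. 22b, TildeOpen 1.1 30b.
MODELS = ("ALIA 40b", "EuroLLM Prev. 22b", "TildeOpen 1.1 30b")

NATIONAL_EXAMS = {
    "Bulgarian": (62.4, 66.8, 67.8),
    "Croatian": (70.8, 72.5, 71.9),
    "Hungarian": (48.9, 51.9, 48.9),
    "Italian": (65.5, 64.6, 65.0),
    "Macedonian": (74.2, 72.0, 80.2),
    "Polish": (61.2, 61.4, 63.5),
    "Portuguese": (61.4, 60.9, 59.2),
    "Albanian": (55.6, 55.0, 75.6),
    "Serbian": (64.7, 57.3, 66.9),
}


def macro_average(scores):
    """Unweighted mean over languages for each model column.

    Assumption: this is how the table's Average row is defined;
    the README does not confirm it.
    """
    n_languages = len(scores)
    columns = zip(*scores.values())  # one tuple of scores per model
    return {
        model: round(sum(column) / n_languages, 1)
        for model, column in zip(MODELS, columns)
    }


averages = macro_average(NATIONAL_EXAMS)
print(averages)
```

Under that assumption the sketch gives 62.7, 62.5, and 66.6, which matches the reported Average row to one decimal place. The published averages may have been computed from unrounded per-language scores, so small differences are possible when repeating this check on the other tables.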