PyTorch
Romanian
llama
Eval Results
mihaimasala committed commit 6de56f0 · verified · 1 Parent(s): 9923343

Upload 9 files

README.md CHANGED
@@ -1,3 +1,636 @@
1
- ---
2
- license: llama2
3
- ---
1
+ ---
2
+ license: llama2
3
+ language:
4
+ - ro
5
+ base_model: meta-llama/Llama-2-7b-hf
6
+ model-index:
7
+ - name: OpenLLM-Ro/RoLlama2-7b-Base-2024-05-14
8
+ results:
9
+ - task:
10
+ type: text-generation
11
+ dataset:
12
+ name: Romanian_Academic_Benchmarks
13
+ type: Romanian_Academic_Benchmarks
14
+ metrics:
15
+ - name: Average accuracy
16
+ type: accuracy
17
+ value: 38.03
18
+ - task:
19
+ type: text-generation
20
+ dataset:
21
+ name: OpenLLM-Ro/ro_arc_challenge
22
+ type: OpenLLM-Ro/ro_arc_challenge
23
+ metrics:
24
+ - name: Average accuracy
25
+ type: accuracy
26
+ value: 37.95
27
+ - task:
28
+ type: text-generation
29
+ dataset:
30
+ name: OpenLLM-Ro/ro_mmlu
31
+ type: OpenLLM-Ro/ro_mmlu
32
+ metrics:
33
+ - name: Average accuracy
34
+ type: accuracy
35
+ value: 27.22
36
+ - task:
37
+ type: text-generation
38
+ dataset:
39
+ name: OpenLLM-Ro/ro_winogrande
40
+ type: OpenLLM-Ro/ro_winogrande
41
+ metrics:
42
+ - name: Average accuracy
43
+ type: accuracy
44
+ value: 59.29
45
+ - task:
46
+ type: text-generation
47
+ dataset:
48
+ name: OpenLLM-Ro/ro_hellaswag
49
+ type: OpenLLM-Ro/ro_hellaswag
50
+ metrics:
51
+ - name: Average accuracy
52
+ type: accuracy
53
+ value: 57.22
54
+ - task:
55
+ type: text-generation
56
+ dataset:
57
+ name: OpenLLM-Ro/ro_gsm8k
58
+ type: OpenLLM-Ro/ro_gsm8k
59
+ metrics:
60
+ - name: Average accuracy
61
+ type: accuracy
62
+ value: 2.53
63
+ - task:
64
+ type: text-generation
65
+ dataset:
66
+ name: OpenLLM-Ro/ro_truthfulqa
67
+ type: OpenLLM-Ro/ro_truthfulqa
68
+ metrics:
69
+ - name: Average accuracy
70
+ type: accuracy
71
+ value: 44
72
+ - task:
73
+ type: text-generation
74
+ dataset:
75
+ name: LaRoSeDa_binary
76
+ type: LaRoSeDa_binary
77
+ metrics:
78
+ - name: Average macro-f1
79
+ type: macro-f1
80
+ value: 83.25
81
+ - task:
82
+ type: text-generation
83
+ dataset:
84
+ name: LaRoSeDa_multiclass
85
+ type: LaRoSeDa_multiclass
86
+ metrics:
87
+ - name: Average macro-f1
88
+ type: macro-f1
89
+ value: 61.04
90
+ - task:
91
+ type: text-generation
92
+ dataset:
93
+ name: LaRoSeDa_binary_finetuned
94
+ type: LaRoSeDa_binary_finetuned
95
+ metrics:
96
+ - name: Average macro-f1
97
+ type: macro-f1
98
+ value: 98.97
99
+ - task:
100
+ type: text-generation
101
+ dataset:
102
+ name: LaRoSeDa_multiclass_finetuned
103
+ type: LaRoSeDa_multiclass_finetuned
104
+ metrics:
105
+ - name: Average macro-f1
106
+ type: macro-f1
107
+ value: 87.72
108
+ - task:
109
+ type: text-generation
110
+ dataset:
111
+ name: WMT_EN-RO
112
+ type: WMT_EN-RO
113
+ metrics:
114
+ - name: Average bleu
115
+ type: bleu
116
+ value: 10.01
117
+ - task:
118
+ type: text-generation
119
+ dataset:
120
+ name: WMT_RO-EN
121
+ type: WMT_RO-EN
122
+ metrics:
123
+ - name: Average bleu
124
+ type: bleu
125
+ value: 13.03
126
+ - task:
127
+ type: text-generation
128
+ dataset:
129
+ name: WMT_EN-RO_finetuned
130
+ type: WMT_EN-RO_finetuned
131
+ metrics:
132
+ - name: Average bleu
133
+ type: bleu
134
+ value: 27.85
135
+ - task:
136
+ type: text-generation
137
+ dataset:
138
+ name: WMT_RO-EN_finetuned
139
+ type: WMT_RO-EN_finetuned
140
+ metrics:
141
+ - name: Average bleu
142
+ type: bleu
143
+ value: 39.3
144
+ - task:
145
+ type: text-generation
146
+ dataset:
147
+ name: XQuAD
148
+ type: XQuAD
149
+ metrics:
150
+ - name: Average exact_match
151
+ type: exact_match
152
+ value: 30.15
153
+ - task:
154
+ type: text-generation
155
+ dataset:
156
+ name: XQuAD
157
+ type: XQuAD
158
+ metrics:
159
+ - name: Average f1
160
+ type: f1
161
+ value: 47.03
162
+ - task:
163
+ type: text-generation
164
+ dataset:
165
+ name: XQuAD_finetuned
166
+ type: XQuAD_finetuned
167
+ metrics:
168
+ - name: Average exact_match
169
+ type: exact_match
170
+ value: 67.06
171
+ - task:
172
+ type: text-generation
173
+ dataset:
174
+ name: XQuAD_finetuned
175
+ type: XQuAD_finetuned
176
+ metrics:
177
+ - name: Average f1
178
+ type: f1
179
+ value: 79.96
180
+ - task:
181
+ type: text-generation
182
+ dataset:
183
+ name: STS
184
+ type: STS
185
+ metrics:
186
+ - name: Average spearman
187
+ type: spearman
188
+ value: 7.89
189
+ - task:
190
+ type: text-generation
191
+ dataset:
192
+ name: STS
193
+ type: STS
194
+ metrics:
195
+ - name: Average pearson
196
+ type: pearson
197
+ value: 7.98
198
+ - task:
199
+ type: text-generation
200
+ dataset:
201
+ name: STS_finetuned
202
+ type: STS_finetuned
203
+ metrics:
204
+ - name: Average spearman
205
+ type: spearman
206
+ value: 71.75
207
+ - task:
208
+ type: text-generation
209
+ dataset:
210
+ name: STS_finetuned
211
+ type: STS_finetuned
212
+ metrics:
213
+ - name: Average pearson
214
+ type: pearson
215
+ value: 71.99
216
+ - task:
217
+ type: text-generation
218
+ dataset:
219
+ name: OpenLLM-Ro/ro_arc_challenge
220
+ type: OpenLLM-Ro/ro_arc_challenge
221
+ metrics:
222
+ - name: 0-shot
223
+ type: accuracy
224
+ value: 35.56
225
+ - name: 1-shot
226
+ type: accuracy
227
+ value: 36.42
228
+ - name: 3-shot
229
+ type: accuracy
230
+ value: 38.56
231
+ - name: 5-shot
232
+ type: accuracy
233
+ value: 38.39
234
+ - name: 10-shot
235
+ type: accuracy
236
+ value: 39.07
237
+ - name: 25-shot
238
+ type: accuracy
239
+ value: 39.67
240
+ - task:
241
+ type: text-generation
242
+ dataset:
243
+ name: OpenLLM-Ro/ro_mmlu
244
+ type: OpenLLM-Ro/ro_mmlu
245
+ metrics:
246
+ - name: 0-shot
247
+ type: accuracy
248
+ value: 25.82
249
+ - name: 1-shot
250
+ type: accuracy
251
+ value: 25.48
252
+ - name: 3-shot
253
+ type: accuracy
254
+ value: 27.61
255
+ - name: 5-shot
256
+ type: accuracy
257
+ value: 29.96
258
+ - task:
259
+ type: text-generation
260
+ dataset:
261
+ name: OpenLLM-Ro/ro_winogrande
262
+ type: OpenLLM-Ro/ro_winogrande
263
+ metrics:
264
+ - name: 0-shot
265
+ type: accuracy
266
+ value: 58.72
267
+ - name: 1-shot
268
+ type: accuracy
269
+ value: 58.88
270
+ - name: 3-shot
271
+ type: accuracy
272
+ value: 60.38
273
+ - name: 5-shot
274
+ type: accuracy
275
+ value: 59.19
276
+ - task:
277
+ type: text-generation
278
+ dataset:
279
+ name: OpenLLM-Ro/ro_hellaswag
280
+ type: OpenLLM-Ro/ro_hellaswag
281
+ metrics:
282
+ - name: 0-shot
283
+ type: accuracy
284
+ value: 55.85
285
+ - name: 1-shot
286
+ type: accuracy
287
+ value: 57.06
288
+ - name: 3-shot
289
+ type: accuracy
290
+ value: 57.52
291
+ - name: 5-shot
292
+ type: accuracy
293
+ value: 57.89
294
+ - name: 10-shot
295
+ type: accuracy
296
+ value: 57.79
297
+ - task:
298
+ type: text-generation
299
+ dataset:
300
+ name: OpenLLM-Ro/ro_gsm8k
301
+ type: OpenLLM-Ro/ro_gsm8k
302
+ metrics:
303
+ - name: 0-shot
304
+ type: accuracy
305
+ value: 0
306
+ - name: 1-shot
307
+ type: accuracy
308
+ value: 2.96
309
+ - name: 3-shot
310
+ type: accuracy
311
+ value: 4.62
312
+ - task:
313
+ type: text-generation
314
+ dataset:
315
+ name: LaRoSeDa_binary
316
+ type: LaRoSeDa_binary
317
+ metrics:
318
+ - name: 0-shot
319
+ type: macro-f1
320
+ value: 42.78
321
+ - name: 1-shot
322
+ type: macro-f1
323
+ value: 98
324
+ - name: 3-shot
325
+ type: macro-f1
326
+ value: 95.13
327
+ - name: 5-shot
328
+ type: macro-f1
329
+ value: 97.07
330
+ - task:
331
+ type: text-generation
332
+ dataset:
333
+ name: LaRoSeDa_multiclass
334
+ type: LaRoSeDa_multiclass
335
+ metrics:
336
+ - name: 0-shot
337
+ type: macro-f1
338
+ value: 46.41
339
+ - name: 1-shot
340
+ type: macro-f1
341
+ value: 67.36
342
+ - name: 3-shot
343
+ type: macro-f1
344
+ value: 65.16
345
+ - name: 5-shot
346
+ type: macro-f1
347
+ value: 65.23
348
+ - task:
349
+ type: text-generation
350
+ dataset:
351
+ name: WMT_EN-RO
352
+ type: WMT_EN-RO
353
+ metrics:
354
+ - name: 0-shot
355
+ type: bleu
356
+ value: 4.45
357
+ - name: 1-shot
358
+ type: bleu
359
+ value: 8.61
360
+ - name: 3-shot
361
+ type: bleu
362
+ value: 12.25
363
+ - name: 5-shot
364
+ type: bleu
365
+ value: 14.73
366
+ - task:
367
+ type: text-generation
368
+ dataset:
369
+ name: WMT_RO-EN
370
+ type: WMT_RO-EN
371
+ metrics:
372
+ - name: 0-shot
373
+ type: bleu
374
+ value: 1.29
375
+ - name: 1-shot
376
+ type: bleu
377
+ value: 10.78
378
+ - name: 3-shot
379
+ type: bleu
380
+ value: 16.82
381
+ - name: 5-shot
382
+ type: bleu
383
+ value: 23.24
384
+ - task:
385
+ type: text-generation
386
+ dataset:
387
+ name: XQuAD_EM
388
+ type: XQuAD_EM
389
+ metrics:
390
+ - name: 0-shot
391
+ type: exact_match
392
+ value: 5.29
393
+ - name: 1-shot
394
+ type: exact_match
395
+ value: 33.95
396
+ - name: 3-shot
397
+ type: exact_match
398
+ value: 39.24
399
+ - name: 5-shot
400
+ type: exact_match
401
+ value: 42.1
402
+ - task:
403
+ type: text-generation
404
+ dataset:
405
+ name: XQuAD_F1
406
+ type: XQuAD_F1
407
+ metrics:
408
+ - name: 0-shot
409
+ type: f1
410
+ value: 16.17
411
+ - name: 1-shot
412
+ type: f1
413
+ value: 51.84
414
+ - name: 3-shot
415
+ type: f1
416
+ value: 58.82
417
+ - name: 5-shot
418
+ type: f1
419
+ value: 61.29
420
+ - task:
421
+ type: text-generation
422
+ dataset:
423
+ name: STS
424
+ type: STS
425
+ metrics:
426
+ - name: 0-shot
427
+ type: spearman
428
+ value: -1.74
429
+ - name: 1-shot
430
+ type: spearman
431
+ value: 15.47
432
+ - name: 3-shot
433
+ type: spearman
434
+ value: 9.93
435
+ - task:
436
+ type: text-generation
437
+ dataset:
438
+ name: STS
439
+ type: STS
440
+ metrics:
441
+ - name: 0-shot
442
+ type: pearson
443
+ value: -1.4
444
+ - name: 1-shot
445
+ type: pearson
446
+ value: 15
447
+ - name: 3-shot
448
+ type: pearson
449
+ value: 10.33
450
+ datasets:
451
+ - uonlp/CulturaX
452
+ ---
453
+
454
+ # Model Card for RoLlama2-7b-Base-2024-05-14
455
+
456
+ <!-- Provide a quick summary of what the model is/does. -->
457
+
458
+ RoLlama2 is a family of pretrained and fine-tuned generative text models for Romanian. This is the repository for the **foundational 7B model**. Links to other models can be found at the bottom of this page.
459
+
460
+ ## Model Details
461
+
462
+ ### Model Description
463
+
464
+ <!-- Provide a longer summary of what this model is. -->
465
+ OpenLLM-Ro represents the first open-source effort to build an LLM specialized for Romanian. OpenLLM-Ro develops and publicly releases a collection of Romanian LLMs, both as foundational models and as instruct and chat variants.
466
+
467
+
468
+ - **Developed by:** OpenLLM-Ro
469
+ <!-- - **Funded by [optional]:** [More Information Needed] -->
470
+ <!-- - **Shared by [optional]:** [More Information Needed] -->
471
+ <!-- - **Model type:** [More Information Needed] -->
472
+ - **Language(s):** Romanian
473
+ - **License:** Llama2 Community License Agreement
474
+ - **Continually pretrained from:** [Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b-hf)
475
+ - **Trained using:** [CulturaX](https://huggingface.co/datasets/uonlp/CulturaX)
476
+
477
+
478
+ ### Model Sources
479
+
480
+ <!-- Provide the basic links for the model. -->
481
+
482
+ - **Repository:** https://github.com/OpenLLM-Ro/llama-recipes
483
+ - **Paper:** https://arxiv.org/abs/2406.18266
484
+
485
+ ## Intended Use
486
+
487
+ ### Intended Use Cases
488
+
489
+ RoLlama2 is intended for research use in Romanian. Base models can be adapted for a variety of natural language tasks, while instruction- and chat-tuned models are intended for assistant-like chat.
490
+
491
+ ### Out-of-Scope Use
492
+
493
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
494
+
495
+ Use in any manner that violates the license or any applicable laws or regulations, and use in languages other than Romanian.
496
+
497
+
498
+
499
+ ## How to Get Started with the Model
500
+
501
+ Use the code below to get started with the model.
502
+
503
+ ```python
504
+ from transformers import AutoTokenizer, AutoModelForCausalLM
505
+
506
+ tokenizer = AutoTokenizer.from_pretrained("OpenLLM-Ro/RoLlama2-7b-Base-2024-05-14")
507
+ model = AutoModelForCausalLM.from_pretrained("OpenLLM-Ro/RoLlama2-7b-Base-2024-05-14")
508
+
509
+ input_text = "Mihai Eminescu a fost "
510
+ inputs = tokenizer(input_text, return_tensors="pt")  # dict with input_ids and attention_mask
511
+
512
+ outputs = model.generate(**inputs, max_new_tokens=100)
513
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
514
+ ```
515
+
516
+ ## Academic Benchmarks
517
+
518
+ <table>
519
+ <tbody>
520
+ <tr>
521
+ <td><strong>Model</strong></td>
522
+ <td><strong><center>Average</center></strong></td>
523
+ <td><strong><center>ARC</center></strong></td>
524
+ <td><strong><center>MMLU</center></strong></td>
525
+ <td><strong><center>Winogrande</center></strong></td>
526
+ <td><strong><center>Hellaswag</center></strong></td>
527
+ <td><strong><center>GSM8k</center></strong></td>
528
+ <td><strong><center>TruthfulQA</center></strong></td>
529
+ </tr>
530
+ <tr>
531
+ <td>Llama-2-7b</td><td><center>37.04</center></td><td><center>36.05</center></td><td><center><strong>33.66</strong></center></td><td><center>57.56</center></td><td><center>48.00</center></td><td><center><strong>4.75</strong></center></td><td><center>42.22</center></td>
532
+ </tr>
533
+ <tr>
534
+ <td><em>RoLlama2-7b-Base-2024-05-14</em></td><td><center><em><strong>38.03</strong></em></center></td><td><center><em><strong>37.95</strong></em></center></td><td><center><em>27.22</em></center></td><td><center><em><strong>59.29</strong></em></center></td><td><center><em><strong>57.22</strong></em></center></td><td><center><em>2.53</em></center></td><td><center><em><strong>44.00</strong></em></center></td>
535
+ </tr>
536
+ </tbody>
537
+ </table>
538
+
539
+ ## Downstream Tasks
540
+
541
+
542
+ <table>
543
+ <tbody>
544
+ <tr>
545
+ <td></td>
546
+ <td colspan="4"><center><strong>LaRoSeDa</strong></center></td>
547
+ <td colspan="4"><center><strong>WMT</strong></center></td>
548
+ </tr>
549
+ <tr>
550
+ <td></td>
551
+ <td colspan="2"><center><strong>Few-shot</strong></center></td>
552
+ <td colspan="2"><center><strong>Finetuned</strong></center></td>
553
+ <td colspan="2"><center><strong>Few-shot</strong></center></td>
554
+ <td colspan="2"><center><strong>Finetuned</strong></center></td>
555
+ </tr>
556
+ <tr>
557
+ <td><strong>Model</strong></td>
558
+ <td><center><strong>Binary<br>(Macro F1)</strong></center></td>
559
+ <td><center><strong>Multiclass<br>(Macro F1)</strong></center></td>
560
+ <td><center><strong>Binary<br>(Macro F1)</strong></center></td>
561
+ <td><center><strong>Multiclass<br>(Macro F1)</strong></center></td>
562
+ <td><center><strong>EN-RO<br>(BLEU)</strong></center></td>
563
+ <td><center><strong>RO-EN<br>(BLEU)</strong></center></td>
564
+ <td><center><strong>EN-RO<br>(BLEU)</strong></center></td>
565
+ <td><center><strong>RO-EN<br>(BLEU)</strong></center></td>
566
+ </tr>
567
+ <tr>
568
+ <td>Llama-2-7b</td><td><center><strong>93.19</strong></center></td><td><center>54.11</center></td><td><center>98.43</center></td><td><center>87.22</center></td><td><center><strong>14.90</strong></center></td><td><center><strong>26.61</strong></center></td><td><center>24.95</center></td><td><center>39.09</center></td>
569
+ </tr>
570
+ <tr>
571
+ <td><em>RoLlama2-7b-Base-2024-05-14</em></td><td><center><em>83.25</em></center></td><td><center><em><strong>61.04</strong></em></center></td><td><center><em><strong>98.97</strong></em></center></td><td><center><em><strong>87.72</strong></em></center></td><td><center><em>10.01</em></center></td><td><center><em>13.03</em></center></td><td><center><em><strong>27.85</strong></em></center></td><td><center><em><strong>39.30</strong></em></center></td>
572
+ </tr>
573
+ </tbody>
574
+ </table>
575
+
576
+
577
+ <table>
578
+ <tbody>
579
+ <tr>
580
+ <td></td>
581
+ <td colspan="4"><center><strong>XQuAD</strong></center></td>
582
+ <td colspan="4"><center><strong>STS</strong></center></td>
583
+ </tr>
584
+ <tr>
585
+ <td></td>
586
+ <td colspan="2"><center><strong>Few-shot</strong></center></td>
587
+ <td colspan="2"><center><strong>Finetuned</strong></center></td>
588
+ <td colspan="2"><center><strong>Few-shot</strong></center></td>
589
+ <td colspan="2"><center><strong>Finetuned</strong></center></td>
590
+ </tr>
591
+ <tr>
592
+ <td><strong>Model</strong></td>
593
+ <td><center><strong>(EM)</strong></center></td>
594
+ <td><center><strong>(F1)</strong></center></td>
595
+ <td><center><strong>(EM)</strong></center></td>
596
+ <td><center><strong>(F1)</strong></center></td>
597
+ <td><center><strong>(Spearman)</strong></center></td>
598
+ <td><center><strong>(Pearson)</strong></center></td>
599
+ <td><center><strong>(Spearman)</strong></center></td>
600
+ <td><center><strong>(Pearson)</strong></center></td>
601
+ </tr>
602
+ <tr>
603
+ <td>Llama-2-7b</td><td><center><strong>38.91</strong></center></td><td><center><strong>56.82</strong></center></td><td><center>65.46</center></td><td><center>79.42</center></td><td><center><strong>9.08</strong></center></td><td><center><strong>9.07</strong></center></td><td><center><strong>79.93</strong></center></td><td><center><strong>81.08</strong></center></td>
604
+ </tr>
605
+ <tr>
606
+ <td><em>RoLlama2-7b-Base-2024-05-14</em></td><td><center><em>30.15</em></center></td><td><center><em>47.03</em></center></td><td><center><em><strong>67.06</strong></em></center></td><td><center><em><strong>79.96</strong></em></center></td><td><center><em>7.89</em></center></td><td><center><em>7.98</em></center></td><td><center><em>71.75</em></center></td><td><center><em>71.99</em></center></td>
607
+ </tr>
608
+ </tbody>
609
+ </table>
610
+
611
+
612
+ ## RoLlama2 Model Family
613
+
614
+ | Model | Link |
615
+ |--------------------|:--------:|
616
+ |RoLlama2-7b-Base-2024-05-14 | [link](https://huggingface.co/OpenLLM-Ro/RoLlama2-7b-Base-2024-05-14) |
617
+ |RoLlama2-7b-Instruct-2024-05-14 | [link](https://huggingface.co/OpenLLM-Ro/RoLlama2-7b-Instruct-2024-05-14) |
618
+ |*RoLlama2-7b-Instruct-2024-10-09*| [link](https://huggingface.co/OpenLLM-Ro/RoLlama2-7b-Instruct-2024-10-09) |
619
+ |RoLlama2-7b-Instruct-DPO-2024-10-09| [link](https://huggingface.co/OpenLLM-Ro/RoLlama2-7b-Instruct-DPO-2024-10-09) |
620
+
621
+ ## Citation
622
+
623
+ ```
624
+ @misc{masala2024vorbecstiromanecsterecipetrain,
625
+ title={"Vorbe\c{s}ti Rom\^ane\c{s}te?" A Recipe to Train Powerful Romanian LLMs with English Instructions},
626
+ author={Mihai Masala and Denis C. Ilie-Ablachim and Alexandru Dima and Dragos Corlatescu and Miruna Zavelca and Ovio Olaru and Simina Terian-Dan and Andrei Terian-Dan and Marius Leordeanu and Horia Velicu and Marius Popescu and Mihai Dascalu and Traian Rebedea},
627
+ year={2024},
628
+ eprint={2406.18266},
629
+ archivePrefix={arXiv},
630
+ primaryClass={cs.CL},
631
+ url={https://arxiv.org/abs/2406.18266},
632
+ }
633
+ ```
634
+ <!-- **APA:**
635
+
636
+ [More Information Needed] -->
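The "Average" values in the `model-index` metadata (and in the benchmark tables) appear to be the unweighted mean of the per-shot scores listed for each dataset. The sketch below checks that assumption for several datasets, using scores copied by hand from the metadata; the helper name `check_averages` and the 0.01 tolerance are illustrative choices, not part of any OpenLLM-Ro tooling.

```python
from statistics import mean

# Per-shot scores copied from the model-index metadata, in the listed shot order.
per_shot = {
    "ro_arc_challenge": [35.56, 36.42, 38.56, 38.39, 39.07, 39.67],  # 0/1/3/5/10/25-shot accuracy
    "ro_mmlu": [25.82, 25.48, 27.61, 29.96],                          # 0/1/3/5-shot accuracy
    "ro_winogrande": [58.72, 58.88, 60.38, 59.19],                    # 0/1/3/5-shot accuracy
    "ro_hellaswag": [55.85, 57.06, 57.52, 57.89, 57.79],              # 0/1/3/5/10-shot accuracy
    "ro_gsm8k": [0.0, 2.96, 4.62],                                    # 0/1/3-shot accuracy
    "LaRoSeDa_binary": [42.78, 98.0, 95.13, 97.07],                   # 0/1/3/5-shot macro-F1
    "WMT_EN-RO": [4.45, 8.61, 12.25, 14.73],                          # 0/1/3/5-shot BLEU
    "STS_spearman": [-1.74, 15.47, 9.93],                             # 0/1/3-shot Spearman
}

# "Average" values reported for the same datasets.
reported = {
    "ro_arc_challenge": 37.95,
    "ro_mmlu": 27.22,
    "ro_winogrande": 59.29,
    "ro_hellaswag": 57.22,
    "ro_gsm8k": 2.53,
    "LaRoSeDa_binary": 83.25,
    "WMT_EN-RO": 10.01,
    "STS_spearman": 7.89,
}


def check_averages(per_shot, reported, tol=0.01):
    """Map each dataset to (computed mean, reported value, within tolerance?)."""
    return {
        name: (mean(scores), reported[name], abs(mean(scores) - reported[name]) <= tol)
        for name, scores in per_shot.items()
    }


results = check_averages(per_shot, reported)
for name, (computed, rep, ok) in results.items():
    print(f"{name:17s} computed={computed:7.3f} reported={rep:6.2f} match={ok}")

# The overall Romanian_Academic_Benchmarks score also includes ro_truthfulqa (44.00),
# which is reported without a per-shot breakdown.
academic = ["ro_arc_challenge", "ro_mmlu", "ro_winogrande", "ro_hellaswag", "ro_gsm8k"]
overall = mean([reported[n] for n in academic] + [44.0])
print(f"academic overall computed={overall:.3f} reported=38.03")
```

Every checked dataset agrees with its reported average to within rounding, and the overall 38.03 matches the mean of the five per-dataset averages plus ro_truthfulqa.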
added_tokens.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "</s>": 2,
3
+ "<s>": 1,
4
+ "<unk>": 0
5
+ }
config.json ADDED
@@ -0,0 +1,27 @@
1
+ {
2
+ "_name_or_path": "meta-llama/Llama-2-7b-hf",
3
+ "architectures": [
4
+ "LlamaForCausalLM"
5
+ ],
6
+ "attention_bias": false,
7
+ "bos_token_id": 1,
8
+ "eos_token_id": 2,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 4096,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 11008,
13
+ "max_position_embeddings": 4096,
14
+ "model_type": "llama",
15
+ "num_attention_heads": 32,
16
+ "num_hidden_layers": 32,
17
+ "num_key_value_heads": 32,
18
+ "pretraining_tp": 1,
19
+ "rms_norm_eps": 1e-05,
20
+ "rope_scaling": null,
21
+ "rope_theta": 10000.0,
22
+ "tie_word_embeddings": false,
23
+ "torch_dtype": "float32",
24
+ "transformers_version": "4.34.0",
25
+ "use_cache": true,
26
+ "vocab_size": 32000
27
+ }
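The `total_size` recorded in `pytorch_model.bin.index.json` (26953662464 bytes) can be reproduced from the fields of this `config.json`. A minimal sketch, assuming a standard Llama-2 decoder layout (q/k/v/o attention projections without bias, gate/up/down MLP projections, two RMSNorm weights per layer, a final norm, and an untied `lm_head`), that the checkpoint stores only those tensors, and 4 bytes per parameter for `torch_dtype: float32`; the function name `llama_param_count` is hypothetical:

```python
# Hyperparameters copied from config.json above.
config = {
    "vocab_size": 32000,
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,
    "tie_word_embeddings": False,
}


def llama_param_count(cfg):
    """Count parameters of a Llama-style decoder with no bias terms (attention_bias: false)."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim
    # Attention: q_proj and o_proj are h x h; k_proj and v_proj are h x kv_dim.
    attn = 2 * h * h + 2 * h * kv_dim
    # MLP: gate_proj and up_proj (h -> intermediate), down_proj (intermediate -> h).
    mlp = 3 * h * cfg["intermediate_size"]
    # Two RMSNorm weight vectors per layer (input_layernorm, post_attention_layernorm).
    norms = 2 * h
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


params = llama_param_count(config)
fp32_bytes = params * 4  # torch_dtype: float32
bf16_bytes = params * 2  # if the weights were cast to bfloat16
print(f"parameters: {params:,}")
print(f"float32 size: {fp32_bytes:,} bytes (~{fp32_bytes / 1e9:.1f} GB)")
print(f"bfloat16 size: {bf16_bytes:,} bytes (~{bf16_bytes / 1e9:.1f} GB)")
```

The count comes out to 6,738,415,616 parameters, and at 4 bytes each that equals the index's `total_size` exactly. The same arithmetic suggests that loading the weights in bfloat16 would need roughly half the memory, about 13.5 GB for the weights alone.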
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.34.0"
6
+ }
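The next file, `pytorch_model.bin.index.json`, maps each tensor name to the shard file that stores it; a single layer can span shards (layer 11's tensors are split between shards 1 and 2). A small sketch of reading such an index with the standard library, using a hand-copied excerpt in place of the full file; `tensors_by_shard` and `shards_for_layer` are hypothetical helper names:

```python
import json
from collections import defaultdict

# Excerpt in the same shape as pytorch_model.bin.index.json (entries copied from it);
# a real run would json.load() the full file instead.
index_text = """
{
  "metadata": {"total_size": 26953662464},
  "weight_map": {
    "lm_head.weight": "pytorch_model-00003-of-00003.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin"
  }
}
"""


def tensors_by_shard(index):
    """Invert weight_map: shard file -> sorted list of tensor names."""
    shards = defaultdict(list)
    for tensor, shard in index["weight_map"].items():
        shards[shard].append(tensor)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def shards_for_layer(index, layer):
    """Return the set of shard files holding any tensor of model.layers.<layer>."""
    prefix = f"model.layers.{layer}."  # trailing dot: layer 1 must not match layer 10
    return {s for t, s in index["weight_map"].items() if t.startswith(prefix)}


index = json.loads(index_text)
for shard, names in tensors_by_shard(index).items():
    print(shard, len(names))
print(shards_for_layer(index, 11))  # layer 11 is split across two shards
```

The trailing dot in the prefix keeps a query for layer 1 from also matching layers 10 to 19.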
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,298 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 26953662464
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "pytorch_model-00003-of-00003.bin",
7
+ "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
8
+ "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
9
+ "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
10
+ "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
11
+ "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
12
+ "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
13
+ "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
14
+ "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
15
+ "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
16
+ "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
17
+ "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
18
+ "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
19
+ "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
20
+ "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
21
+ "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
22
+ "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
23
+ "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
24
+ "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
25
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
26
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
27
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
28
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
29
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
30
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
31
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
32
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
33
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
34
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
35
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
36
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
37
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
38
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
39
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
40
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
41
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
42
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
43
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
44
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
45
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
46
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
47
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
48
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
49
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
50
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
51
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
52
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
53
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
54
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
55
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
56
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
57
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
58
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
59
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
60
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
61
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
62
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
63
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
64
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
65
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
66
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
67
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
68
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
69
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
70
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
71
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
72
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
73
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
74
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
75
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
76
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
77
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
78
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
79
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
80
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
81
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
82
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
83
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
84
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
85
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
86
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
87
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
88
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
89
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
90
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
91
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
92
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
93
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
94
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
95
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
96
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
97
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
98
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
99
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
100
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
101
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
102
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
103
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
104
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
105
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
106
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
107
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
108
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
109
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
110
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
111
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
112
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
113
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
114
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
115
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
116
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
117
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
118
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
119
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
120
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
121
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
122
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
123
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
124
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
125
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
126
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
127
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
128
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
129
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
130
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
131
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
132
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
133
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
134
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
135
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
136
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
137
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
138
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
139
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
140
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
141
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
142
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
143
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
144
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
145
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
146
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
147
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
148
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
149
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
150
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
151
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
152
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
153
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
154
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
155
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
156
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
157
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
158
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
159
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
160
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
161
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
162
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
163
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
164
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
165
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
166
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
167
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
168
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
169
+ "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
170
+ "model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
171
+ "model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
172
+ "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
173
+ "model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
174
+ "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
175
+ "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
176
+ "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
177
+ "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
178
+ "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
179
+ "model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
180
+ "model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
181
+ "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
182
+ "model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
183
+ "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
184
+ "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
185
+ "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
186
+ "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
187
+ "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
188
+ "model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
189
+ "model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
190
+ "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
191
+ "model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
192
+ "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
193
+ "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
194
+ "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
195
+ "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
196
+ "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
197
+ "model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
198
+ "model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
199
+ "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
200
+ "model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
201
+ "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
202
+ "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
203
+ "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
204
+ "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
205
+ "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
206
+ "model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
207
+ "model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
208
+ "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
209
+ "model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
210
+ "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
211
+ "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
212
+ "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
213
+ "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
214
+ "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
215
+ "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
216
+ "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
217
+ "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
218
+ "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
219
+ "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
220
+ "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
221
+ "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
222
+ "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
223
+ "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
224
+ "model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
225
+ "model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
226
+ "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
227
+ "model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
228
+ "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
229
+ "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
230
+ "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
231
+ "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
232
+ "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
233
+ "model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
234
+ "model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
235
+ "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
236
+ "model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
237
+ "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
238
+ "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
239
+ "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
240
+ "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
241
+ "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
242
+ "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
243
+ "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
244
+ "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
245
+ "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
246
+ "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
247
+ "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
248
+ "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
249
+ "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
250
+ "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
251
+ "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
252
+ "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
253
+ "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
254
+ "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
255
+ "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
256
+ "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
257
+ "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
258
+ "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
259
+ "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
260
+ "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
261
+ "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
262
+ "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
263
+ "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
264
+ "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
265
+ "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
266
+ "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
267
+ "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
268
+ "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
269
+ "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
270
+ "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
271
+ "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
272
+ "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
273
+ "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
274
+ "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
275
+ "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
276
+ "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
277
+ "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
278
+ "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
279
+ "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
280
+ "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
281
+ "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
282
+ "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
283
+ "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
284
+ "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
285
+ "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
286
+ "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
287
+ "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
288
+ "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
289
+ "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
290
+ "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
291
+ "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
292
+ "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
293
+ "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
294
+ "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
295
+ "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
296
+ "model.norm.weight": "pytorch_model-00003-of-00003.bin"
297
+ }
298
+ }
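The index above is a sharded-checkpoint map. Each parameter name points to the `.bin` shard that stores it, and shard boundaries do not always fall between layers. The sketch below inverts a hand-copied excerpt of the `weight_map` (not the full file) to list which shards hold each decoder layer. The excerpt shows that `model.layers.23` is split across shards 2 and 3. The helpers `shards_per_layer` and `split_layers` are hypothetical and meant only for inspection; loading with `transformers` `from_pretrained` reads the index on its own.

```python
import re
from collections import defaultdict

# Hand-copied excerpt of the "weight_map" shown above (not the full file).
WEIGHT_MAP = {
    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
    "model.norm.weight": "pytorch_model-00003-of-00003.bin",
}

# Captures the decoder-layer index from names like "model.layers.23.mlp...".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def shards_per_layer(weight_map):
    """Map each decoder-layer index to the set of shard files holding its tensors."""
    result = defaultdict(set)
    for name, shard in weight_map.items():
        match = LAYER_RE.match(name)
        if match:  # non-layer tensors such as model.norm.weight are skipped
            result[int(match.group(1))].add(shard)
    return dict(result)


def split_layers(weight_map):
    """Return the sorted layer indices whose tensors live in more than one shard."""
    return sorted(i for i, shards in shards_per_layer(weight_map).items() if len(shards) > 1)


if __name__ == "__main__":
    for layer, shards in sorted(shards_per_layer(WEIGHT_MAP).items()):
        print(layer, sorted(shards))
    print("split:", split_layers(WEIGHT_MAP))
```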
special_tokens_map.json ADDED
@@ -0,0 +1,11 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<unk>",
4
+ "<s>",
5
+ "</s>"
6
+ ],
7
+ "bos_token": "<s>",
8
+ "eos_token": "</s>",
9
+ "pad_token": "<unk>",
10
+ "unk_token": "<unk>"
11
+ }
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
3
+ size 499723
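`tokenizer.model` is committed as a Git LFS pointer: three `key value` lines giving the spec version, a `sha256` object id and the byte size of the real file. The sketch below parses such a pointer and checks a downloaded blob against it. It assumes the blob has already been fetched to a local path, and the function names are hypothetical.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_blob(path, pointer):
    """Return True if the file at `path` matches the pointer's size and sha256 oid."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if os.path.getsize(path) != int(pointer["size"]):
        return False  # cheap size check before hashing
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# The pointer committed above for tokenizer.model.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
"""
```

For example, `verify_blob("tokenizer.model", parse_lfs_pointer(POINTER))` should return `True` for an intact download. The local path here is illustrative.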
tokenizer_config.json ADDED
@@ -0,0 +1,48 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "added_tokens_decoder": {
5
+ "0": {
6
+ "content": "<unk>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "1": {
14
+ "content": "<s>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "2": {
22
+ "content": "</s>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ }
29
+ },
30
+ "additional_special_tokens": [
31
+ "<unk>",
32
+ "<s>",
33
+ "</s>"
34
+ ],
35
+ "bos_token": "<s>",
36
+ "clean_up_tokenization_spaces": false,
37
+ "eos_token": "</s>",
38
+ "legacy": false,
39
+ "model_max_length": 1000000000000000019884624838656,
40
+ "pad_token": "<unk>",
41
+ "padding_side": "right",
42
+ "sp_model_kwargs": {},
43
+ "spaces_between_special_tokens": false,
44
+ "tokenizer_class": "LlamaTokenizer",
45
+ "tokenizer_file": null,
46
+ "unk_token": "<unk>",
47
+ "use_default_system_prompt": true
48
+ }
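The tokenizer settings above imply a specific batch layout. `<s>` (id 1) is prepended, no `</s>` is appended, and sequences are right-padded with `<unk>` (id 0), since `pad_token` is `<unk>`. The pure-Python sketch below mimics that layout on already-tokenized ids. It does not call the real `LlamaTokenizer`, and the content ids `450`, `29871` and `17` are arbitrary placeholders.

```python
# Special-token ids taken from "added_tokens_decoder" above.
UNK_ID, BOS_ID, EOS_ID = 0, 1, 2
PAD_ID = UNK_ID  # pad_token is "<unk>", so padding reuses id 0


def encode_with_specials(token_ids, add_bos=True, add_eos=False):
    """Mimic add_bos_token=True / add_eos_token=False on already-tokenized ids."""
    ids = list(token_ids)
    if add_bos:
        ids = [BOS_ID] + ids
    if add_eos:
        ids = ids + [EOS_ID]
    return ids


def pad_right(batch, pad_id=PAD_ID):
    """Right-pad (padding_side="right") to the longest sequence and build attention masks."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


if __name__ == "__main__":
    batch = [encode_with_specials([450, 29871]), encode_with_specials([17])]
    ids, mask = pad_right(batch)
    print(ids)   # [[1, 450, 29871], [1, 17, 0]]
    print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

Because the pad id coincides with `<unk>`, the ids alone cannot tell a padded position from a genuine unknown token. Only the attention mask can.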
train_params.yaml ADDED
@@ -0,0 +1,35 @@
1
+ batch_size_training: '32'
2
+ checkpoint_type: StateDictType.FULL_STATE_DICT
3
+ dataset: foundational_dataset
4
+ dist_checkpoint_folder: fine-tuned
5
+ dist_checkpoint_root_folder: test_run_save
6
+ enable_fsdp: 'True'
7
+ freeze_layers: 'False'
8
+ fsdp_activation_checkpointing: 'True'
9
+ gamma: '0.9'
10
+ load_peft_model: 'False'
11
+ low_cpu_fsdp: 'False'
12
+ lr: '0.0001'
13
+ micro_batch_size: '32'
14
+ mixed_precision: 'True'
15
+ model_name: models/v3/llama7b-full-1e-4_low-chunk1024-009-017
16
+ num_epochs: '1'
17
+ num_freeze_layers: '1'
18
+ num_workers_dataloader: '2'
19
+ one_gpu: 'False'
20
+ optimizer: AdamW
21
+ output_dir: PATH/to/save/PEFT/model
22
+ peft_method: lora
23
+ pure_bf16: 'True'
24
+ quantization: 'False'
25
+ run_validation: 'True'
26
+ save_model: 'True'
27
+ save_optimizer: 'False'
28
+ seed: '42'
29
+ sharding_strategy: ShardingStrategy.FULL_SHARD
30
+ type_of_model: foundational
31
+ use_fast_kernels: 'False'
32
+ use_fp16: 'False'
33
+ use_peft: 'False'
34
+ val_batch_size: '64'
35
+ weight_decay: '0.0'
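`train_params.yaml` stores most values as quoted strings (`'True'`, `'32'`, `'0.0001'`). Anything reading the file therefore has to coerce them before use. The sketch below does that for a copied subset of the keys. The keys resemble those of Meta's llama-recipes `train_config`, but treat that as an assumption. The accumulation-step formula is also an assumption (the convention of some llama-recipes versions), not something this file states.

```python
def coerce(value):
    """Turn strings like 'True', '32' or '0.0001' into bool/int/float; leave others as str."""
    if value in ("True", "False"):
        return value == "True"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# Subset of train_params.yaml, copied as the strings the file stores.
RAW_PARAMS = {
    "batch_size_training": "32",
    "micro_batch_size": "32",
    "lr": "0.0001",
    "gamma": "0.9",
    "num_epochs": "1",
    "enable_fsdp": "True",
    "use_peft": "False",
    "pure_bf16": "True",
    "weight_decay": "0.0",
    "optimizer": "AdamW",
}

params = {key: coerce(value) for key, value in RAW_PARAMS.items()}

# Assumption: accumulation steps = batch_size_training // micro_batch_size
# (a llama-recipes-style convention); here 32 // 32 == 1, i.e. no accumulation.
grad_accum_steps = params["batch_size_training"] // params["micro_batch_size"]
```

With `use_peft: 'False'` and `freeze_layers: 'False'`, this run appears to be full-parameter training. The `peft_method: lora` and placeholder `output_dir` entries look like unused defaults.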