OpenLLM-Ro/RoMistral-7b-Instruct-2024-05-17

@@ -4,6 +4,489 @@ language:
 - ro
 base_model:
 - mistralai/Mistral-7B-v0.1
 ---
 # Model Card for Model ID
@@ -27,6 +510,7 @@ OpenLLM-Ro represents the first open-source effort to build a LLM specialized fo
 - **Language(s):** Romanian
 - **License:** cc-by-nc-4.0
 - **Finetuned from model:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
 <!-- - **Finetuned from model [optional]:** [More Information Needed] -->
@@ -34,7 +518,7 @@ OpenLLM-Ro represents the first open-source effort to build a LLM specialized fo
 <!-- Provide the basic links for the model. -->
-- **Repository:** https://github.com/OpenLLM-Ro/llama-recipes
 - **Paper:** https://arxiv.org/abs/2406.18266
 ## Intended Use
@@ -72,30 +556,139 @@ outputs = model.generate(input_ids=inputs, max_new_tokens=128)
 print(tokenizer.decode(outputs[0]))
 ```
-## Benchmarks
-| Model              | Average  | ARC      | MMLU     |Winogrande|HellaSwag | GSM8k    |TruthfulQA|
-|--------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
-| Mistral-7B-Instruct-v0.2| 47.41    | 46.25    | 47.04    | 58.72    | 54.25    | 13.59     | *64.63*    |
-| *RoMistral-7b-Instruct*  | ***52.49***    | ***50.39***    | ***51.64***    | ***66.69***    | ***60.24***    | ***33.71***     | 52.59    |
 ## MT-Bench
-| Model              | Average  | 1st turn      | 2nd turn     | Answers in Ro |
-|--------------------|:--------:|:--------:|:--------:| :--------:|
-| Mistral-7B-Instruct-v0.2    | 4.83    | 5.09    | **4.58**    | 154 / 160|
-| *RoMistral-7b-Instruct*| ***4.91***|***5.67***| *4.16* | ***160 / 160***|
-## RoCulturaBench
-| Model              | Score  | Answers in Ro|
-|--------------------|:--------:|:--------:|
-| Mistral-7B-Instruct-v0.2   | **3.75**    | 99 / 100   |
-|*RoMistral-7b-Instruct*| *3.17*| ***100 / 100*** |

 - ro
 base_model:
 - mistralai/Mistral-7B-v0.1
+datasets:
+- OpenLLM-Ro/ro_sft_alpaca
+- OpenLLM-Ro/ro_sft_alpaca_gpt4
+- OpenLLM-Ro/ro_sft_dolly
+- OpenLLM-Ro/ro_sft_selfinstruct_gpt4
+- OpenLLM-Ro/ro_sft_norobots
+- OpenLLM-Ro/ro_sft_orca
+- OpenLLM-Ro/ro_sft_camel
+model-index:
+    - name: OpenLLM-Ro/RoMistral-7b-Instruct
+      results:
+        - task:
+            type: text-generation
+          dataset:
+            name: RoMT-Bench
+            type: RoMT-Bench
+          metrics:
+            - name: Score
+              type: Score
+              value: 4.99
+        - task:
+            type: text-generation
+          dataset:
+            name: RoCulturaBench
+            type: RoCulturaBench
+          metrics:
+            - name: Score
+              type: Score
+              value: 3.38
+        - task:
+            type: text-generation
+          dataset:
+            name: Romanian_Academic_Benchmarks
+            type: Romanian_Academic_Benchmarks
+          metrics:
+            - name: Average accuracy
+              type: accuracy
+              value: 52.54
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_arc_challenge
+            type: OpenLLM-Ro/ro_arc_challenge
+          metrics:
+            - name: Average accuracy
+              type: accuracy
+              value: 50.41
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_mmlu
+            type: OpenLLM-Ro/ro_mmlu
+          metrics:
+            - name: Average accuracy
+              type: accuracy
+              value: 51.61
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_winogrande
+            type: OpenLLM-Ro/ro_winogrande
+          metrics:
+            - name: Average accuracy
+              type: accuracy
+              value: 66.48
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_hellaswag
+            type: OpenLLM-Ro/ro_hellaswag
+          metrics:
+            - name: Average accuracy
+              type: accuracy
+              value: 60.27
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_gsm8k
+            type: OpenLLM-Ro/ro_gsm8k
+          metrics:
+            - name: Average accuracy
+              type: accuracy
+              value: 34.19
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_truthfulqa
+            type: OpenLLM-Ro/ro_truthfulqa
+          metrics:
+            - name: Average accuracy
+              type: accuracy
+              value: 52.30
+        - task:
+            type: text-generation
+          dataset:
+            name: LaRoSeDa_binary
+            type: LaRoSeDa_binary
+          metrics:
+            - name: Average macro-f1
+              type: macro-f1
+              value: 97.36
+        - task:
+            type: text-generation
+          dataset:
+            name: LaRoSeDa_multiclass
+            type: LaRoSeDa_multiclass
+          metrics:
+            - name: Average macro-f1
+              type: macro-f1
+              value: 67.55
+        - task:
+            type: text-generation
+          dataset:
+            name: LaRoSeDa_binary_finetuned
+            type: LaRoSeDa_binary_finetuned
+          metrics:
+            - name: Average macro-f1
+              type: macro-f1
+              value: 98.80
+        - task:
+            type: text-generation
+          dataset:
+            name: LaRoSeDa_multiclass_finetuned
+            type: LaRoSeDa_multiclass_finetuned
+          metrics:
+            - name: Average macro-f1
+              type: macro-f1
+              value: 88.28
+        - task:
+            type: text-generation
+          dataset:
+            name: WMT_EN-RO
+            type: WMT_EN-RO
+          metrics:
+            - name: Average bleu
+              type: bleu
+              value: 27.93
+        - task:
+            type: text-generation
+          dataset:
+            name: WMT_RO-EN
+            type: WMT_RO-EN
+          metrics:
+            - name: Average bleu
+              type: bleu
+              value: 13.21
+        - task:
+            type: text-generation
+          dataset:
+            name: WMT_EN-RO_finetuned
+            type: WMT_EN-RO_finetuned
+          metrics:
+            - name: Average bleu
+              type: bleu
+              value: 28.72
+        - task:
+            type: text-generation
+          dataset:
+            name: WMT_RO-EN_finetuned
+            type: WMT_RO-EN_finetuned
+          metrics:
+            - name: Average bleu
+              type: bleu
+              value: 40.86
+        - task:
+            type: text-generation
+          dataset:
+            name: XQuAD
+            type: XQuAD
+          metrics:
+            - name: Average exact_match
+              type: exact_match
+              value: 43.66
+        - task:
+            type: text-generation
+          dataset:
+            name: XQuAD
+            type: XQuAD
+          metrics:
+            - name: Average f1
+              type: f1
+              value: 63.70
+        - task:
+            type: text-generation
+          dataset:
+            name: XQuAD_finetuned
+            type: XQuAD_finetuned
+          metrics:
+            - name: Average exact_match
+              type: exact_match
+              value: 55.04
+        - task:
+            type: text-generation
+          dataset:
+            name: XQuAD_finetuned
+            type: XQuAD_finetuned
+          metrics:
+            - name: Average f1
+              type: f1
+              value: 72.31
+        - task:
+            type: text-generation
+          dataset:
+            name: STS
+            type: STS
+          metrics:
+            - name: Average spearman
+              type: spearman
+              value: 77.43
+        - task:
+            type: text-generation
+          dataset:
+            name: STS
+            type: STS
+          metrics:
+            - name: Average pearson
+              type: pearson
+              value: 78.43
+        - task:
+            type: text-generation
+          dataset:
+            name: STS_finetuned
+            type: STS_finetuned
+          metrics:
+            - name: Average spearman
+              type: spearman
+              value: 87.25
+        - task:
+            type: text-generation
+          dataset:
+            name: STS_finetuned
+            type: STS_finetuned
+          metrics:
+            - name: Average pearson
+              type: pearson
+              value: 87.79
+        - task:
+            type: text-generation
+          dataset:
+            name: RoMT-Bench
+            type: RoMT-Bench
+          metrics:
+            - name: First turn
+              type: Score
+              value: 5.46
+            - name: Second turn
+              type: Score
+              value: 4.53
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_arc_challenge
+            type: OpenLLM-Ro/ro_arc_challenge
+          metrics:
+            - name: 0-shot
+              type: accuracy
+              value: 47.47
+            - name: 1-shot
+              type: accuracy
+              value: 48.59
+            - name: 3-shot
+              type: accuracy
+              value: 50.30
+            - name: 5-shot
+              type: accuracy
+              value: 51.33
+            - name: 10-shot
+              type: accuracy
+              value: 52.36
+            - name: 25-shot
+              type: accuracy
+              value: 52.44
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_mmlu
+            type: OpenLLM-Ro/ro_mmlu
+          metrics:
+            - name: 0-shot
+              type: accuracy
+              value: 50.01
+            - name: 1-shot
+              type: accuracy
+              value: 50.18
+            - name: 3-shot
+              type: accuracy
+              value: 53.13
+            - name: 5-shot
+              type: accuracy
+              value: 53.12
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_winogrande
+            type: OpenLLM-Ro/ro_winogrande
+          metrics:
+            - name: 0-shot
+              type: accuracy
+              value: 64.96
+            - name: 1-shot
+              type: accuracy
+              value: 67.09
+            - name: 3-shot
+              type: accuracy
+              value: 67.01
+            - name: 5-shot
+              type: accuracy
+              value: 66.85
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_hellaswag
+            type: OpenLLM-Ro/ro_hellaswag
+          metrics:
+            - name: 0-shot
+              type: accuracy
+              value: 59.99
+            - name: 1-shot
+              type: accuracy
+              value: 59.48
+            - name: 3-shot
+              type: accuracy
+              value: 60.14
+            - name: 5-shot
+              type: accuracy
+              value: 60.61
+            - name: 10-shot
+              type: accuracy
+              value: 61.12
+        - task:
+            type: text-generation
+          dataset:
+            name: OpenLLM-Ro/ro_gsm8k
+            type: OpenLLM-Ro/ro_gsm8k
+          metrics:
+            - name: 0-shot
+              type: accuracy
+              value: 21.68
+            - name: 1-shot
+              type: accuracy
+              value: 38.21
+            - name: 3-shot
+              type: accuracy
+              value: 42.68
+        - task:
+            type: text-generation
+          dataset:
+            name: LaRoSeDa_binary
+            type: LaRoSeDa_binary
+          metrics:
+            - name: 0-shot
+              type: macro-f1
+              value: 97.27
+            - name: 1-shot
+              type: macro-f1
+              value: 96.37
+            - name: 3-shot
+              type: macro-f1
+              value: 97.97
+            - name: 5-shot
+              type: macro-f1
+              value: 97.83
+        - task:
+            type: text-generation
+          dataset:
+            name: LaRoSeDa_multiclass
+            type: LaRoSeDa_multiclass
+          metrics:
+            - name: 0-shot
+              type: macro-f1
+              value: 63.95
+            - name: 1-shot
+              type: macro-f1
+              value: 66.89
+            - name: 3-shot
+              type: macro-f1
+              value: 68.16
+            - name: 5-shot
+              type: macro-f1
+              value: 71.19
+        - task:
+            type: text-generation
+          dataset:
+            name: WMT_EN-RO
+            type: WMT_EN-RO
+          metrics:
+            - name: 0-shot
+              type: bleu
+              value: 24.87
+            - name: 1-shot
+              type: bleu
+              value: 28.30
+            - name: 3-shot
+              type: bleu
+              value: 29.26
+            - name: 5-shot
+              type: bleu
+              value: 29.27
+        - task:
+            type: text-generation
+          dataset:
+            name: WMT_RO-EN
+            type: WMT_RO-EN
+          metrics:
+            - name: 0-shot
+              type: bleu
+              value: 3.69
+            - name: 1-shot
+              type: bleu
+              value: 5.45
+            - name: 3-shot
+              type: bleu
+              value: 19.92
+            - name: 5-shot
+              type: bleu
+              value: 23.80
+        - task:
+            type: text-generation
+          dataset:
+            name: XQuAD_EM
+            type: XQuAD_EM
+          metrics:
+            - name: 0-shot
+              type: exact_match
+              value: 23.36
+            - name: 1-shot
+              type: exact_match
+              value: 47.98
+            - name: 3-shot
+              type: exact_match
+              value: 51.85
+            - name: 5-shot
+              type: exact_match
+              value: 51.43
+        - task:
+            type: text-generation
+          dataset:
+            name: XQuAD_F1
+            type: XQuAD_F1
+          metrics:
+            - name: 0-shot
+              type: f1
+              value: 46.29
+            - name: 1-shot
+              type: f1
+              value: 67.40
+            - name: 3-shot
+              type: f1
+              value: 70.58
+            - name: 5-shot
+              type: f1
+              value: 70.53
+        - task:
+            type: text-generation
+          dataset:
+            name: STS
+            type: STS
+          metrics:
+            - name: 0-shot
+              type: spearman
+              value: 77.91
+            - name: 1-shot
+              type: spearman
+              value: 77.73
+            - name: 3-shot
+              type: spearman
+              value: 76.65
+        - task:
+            type: text-generation
+          dataset:
+            name: STS
+            type: STS
+          metrics:
+            - name: 0-shot
+              type: pearson
+              value: 78.03
+            - name: 1-shot
+              type: pearson
+              value: 78.74
+            - name: 3-shot
+              type: pearson
+              value: 78.53
 ---
 # Model Card for Model ID
 - **Language(s):** Romanian
 - **License:** cc-by-nc-4.0
 - **Finetuned from model:** [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+- **Trained using:** [RoAlpaca](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_alpaca), [RoAlpacaGPT4](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_alpaca_gpt4), [RoDolly](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_dolly), [RoSelfInstruct](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_selfinstruct_gpt4), [RoNoRobots](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_norobots), [RoOrca](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_orca), [RoCamel](https://huggingface.co/datasets/OpenLLM-Ro/ro_sft_camel)
 <!-- - **Finetuned from model [optional]:** [More Information Needed] -->
 <!-- Provide the basic links for the model. -->
+- **Repository:** https://github.com/OpenLLM-Ro/LLaMA-Factory
 - **Paper:** https://arxiv.org/abs/2406.18266
 ## Intended Use
 print(tokenizer.decode(outputs[0]))
 ```
+## Academic Benchmarks
+<table>
+<tbody>
+<tr>
+<td><strong>Model</strong></td>
+<td><strong><center>Average</center></strong></td>
+<td><strong><center>ARC</center></strong></td>
+<td><strong><center>MMLU</center></strong></td>
+<td><strong><center>Winogrande</center></strong></td>
+<td><strong><center>HellaSwag</center></strong></td>
+<td><strong><center>GSM8k</center></strong></td>
+<td><strong><center>TruthfulQA</center></strong></td>
+</tr>
+<tr>
+<td>Mistral-7B-Instruct-v0.2</td><td><center>47.40</center></td><td><center>46.29</center></td><td><center>47.01</center></td><td><center>58.78</center></td><td><center>54.27</center></td><td><center>13.47</center></td><td><center><strong>64.59</strong></center></td>
+</tr>
+<tr>
+<td><em>RoMistral-7b-Instruct</em></td><td><center><em><strong>52.54</strong></em></center></td><td><center><em><strong>50.42</strong></em></center></td><td><center><em><strong>51.61</strong></em></center></td><td><center><em><strong>66.48</strong></em></center></td><td><center><em><strong>60.27</strong></em></center></td><td><center><em><strong>34.19</strong></em></center></td><td><center><em>52.30</em></center></td>
+</tr>
+</tbody>
+</table>
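
The "Average accuracy" entries in the metadata appear to be plain means of the per-shot scores listed further down in the same `model-index` block. The sketch below checks that reading with the numbers copied from this card. The dictionary names, the `matches` helper, and the 0.006 tolerance (to absorb two-decimal rounding) are assumptions made for illustration, not part of the OpenLLM-Ro evaluation code.

```python
from statistics import mean

# Per-shot accuracies copied from the model-index metadata in this card.
per_shot = {
    "ro_arc_challenge": [47.47, 48.59, 50.30, 51.33, 52.36, 52.44],
    "ro_mmlu": [50.01, 50.18, 53.13, 53.12],
    "ro_winogrande": [64.96, 67.09, 67.01, 66.85],
    "ro_hellaswag": [59.99, 59.48, 60.14, 60.61, 61.12],
    "ro_gsm8k": [21.68, 38.21, 42.68],
}

# "Average accuracy" values reported in the same metadata.
# TruthfulQA has no per-shot breakdown in the card, so it is only
# used for the overall average.
reported = {
    "ro_arc_challenge": 50.41,
    "ro_mmlu": 51.61,
    "ro_winogrande": 66.48,
    "ro_hellaswag": 60.27,
    "ro_gsm8k": 34.19,
    "ro_truthfulqa": 52.30,
}

def matches(values, target, tol=0.006):
    """Return True if the mean of `values` equals `target` up to rounding.

    `tol` is a hypothetical tolerance chosen to cover two-decimal rounding.
    """
    return abs(mean(values) - target) <= tol

for name, scores in per_shot.items():
    status = "ok" if matches(scores, reported[name]) else "mismatch"
    print(f"{name}: mean={mean(scores):.3f} reported={reported[name]:.2f} {status}")

# The overall figure is consistent with the mean of the six benchmark averages.
overall = mean(list(reported.values()))
print(f"overall: {overall:.2f}")
```

Under this reading, the ARC per-shot mean is 50.415, which sits exactly between the 50.41 given in the metadata and the 50.42 shown in the table, so the difference between the two may simply come from rounding.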
+## Downstream tasks
+<table>
+<tbody>
+<tr>
+<td></td>
+<td colspan="4"><center><strong>LaRoSeDa</strong></center></td>
+<td colspan="4"><center><strong>WMT</strong></center></td>
+</tr>
+<tr>
+<td></td>
+<td colspan="2"><center><strong>Few-shot</strong></center></td>
+<td colspan="2"><center><strong>Finetuned</strong></center></td>
+<td colspan="2"><center><strong>Few-shot</strong></center></td>
+<td colspan="2"><center><strong>Finetuned</strong></center></td>
+</tr>
+<tr>
+<td><strong>Model</strong></td>
+<td><center><strong>Binary<br>(Macro F1)</strong></center></td>
+<td><center><strong>Multiclass<br>(Macro F1)</strong></center></td>
+<td><center><strong>Binary<br>(Macro F1)</strong></center></td>
+<td><center><strong>Multiclass<br>(Macro F1)</strong></center></td>
+<td><center><strong>EN-RO<br>(Bleu)</strong></center></td>
+<td><center><strong>RO-EN<br>(Bleu)</strong></center></td>
+<td><center><strong>EN-RO<br>(Bleu)</strong></center></td>
+<td><center><strong>RO-EN<br>(Bleu)</strong></center></td>
+</tr>
+<tr>
+<td>Mistral-7B-Instruct-v0.2</td><td><center>96.97</center></td><td><center>56.66</center></td><td><center><strong>98.83</strong></center></td><td><center>87.32</center></td><td><center>18.60</center></td><td><center><strong>33.99</strong></center></td><td><center>26.19</center></td><td><center>39.88</center></td>
+</tr>
+<tr>
+<td><em>RoMistral-7b-Instruct</em></td><td><center><em><strong>97.36</strong></em></center></td><td><center><em><strong>67.55</strong></em></center></td><td><center><em>98.80</em></center></td><td><center><em><strong>88.28</strong></em></center></td><td><center><em><strong>27.93</strong></em></center></td><td><center><em>13.21</em></center></td><td><center><em><strong>28.72</strong></em></center></td><td><center><em><strong>40.86</strong></em></center></td>
+</tr>
+</tbody>
+</table>
+<table>
+<tbody>
+<tr>
+<td></td>
+<td colspan="4"><center><strong>XQuAD</strong></center></td>
+<td colspan="4"><center><strong>STS</strong></center></td>
+</tr>
+<tr>
+<td></td>
+<td colspan="2"><center><strong>Few-shot</strong></center></td>
+<td colspan="2"><center><strong>Finetuned</strong></center></td>
+<td colspan="2"><center><strong>Few-shot</strong></center></td>
+<td colspan="2"><center><strong>Finetuned</strong></center></td>
+</tr>
+<tr>
+<td><strong>Model</strong></td>
+<td><center><strong>(EM)</strong></center></td>
+<td><center><strong>(F1)</strong></center></td>
+<td><center><strong>(EM)</strong></center></td>
+<td><center><strong>(F1)</strong></center></td>
+<td><center><strong>(Spearman)</strong></center></td>
+<td><center><strong>(Pearson)</strong></center></td>
+<td><center><strong>(Spearman)</strong></center></td>
+<td><center><strong>(Pearson)</strong></center></td>
+</tr>
+<tr>
+<td>Mistral-7B-Instruct-v0.2</td><td><center>27.92</center></td><td><center>50.71</center></td><td><center><strong>65.46</strong></center></td><td><center><strong>79.73</strong></center></td><td><center>62.62</center></td><td><center>60.86</center></td><td><center>84.92</center></td><td><center>85.44</center></td>
+</tr>
+<tr>
+<td><em>RoMistral-7b-Instruct</em></td><td><center><em><strong>43.66</strong></em></center></td><td><center><em><strong>63.70</strong></em></center></td><td><center><em>55.04</em></center></td><td><center><em>72.31</em></center></td><td><center><em><strong>77.43</strong></em></center></td><td><center><em><strong>78.43</strong></em></center></td><td><center><em><strong>87.25</strong></em></center></td><td><center><em><strong>87.79</strong></em></center></td>
+</tr>
+</tbody>
+</table>
 ## MT-Bench
+<table>
+<tbody>
+<tr>
+<td><strong>Model</strong></td>
+<td><strong><center>Average</center></strong></td>
+<td><strong><center>1st turn</center></strong></td>
+<td><strong><center>2nd turn</center></strong></td>
+<td><strong><center>Answers in Ro</center></strong></td>
+</tr>
+<tr>
+<td><em>Mistral-7B-Instruct-v0.2</em></td><td><center><em><strong>5.03</strong></em></center></td><td><center><em>5.05</em></center></td><td><center><em><strong>5.00</strong></em></center></td><td><center><em>154/160</em></center></td>
+</tr>
+<tr>
+<td><em>RoMistral-7b-Instruct</em></td><td><center><em>4.99</em></center></td><td><center><em><strong>5.46</strong></em></center></td><td><center><em>4.53</em></center></td><td><center><em><strong>160/160</strong></em></center></td>
+</tr>
+</tbody>
+</table>
+## RoCulturaBench
+<table>
+<tbody>
+<tr>
+<td><strong>Model</strong></td>
+<td><strong><center>Average</center></strong></td>
+<td><strong><center>Answers in Ro</center></strong></td>
+</tr>
+<tr>
+<td><em>Mistral-7B-Instruct-v0.2</em></td><td><center><em><strong>3.68</strong></em></center></td><td><center><em>97/100</em></center></td>
+</tr>
+<tr>
+<td><em>RoMistral-7b-Instruct</em></td><td><center><em>3.38</em></center></td><td><center><em><strong>100/100</strong></em></center></td>
+</tr>
+</tbody>
+</table>