## Output

Models output text only.

## Model Evaluation Results

Experiments on Arabic MMLU and EXAMs. 'Average', 'STEM', 'Humanities', 'Social Sciences' and 'Others (Business, Health, Misc)' belong to Arabic MMLU. The best performance is in bold and the second best is underlined.

| Model | Average | STEM | Humanities | Social Sciences | Others (Business, Health, Misc) | EXAMs |
|-----------------|---------|------|------------|-----------------|---------------------------------|-------|
| Bloomz (Muennighoff et al., 2022) | 30.95 | 32.32 | 26.71 | 35.85 | 28.95 | 33.89 |
| Llama2-7B | 28.81 | 28.48 | 26.68 | 29.88 | 30.18 | 23.48 |
| Llama2-13B | 31.25 | 31.06 | 27.11 | 35.50 | 31.35 | 25.45 |
| Jais-13B-base | 30.01 | 27.85 | 25.42 | 39.70 | 27.06 | 35.67 |
| AceGPT-7B-base | 30.36 | 26.63 | 28.17 | 35.15 | 31.50 | 31.96 |
| AceGPT-13B-base | <u>37.26</u> | <u>35.16</u> | <u>30.30</u> | <u>47.34</u> | <u>36.25</u> | <u>36.63</u> |
| ChatGPT | <b>46.07</b> | <b>44.17</b> | <b>35.33</b> | <b>61.26</b> | <b>43.52</b> | <b>45.63</b> |
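As a quick sanity check (ours, not part of the original evaluation), the reported 'Average' appears to be the unweighted mean of the four Arabic MMLU category columns; the sketch below verifies this for three rows of the table, using scores copied from it:

```python
# Sanity check (ours, not from the README): for the rows checked here, the
# reported Arabic MMLU "Average" matches the unweighted mean of the four
# category scores (STEM, Humanities, Social Sciences, Others).
rows = {
    # model: (STEM, Humanities, Social Sciences, Others, reported Average)
    "Jais-13B-base":   (27.85, 25.42, 39.70, 27.06, 30.01),
    "AceGPT-13B-base": (35.16, 30.30, 47.34, 36.25, 37.26),
    "ChatGPT":         (44.17, 35.33, 61.26, 43.52, 46.07),
}

for model, (*cats, reported) in rows.items():
    mean = sum(cats) / len(cats)
    print(f"{model}: mean={mean:.2f}, reported={reported:.2f}")
    assert abs(mean - reported) < 0.005
```

Note that EXAMs is scored separately and does not enter this average.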

---

## Samples

#### Sample1(Vicuna)