sxn6144 committed
Commit
f9498b8
Parent: 30cb415

Update README.md

Files changed (1): README.md +15 -0
README.md CHANGED
@@ -27,6 +27,21 @@ Models input text only.
  ## Output
  Models output text only.

+ ## Model Evaluation Results
+
+ Results of experiments on Arabic MMLU and EXAMs. 'Average', 'STEM', 'Humanities', 'Social Sciences' and 'Others (Business, Health, Misc)' belong to Arabic MMLU. The best performance is in bold and the second best is underlined.
+
+ | Model | Average | STEM | Humanities | Social Sciences | Others (Business, Health, Misc) | EXAMs |
+ |-----------------|---------|-------|------------|-----------------|---------------------------------|-------|
+ | Bloomz (Muennighoff et al., 2022) | 30.95 | 32.32 | 26.71 | 35.85 | 28.95 | 33.89 |
+ | Llama2-7B | 28.81 | 28.48 | 26.68 | 29.88 | 30.18 | 23.48 |
+ | Llama2-13B | 31.25 | 31.06 | 27.11 | 35.50 | 31.35 | 25.45 |
+ | Jais-13B-base | 30.01 | 27.85 | 25.42 | 39.70 | 27.06 | 35.67 |
+ | AceGPT-7B-base | 30.36 | 26.63 | 28.17 | 35.15 | 31.50 | 31.96 |
+ | AceGPT-13B-base | <u>37.26</u> | <u>35.16</u> | <u>30.30</u> | <u>47.34</u> | <u>36.25</u> | <u>36.63</u> |
+ | ChatGPT | <b>46.07</b> | <b>44.17</b> | <b>35.33</b> | <b>61.26</b> | <b>43.52</b> | <b>45.63</b> |
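The 'Average' column appears to be the arithmetic mean of the four Arabic MMLU category scores (STEM, Humanities, Social Sciences, Others); it matches the reported values to within rounding on every row. A minimal sketch checking two rows, with values copied from the table above:

```python
# Assumption: 'Average' is the mean of the four Arabic MMLU category
# scores; EXAMs is excluded. Values are copied from the table above.
rows = {
    "AceGPT-13B-base": (37.26, [35.16, 30.30, 47.34, 36.25]),
    "ChatGPT": (46.07, [44.17, 35.33, 61.26, 43.52]),
}

for model, (reported_avg, scores) in rows.items():
    mean = sum(scores) / len(scores)
    # Loose tolerance: the source likely averaged unrounded scores.
    assert abs(mean - reported_avg) < 0.05, model
    print(f"{model}: mean={mean:.2f}, reported={reported_avg}")
```

For AceGPT-13B-base the mean is 37.2625 versus a reported 37.26, so the small discrepancies on other rows are consistent with rounding before versus after averaging.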
+
  ---
  ## Samples
  #### Sample1(Vicuna)