ronnengmail committed
Commit b408473 · verified · 1 Parent(s): 6bed388

Upload eval/belebele_3b.log with huggingface_hub

  1. eval/belebele_3b.log +99 -0
eval/belebele_3b.log ADDED
@@ -0,0 +1,99 @@
+ Loading tokenizer: /tmp/eval/multilingual_32k.model
+ Loading model: /tmp/eval/best_model.pt
+ Model loaded: 3.04B parameters on cuda
+
+ ============================================================
+ BELEBELE EVALUATION - Multilingual 3B GPT
+ ============================================================
+
+
+ Evaluating EN (eng_Latn)...
+ [EN] 50/900 - accuracy so far: 22.0%
+ [EN] 100/900 - accuracy so far: 29.0%
+ [EN] 150/900 - accuracy so far: 32.7%
+ [EN] 200/900 - accuracy so far: 30.5%
+ [EN] 250/900 - accuracy so far: 32.0%
+ [EN] 300/900 - accuracy so far: 31.7%
+ [EN] 350/900 - accuracy so far: 32.9%
+ [EN] 400/900 - accuracy so far: 33.5%
+ [EN] 450/900 - accuracy so far: 32.9%
+ [EN] 500/900 - accuracy so far: 31.4%
+ [EN] 550/900 - accuracy so far: 32.0%
+ [EN] 600/900 - accuracy so far: 32.2%
+ [EN] 650/900 - accuracy so far: 32.6%
+ [EN] 700/900 - accuracy so far: 32.3%
+ [EN] 750/900 - accuracy so far: 32.7%
+ [EN] 800/900 - accuracy so far: 32.0%
+ [EN] 850/900 - accuracy so far: 31.9%
+ [EN] 900/900 - accuracy so far: 31.8%
+ ✅ EN: 31.8% (286/900)
+
+ Evaluating HE (heb_Hebr)...
+ [HE] 50/900 - accuracy so far: 20.0%
+ [HE] 100/900 - accuracy so far: 25.0%
+ [HE] 150/900 - accuracy so far: 24.0%
+ [HE] 200/900 - accuracy so far: 26.0%
+ [HE] 250/900 - accuracy so far: 25.2%
+ [HE] 300/900 - accuracy so far: 25.7%
+ [HE] 350/900 - accuracy so far: 24.9%
+ [HE] 400/900 - accuracy so far: 24.8%
+ [HE] 450/900 - accuracy so far: 24.9%
+ [HE] 500/900 - accuracy so far: 24.2%
+ [HE] 550/900 - accuracy so far: 25.1%
+ [HE] 600/900 - accuracy so far: 25.3%
+ [HE] 650/900 - accuracy so far: 25.7%
+ [HE] 700/900 - accuracy so far: 25.4%
+ [HE] 750/900 - accuracy so far: 25.9%
+ [HE] 800/900 - accuracy so far: 26.2%
+ [HE] 850/900 - accuracy so far: 26.8%
+ [HE] 900/900 - accuracy so far: 27.0%
+ ✅ HE: 27.0% (243/900)
+
+ Evaluating AR (arb_Arab)...
+ [AR] 50/900 - accuracy so far: 28.0%
+ [AR] 100/900 - accuracy so far: 25.0%
+ [AR] 150/900 - accuracy so far: 24.7%
+ [AR] 200/900 - accuracy so far: 29.5%
+ [AR] 250/900 - accuracy so far: 30.8%
+ [AR] 300/900 - accuracy so far: 30.0%
+ [AR] 350/900 - accuracy so far: 28.6%
+ [AR] 400/900 - accuracy so far: 28.2%
+ [AR] 450/900 - accuracy so far: 28.7%
+ [AR] 500/900 - accuracy so far: 27.2%
+ [AR] 550/900 - accuracy so far: 27.5%
+ [AR] 600/900 - accuracy so far: 27.0%
+ [AR] 650/900 - accuracy so far: 27.7%
+ [AR] 700/900 - accuracy so far: 28.1%
+ [AR] 750/900 - accuracy so far: 28.9%
+ [AR] 800/900 - accuracy so far: 29.1%
+ [AR] 850/900 - accuracy so far: 28.9%
+ [AR] 900/900 - accuracy so far: 28.4%
+ ✅ AR: 28.4% (256/900)
+
+ Evaluating FA (pes_Arab)...
+ [FA] 50/900 - accuracy so far: 32.0%
+ [FA] 100/900 - accuracy so far: 33.0%
+ [FA] 150/900 - accuracy so far: 30.7%
+ [FA] 200/900 - accuracy so far: 30.5%
+ [FA] 250/900 - accuracy so far: 28.4%
+ [FA] 300/900 - accuracy so far: 29.0%
+ [FA] 350/900 - accuracy so far: 29.4%
+ [FA] 400/900 - accuracy so far: 30.0%
+ [FA] 450/900 - accuracy so far: 30.2%
+ [FA] 500/900 - accuracy so far: 30.8%
+ [FA] 550/900 - accuracy so far: 30.7%
+ [FA] 600/900 - accuracy so far: 30.3%
+ [FA] 650/900 - accuracy so far: 29.5%
+ [FA] 700/900 - accuracy so far: 29.4%
+ [FA] 750/900 - accuracy so far: 28.4%
+ [FA] 800/900 - accuracy so far: 27.8%
+ [FA] 850/900 - accuracy so far: 28.0%
+ [FA] 900/900 - accuracy so far: 28.2%
+ ✅ FA: 28.2% (254/900)
+
+ ============================================================
+ OVERALL: 28.9% (1039/3600)
+ Random baseline: 25.0%
+ ============================================================
+
+ Results saved to /tmp/eval/belebele_results.json
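
The summary figures in the log can be re-derived from the per-language counts, and the gap to the 25.0% random baseline can be put in rough statistical terms. The following is a minimal sketch, not the evaluation script itself: the counts are copied from the log's summary lines, the helper names (`accuracy`, `z_vs_baseline`, `summarize`) are hypothetical, and the z-score uses a normal approximation to the binomial, which is an assumption about how one might judge the margin rather than anything the log reports.

```python
import math

# Correct/total counts copied from the per-language summary lines in the log.
RESULTS = {
    "EN": (286, 900),
    "HE": (243, 900),
    "AR": (256, 900),
    "FA": (254, 900),
}
RANDOM_BASELINE = 0.25  # "Random baseline: 25.0%" as printed in the log


def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    return correct / total


def z_vs_baseline(correct: int, total: int, p0: float = RANDOM_BASELINE) -> float:
    """Normal-approximation z-score of observed accuracy against chance p0.

    Assumption: each question is an independent trial with success
    probability p0 under the null hypothesis of random guessing.
    """
    standard_error = math.sqrt(p0 * (1 - p0) / total)
    return (correct / total - p0) / standard_error


def summarize(results):
    """Return per-language and pooled (accuracy, z) pairs plus pooled counts."""
    rows = {
        lang: (accuracy(c, n), z_vs_baseline(c, n))
        for lang, (c, n) in results.items()
    }
    total_correct = sum(c for c, _ in results.values())
    total_n = sum(n for _, n in results.values())
    rows["OVERALL"] = (
        accuracy(total_correct, total_n),
        z_vs_baseline(total_correct, total_n),
    )
    return rows, total_correct, total_n


if __name__ == "__main__":
    rows, total_correct, total_n = summarize(RESULTS)
    for name, (acc, z) in rows.items():
        print(f"{name:8s} accuracy={acc * 100:5.1f}%  z={z:5.2f}")
    print(f"pooled counts: {total_correct}/{total_n}")
```

The pooled counts reproduce the log's `OVERALL` line (1039/3600). Under this approximation the per-language margins over chance differ considerably, so the z-scores are a quick way to see which languages are distinguishable from random guessing at n=900; an exact binomial test would be the more careful choice for the borderline cases.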