Fix typo
README.md
CHANGED
@@ -103,32 +103,39 @@ Model evaluation metrics and results.
 
 ### Benchmark Results
 
-|
-|
-| Default Metric
-| Knowledge (5-shot)
-|
-|
-|
-|
-|
-|
-|
-|
-|
-|
-| JP Eval Harness
-|
-|
-|
-|
-|
-| XWinograd (5-shot)
-|
-|
-|
-|
-|
+| Category | Metric | Shots | 7b |
+|----------------------------------|----------------------|------------|--------|
+| **Default Metric** | **ACC** | | |
+| **Knowledge (5-shot)** | MMLU | | 61.76 |
+| | KMMLU | | 42.75 |
+| | CMLU | | 50.93 |
+| | JMLU | | |
+| | C-EVAL | | 50.07 |
+| | HAERAE (0-shot) | | 63.89 |
+| **KoBest (5-shot)** | BoolQ | | 85.47 |
+| | COPA | | 83.5 |
+| | Hellaswag (acc-norm) | | 63.2 |
+| | Sentineg | | 97.98 |
+| | WiC | | 70.95 |
+| **JP Eval Harness (Prompt ver 0.3)** | JcommonsenseQA | 3-shot | 85.97 |
+| | JNLI | 3-shot | 39.11 |
+| | Marc_ja | 3-shot | 96.48 |
+| | JSquad | 2-shot | 70.69 |
+| | Jaqket | 1-shot | 81.53 |
+| | MGSM | 5-shot | 28.8 |
+| **XWinograd (5-shot)** | EN | | 90.71 |
+| | FR | | 80.72 |
+| | JP | | 84.15 |
+| | PT | | 80.99 |
+| | RU | | 76.51 |
+| | ZH | | 76.98 |
+| **XCOPA (5-shot)** | IT | | 72.8 |
+| | ID | | 76.4 |
+| | TH | | 60.2 |
+| | TR | | 65.6 |
+| | VI | | 77.2 |
+| | ZH | | 80.2 |
+
 
 
 ## Usage and Limitations
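The added table fills in the Category cell only on the first row of each group and leaves it blank on the rows that follow. A reader who wants the scores in machine-readable form has to carry the last seen category down. The sketch below is one possible way to do that. The function name `parse_benchmark_table` and the `SAMPLE` excerpt are illustrative assumptions, not part of the repository.

```python
def parse_benchmark_table(markdown: str) -> list[dict]:
    """Parse a pipe-delimited Markdown table, carrying blank Category cells down."""
    rows = []
    header = None
    category = ""
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row holds the column names
            continue
        if all(set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|---|
        record = dict(zip(header, cells))
        label = record["Category"].strip("*")  # drop bold markers
        if label:
            category = label  # a new group starts here
        record["Category"] = category
        record["Metric"] = record["Metric"].strip("*")
        rows.append(record)
    return rows


# Hypothetical excerpt copied from the added rows, for illustration only.
SAMPLE = """\
| Category | Metric | Shots | 7b |
|---|---|---|---|
| **Knowledge (5-shot)** | MMLU | | 61.76 |
| | KMMLU | | 42.75 |
| **XCOPA (5-shot)** | IT | | 72.8 |
"""

for row in parse_benchmark_table(SAMPLE):
    print(row["Category"], row["Metric"], row["7b"])
```

Scores are kept as strings so that empty cells, such as the JMLU row, survive unchanged. Converting to `float` is left to the caller.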