---
license: llama3
datasets:
- legacy-datasets/wikipedia
language:
- en
- ko
library_name: transformers
pipeline_tag: text-generation
---
## Model Details

This model was continually pretrained from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on English and Korean datasets. The goal is to improve its proficiency in Korean while preserving the English capabilities of the original model.

### Datasets

We sampled 16B tokens from the following datasets for training:

<table>
  <tr>
    <td><strong>Sources</strong></td>
    <td><strong>Tokens (Llama-3 tokenizer)</strong></td>
  </tr>
  <tr>
    <td>AI-Hub</td>
    <td>9.2B</td>
  </tr>
  <tr>
    <td>Modu Corpus</td>
    <td>5.8B</td>
  </tr>
  <tr>
    <td>Wikipedia</td>
    <td>5.4B</td>
  </tr>
</table>
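
The token counts above appear to be measured with the Llama-3-8B tokenizer. The counting script is not published; below is a minimal sketch of how such counts could be reproduced with `transformers`, assuming access to the gated Meta-Llama-3-8B repository (the toy `documents` list stands in for an actual corpus):

```python
# Hypothetical sketch: count tokens with the Llama-3 tokenizer.
# Assumes access to the gated meta-llama/Meta-Llama-3-8B repository.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

def count_tokens(documents):
    """Return the total number of Llama-3 tokens across `documents`."""
    # add_special_tokens=False so per-document BOS markers are not counted
    return sum(len(tokenizer.encode(d, add_special_tokens=False)) for d in documents)

print(count_tokens(["Hello, world!", "안녕하세요, 세계!"]))  # toy corpus
```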

### Hyperparameters

<table>
  <tr>
    <td><strong>Learning rate</strong></td>
    <td><strong>Optimizer</strong></td>
    <td><strong>Betas</strong></td>
    <td><strong>Weight decay</strong></td>
    <td><strong>Warm-up ratio</strong></td>
  </tr>
  <tr>
    <td>3e-5</td>
    <td>AdamW</td>
    <td>(0.9, 0.95)</td>
    <td>0.1</td>
    <td>0.05</td>
  </tr>
</table>
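
These settings map directly onto a standard PyTorch AdamW configuration. A minimal sketch is below; the actual training code is not released, and the linear-decay schedule and total step count are assumptions (the card only states the warm-up ratio):

```python
# Hypothetical sketch of the hyperparameters above in PyTorch/transformers;
# the actual training stack is not published, and linear decay after warm-up
# is an assumption (only the warm-up ratio of 0.05 is stated).
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the Llama-3-8B model being trained

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-5,             # learning rate
    betas=(0.9, 0.95),   # Adam betas
    weight_decay=0.1,
)

total_steps = 10_000                    # placeholder; depends on tokens/batch size
warmup_steps = int(0.05 * total_steps)  # warm-up ratio of 0.05
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)
```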

## Intended Use

This model has not been fine-tuned, so you will need to fine-tune it on your own dataset before using it.
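
For reference, here is a minimal sketch of loading the base checkpoint with `transformers`; the dtype and device settings are assumptions about typical hardware, not something the card specifies, and as a base model it produces raw continuations rather than chat-style answers:

```python
# Minimal loading-and-generation sketch with transformers. This is a base
# (pretraining-only) checkpoint, so expect raw text continuations until
# you fine-tune it on your own data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tesser/Tesser-Llama-3-Ko-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit your GPU
    device_map="auto",           # requires the `accelerate` package
)

# Korean prompt: "The capital of South Korea is"
inputs = tokenizer("대한민국의 수도는", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```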

## Evaluations

We evaluated this model on both English and Korean benchmarks and compared it with similar models continually pretrained from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B). The best score in each column is shown in bold; the second best is underlined.

<table>
  <tr>
    <td></td>
    <td colspan="4"><strong>English</strong></td>
    <td colspan="3"><strong>Korean</strong></td>
  </tr>
  <tr>
    <td><strong>Model</strong></td>
    <td><strong>MMLU (5-shot)</strong></td>
    <td><strong>HellaSwag (10-shot)</strong></td>
    <td><strong>GSM8K (8-shot, CoT)</strong></td>
    <td><strong>BBH (3-shot, CoT)</strong></td>
    <td><strong>KMMLU (5-shot)</strong></td>
    <td><strong>HAE-RAE (5-shot)</strong></td>
    <td><strong>KoBEST (5-shot)</strong></td>
  </tr>
  <tr>
    <td>meta-llama/Meta-Llama-3-8B</td>
    <td><strong>65.1</strong></td>
    <td><strong>82.1</strong></td>
    <td><strong>52.0</strong></td>
    <td><strong>61.9</strong></td>
    <td>40.2</td>
    <td>61.1</td>
    <td>69.2</td>
  </tr>
  <tr>
    <td>saltlux/Ko-Llama3-Luxia-8B</td>
    <td>57.1</td>
    <td>77.1</td>
    <td>32.3</td>
    <td>51.8</td>
    <td>39.4</td>
    <td>69.2</td>
    <td>71.9</td>
  </tr>
  <tr>
    <td>beomi/Llama-3-Open-Ko-8B</td>
    <td>56.2</td>
    <td>77.4</td>
    <td>31.5</td>
    <td>46.8</td>
    <td>40.3</td>
    <td>68.1</td>
    <td><u>72.1</u></td>
  </tr>
  <tr>
    <td>beomi/Llama-3-KoEn-8B</td>
    <td>52.5</td>
    <td>77.7</td>
    <td>21.2</td>
    <td>43.2</td>
    <td><u>40.8</u></td>
    <td><u>71.3</u></td>
    <td><strong>73.8</strong></td>
  </tr>
  <tr>
    <td><strong>tesser/Tesser-Llama-3-Ko-8B</strong></td>
    <td><u>60.5</u></td>
    <td><u>79.8</u></td>
    <td><u>40.3</u></td>
    <td><u>56.3</u></td>
    <td><strong>42.5</strong></td>
    <td><strong>72.1</strong></td>
    <td><strong>73.8</strong></td>
  </tr>
</table>
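
The card does not say which evaluation harness produced these numbers. As a hypothetical starting point for reproduction, EleutherAI's `lm-evaluation-harness` ships benchmarks with matching names and few-shot settings; the task names and API call below follow that harness and are an assumption about the exact setup:

```python
# Hypothetical reproduction sketch with EleutherAI's lm-evaluation-harness
# (`pip install lm-eval`); the card does not state which harness was used,
# and task names here follow the harness's own registry.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tesser/Tesser-Llama-3-Ko-8B,dtype=bfloat16",
    tasks=["mmlu", "kmmlu"],  # 5-shot English and Korean benchmarks from the table
    num_fewshot=5,
)
print(results["results"])
```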

## License

This model is released under the original [Llama-3 license](https://llama.meta.com/llama3/license/).