beomi committed on
Commit 00ea4b9 • Parent(s): b234754

Update README.md

Files changed (1): README.md (+130 -145)
---
extra_gated_heading: Access beomi/Yi-Ko-34B on Hugging Face
extra_gated_button_content: Submit
extra_gated_fields:
  I agree to share my name, email address and username: checkbox
  I confirm that I understand this project is for research purposes only, and confirm that I agree to follow the LICENSE of this model: checkbox
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- pytorch
- Yi-Ko
- 01-ai
- Yi
library_name: transformers
license: cc-by-nc-sa-4.0
---
# **beomi/Yi-Ko-34B-Chat-Preview**

This repo applies a naive Chat Vector merge built from these three models:

```python
ko_model_id = 'beomi/Yi-Ko-34B-dev'
chat_model = '01-ai/Yi-34B-Chat'
base_model = '01-ai/Yi-34B'
```
 
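A chat vector merge adds the weight delta between a chat-tuned model and its base onto another model derived from the same base. A minimal, illustrative sketch of that arithmetic follows; toy per-parameter lists stand in for weight tensors, and with the actual checkpoints named above you would iterate over the three models' state dicts in the same way:

```python
# Naive "chat vector" arithmetic assumed here:
#   merged = ko_model + (chat_model - base_model), applied per parameter.

def apply_chat_vector(ko_sd, chat_sd, base_sd):
    merged = {}
    for name, w in ko_sd.items():
        # The "chat vector": what chat tuning changed relative to the base.
        delta = [c - b for c, b in zip(chat_sd[name], base_sd[name])]
        if len(delta) == len(w):
            merged[name] = [x + d for x, d in zip(w, delta)]
        else:
            # Shapes can differ, e.g. embeddings resized for the expanded
            # Korean vocab; those cannot take the delta directly.
            merged[name] = list(w)
    return merged

base_sd = {'w': [0.0, 0.0]}
chat_sd = {'w': [1.0, 1.0]}   # chat tuning shifted each weight by +1
ko_sd   = {'w': [3.0, 3.0]}
print(apply_chat_vector(ko_sd, chat_sd, base_sd))  # {'w': [4.0, 4.0]}
```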
## Samples

```python
# System prompt (same across all samples, translated): "As a friendly chatbot,
# answer the user's requests as thoroughly and kindly as possible. Answer
# everything in Korean."
# User: "Please explain the Constitution of the Republic of Korea in detail."
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "ํ•œ๊ตญ์˜ ์ œํ—Œ ํ—Œ๋ฒ•์— ๋Œ€ํ•ด์„œ ์ž์„ธํžˆ ์„ค๋ช…ํ•ด์ฃผ์„ธ์š”."},
]
---
ํ•œ๊ตญ์˜ ํ—Œ๋ฒ•์€ 「๋Œ€ํ•œ๋ฏผ๊ตญ ํ—Œ๋ฒ•」์ด๋ผ๊ณ  ๋ถ€๋ฅด๋ฉฐ, ์šฐ๋ฆฌ๋‚˜๋ผ์˜ ๊ธฐ๋ณธ๋ฒ•์ด๋‹ค. ์ด ํ—Œ๋ฒ•์€ 1948๋…„ 7์›” 12์ผ์— ์ œ์ •๋˜๊ณ  8์›” 15์ผ์— ๊ณตํฌ๋˜์—ˆ์œผ๋ฉฐ, ์ดํ›„ ์—ฌ๋Ÿฌ ์ฐจ๋ก€ ๊ฐœ์ •์ด ์žˆ์–ด ์™”๋‹ค. ํ—Œ๋ฒ•์˜ ๋ชฉ์ ์€ ํ•œ๊ตญ์˜ ๊ธฐ๋ณธ์  ๊ฐ€์น˜์™€ ํ†ต์น˜ ์‹œ์Šคํ…œ, ๊ตญ๋ฏผ์˜ ๊ถŒ๋ฆฌ์™€ ์˜๋ฌด๋ฅผ ๊ทœ์ •ํ•˜๊ณ  ์žˆ๋‹ค.

ํ•œ๊ตญ ํ—Œ๋ฒ•์˜ ์ฃผ์š” ๋‚ด์šฉ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

1. ๊ธฐ๋ณธ๊ถŒ ๋ณด์žฅ: ํ—Œ๋ฒ•์€ ๊ตญ๋ฏผ์˜ ๊ธฐ๋ณธ๊ถŒ์„ ๋ณด์žฅํ•˜๊ณ  ์žˆ๋‹ค. ํ‘œํ˜„์˜ ์ž์œ , ์‹ ์•™์˜ ์ž์œ , ๊ต์œก์˜ ์ž์œ , ํ–‰๋™์˜ ์ž์œ , ๊ฑฐ์ฃผ์ด์ „์˜ ์ž์œ , ์ฒดํฌ·์ˆ˜์‚ฌ·์žฌํŒ์— ๋Œ€ํ•œ ๊ถŒ๋ฆฌ๋ฅผ ๋ณด์žฅํ•˜๊ณ  ์žˆ๋‹ค.

2. ํ‰๋“ฑ: ๋ชจ๋“  ๊ตญ๋ฏผ์€ ๋ฒ• ์•ž์— ํ‰๋“ฑํ•˜๋‹ค๊ณ  ๊ทœ์ •ํ•˜๊ณ  ์žˆ๋‹ค.

3. ๊ตญ๋ฏผํˆฌํ‘œ: ์ค‘์š” ์ •์ฑ…์ด๋‚˜ ๊ตญ๊ฐ€ ์•ˆ์ „์— ๊ด€ํ•œ ์‚ฌํ•ญ์— ๋Œ€ํ•ด์„œ๋Š” ๊ตญ๋ฏผํˆฌํ‘œ๋ฅผ ๊ฑฐ์น  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ณ  ์žˆ๋‹ค.

4. ์‚ฌ๋ฒ•์ฒด๊ณ„: ํ•œ๊ตญ์€ ์‚ผ๊ถŒ๋ถ„๋ฆฝ ์ œ๋„๋ฅผ ์ฑ„ํƒํ•˜๊ณ  ์žˆ์–ด, ํ–‰์ •๋ถ€(์ •๋ถ€), ์ž…๋ฒ•๋ถ€(๊ตญํšŒ), ์‚ฌ๋ฒ•๋ถ€(๋ฒ•์›)๊ฐ€ ์„œ๋กœ์˜ ๊ธฐ๋Šฅ์„ ๊ฒฌ์ œํ•˜๋ฉฐ ์ž‘๋™ํ•œ๋‹ค.

5. ๋Œ€ํ†ต๋ น์ค‘์‹ฌ์ œ: ํ•œ๊ตญ์€ ๋Œ€ํ†ต๋ น์ค‘์‹ฌ์ œ๋กœ, ๋Œ€ํ†ต๋ น์€ ๊ตญ๊ฐ€์˜ ์ˆ˜๋ฐ˜์ด๋ฉฐ ํ—Œ๋ฒ•์ƒ์˜ ๊ถŒํ•œ์„ ๊ฐ€์ง„๋‹ค.

6. ๋ฏผ์ฃผ์ฃผ์˜ ์ ˆ์ฐจ: ์ •๋‹น, ์„ ๊ฑฐ, ๊ทธ๋ฆฌ๊ณ  ๊ตญ๋ฏผ๋Œ€ํ‘œ๊ธฐ๊ด€์ธ ๊ตญํšŒ๋ฅผ ํ†ตํ•ด ๋ฏผ์ฃผ์ฃผ์˜๊ฐ€ ์šด์˜๋œ๋‹ค.

7. ํ†ต์ผ์ •์ฑ…: ํ—Œ๋ฒ•์€ ํ†ต์ผ์— ๋Œ€ํ•œ ๊ตญ๊ฐ€์˜ ๋…ธ๋ ฅ์„ ๊ทœ์ •ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ๋ถํ•œ์˜ ์ฃผ๋ฏผ์„ ์šฐ๋ฆฌ์˜ ๋™ํฌ๋กœ ๊ทœ์ •ํ•˜๊ณ  ์ธ๋„์  ์ง€์›์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๊ณ  ์žˆ๋‹ค.

8. ์‚ฌ๋ฒ•๊ถŒ ๋…๋ฆฝ: ์‚ฌ๋ฒ•๋ถ€๋Š” ํ—Œ๋ฒ•๊ณผ ๋ฒ•๋ฅ ์— ๋”ฐ๋ผ ๋…๋ฆฝ์ ์œผ๋กœ ์žฌํŒ๊ถŒ์„ ํ–‰์‚ฌํ•œ๋‹ค.

9. ๊ฒฝ์ œ์ •์ฑ…: ๊ฒฝ์ œ์งˆ์„œ๋ฅผ ๊ธฐ๋ณธ์ ์œผ๋กœ ์ž์œ ์‹œ์žฅ๊ฒฝ์ œ์— ์˜ํ•ด ์œ ์ง€ํ•˜๋˜, ๊ตญ๊ฐ€๋Š” ๊ท ํ˜• ์žˆ๋Š” ๊ตญ๋ฏผ๊ฒฝ์ œ์˜ ์„ฑ์žฅ๊ณผ ์•ˆ์ • ๋ฐ ๊ณต์ •ํ•œ ๋ถ„๋ฐฐ๋ฅผ ์œ„ํ•ด ๋…ธ๋ ฅํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ์ด ๋ช…์‹œ๋˜์–ด ์žˆ๋‹ค.

10. ํ™˜๊ฒฝ๋ณดํ˜ธ: ์ธ๊ฐ„๋‹ค์šด ์ƒํ™œ์„ ์œ ์ง€ํ•˜๊ณ  ์ž์—ฐ๊ณผ ์—ญ์‚ฌ์˜์œ ์‚ฐ์„ ๋ณดํ˜ธํ•˜๊ณ  ๊ด€๋ฆฌํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ๋„ ํ—Œ๋ฒ•์— ๊ธฐ์žฌ๋˜์–ด ์žˆ๋‹ค.

ํ•œ๊ตญ์˜ ํ—Œ๋ฒ•์€ ์ด๋Ÿฌํ•œ ๊ธฐ๋ณธ ๊ฐœ๋…์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•œ๊ตญ์˜ ์ •์น˜·๊ฒฝ์ œ·์‚ฌํšŒ์˜ ๊ธฐ๋ณธ์งˆ์„œ์™€ ๊ตญ๋ฏผ์ƒํ™œ์˜ ๊ธฐ๋ณธ์›๋ฆฌ๋ฅผ ์ •ํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ๊ตญ๋ฏผ๋“ค์˜ ๊ธฐ๋ณธ๊ถŒ์„ ๋ณดํ˜ธํ•˜๋Š” ๋ฐ ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•˜๊ณ  ์žˆ๋‹ค.<|im_end|>
์ œ1ํ•ญ ๋Œ€ํ•œ๋ฏผ๊ตญ์€ ๋ฏผ์ฃผ๊ณตํ™”๊ตญ์ด๋‹ค.
์ œ3ํ•ญ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์ฃผ๊ถŒ์€ ๊ตญ๋ฏผ์—๊ฒŒ ์žˆ๊ณ , ๋ชจ๋“  ๊ถŒ๋ ฅ์˜
์ œ5ํ•ญ ๊ตญ๋ฏผ์€ ๋ฒ• ์•ž์— ํ‰๋“ฑํ•˜๋‹ค.
์ œ6ํ•ญ ๋ชจ๋“  ๊ตญ๋ฏผ์€ ์ธ๊ฐ„๋‹ค์šด ์ƒํ™œ์„ ํ•  ๊ถŒ๋ฆฌ๋ฅผ ๊ฐ€์ง„๋‹ค.<|im_end|>
```

<small>English gist: the model gives an overview of the Korean Constitution (enacted in 1948) and lists ten of its key principles, then runs past the first `<|im_end|>` with fragments of constitutional articles.</small>
 
```python
# User: "Tell me about Admiral Yi Sun-sin."
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "์ด์ˆœ์‹  ์žฅ๊ตฐ์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ด์ค˜."},
]
---
์ด์ˆœ์‹  ์žฅ๊ตฐ์€ ์กฐ์„  ์‹œ๋Œ€์— ํ™œ์•ฝํ•œ ์œ ๋ช…ํ•œ ์ž„์ง„์™œ๋ž€์˜ ์ „๋žต๊ฐ€์š”. ๊ทธ๊ฐ€ ํ™œ์•ฝํ•œ ์‹œ๊ธฐ๋Š” 16์„ธ๊ธฐ ๋ง์ด์—ˆ์œผ๋ฉฐ, ์ด๋•Œ๋Š” ์ผ๋ณธ์ด ์กฐ์„ ์„ ์นจ๋žตํ–ˆ๋˜ ์ž„์ง„์™œ๋ž€์ด ๋ฐœ๋ฐœํ•œ ๋•Œ์ž…๋‹ˆ๋‹ค. ์ด์ˆœ์‹  ์žฅ๊ตฐ์€ ์ด๋•Œ ์ผ๋ณธ์˜ ํ•ด๊ตฐ์œผ๋กœ ๊ตฌ์„ฑ๋œ ๊ตฐ๋Œ€๊ฐ€ ์กฐ์„ ์„ ๊ณต๊ฒฉํ•˜์ž, ์กฐ์„ ์˜ ํ•ด๊ตฐ์—์„œ ํ™œ์•ฝํ•˜๋ฉฐ ์ผ๋ณธ์„ ์ƒ๋Œ€ํ–ˆ์Šต๋‹ˆ๋‹ค. ์œ ๋ช…ํ•œ ํ•ด์ „์ธ 'ํ•œ์‚ฐ๋„๋Œ€์ฒฉ'๊ณผ '๋ช…๋Ÿ‰๋Œ€์ฒฉ'์—์„œ ์ด์ˆœ์‹  ์žฅ๊ตฐ์€ ๋›ฐ์–ด๋‚œ ์ „๋žต๊ณผ ์šฉ๊ธฐ๋กœ ์ผ๋ณธ์„ ์ƒ๋Œ€๋กœ ํฌ๊ฒŒ ์Šน๋ฆฌํ•˜๋Š” ๋“ฑ ์กฐ์„ ์„ ์œ„๊ธฐ์—์„œ ๊ตฌํ•œ ์ค‘์š”ํ•œ ์ธ๋ฌผ๋กœ ํ‰๊ฐ€๋ฐ›๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ์˜ ์šฉ๊ธฐ์™€ ์ง€๋žต์€ ํ˜„์žฌ๊นŒ์ง€๋„ ๋งŽ์€ ์‚ฌ๋žŒ๋“ค์˜ ๊ท€๊ฐ์œผ๋กœ ์ „ํ•ด์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค<|im_end|>
```

<small>English gist: the model introduces Admiral Yi Sun-sin as the famed naval strategist of the 16th-century Imjin War, citing his victories at Hansan Island and Myeongnyang.</small>
```python
# User: "What is the difference between Dokdo and Takeshima?"
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "๋…๋„์™€ ๋‹ค์ผ€์‹œ๋งˆ์˜ ์ฐจ์ด๊ฐ€ ๋ญ์•ผ?"},
]
---
"๋…๋„"์™€ "๋‹ค์ผ€์‹œ๋งˆ"๋Š” ๋™์ผํ•œ ์„ฌ์„ ๊ฐ€๋ฆฌํ‚ค๋Š” ์ด๋ฆ„์ด์ง€๋งŒ ์„œ๋กœ ๋‹ค๋ฅธ ๋‚˜๋ผ์˜ ์–ธ์–ด๋กœ ๋ถˆ๋ฆฌ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. "๋…๋„"๋Š” ๋Œ€ํ•œ๋ฏผ๊ตญ ์–ด๋กœ "獨島"๋ผ๊ณ  ๋ถ€๋ฅด๋ฉฐ, "๋‹ค์ผ€์‹œ๋งˆ"๋Š” ์ผ๋ณธ ์–ด๋กœ "竹島"๋ผ๊ณ  ๋ถ€๋ฆ…๋‹ˆ๋‹ค. ์ด ์„ฌ์€ ์ผ๋ณธ ๋—ํ† ๋ฆฌํ˜„๊ณผ ์‚ฌ์นด์ด๋ฏธ๋‚˜ํ† ์‹œ์— ์ธ์ ‘ํ•œ ์ผ๋ณธ์— ๊ฐ€๊นŒ์šด ๊ณณ์— ์œ„์น˜ํ•ด ์žˆ์ง€๋งŒ, ๊ตญ์ œ๋ฒ•์ ์œผ๋กœ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์˜ํ† ๋กœ ์ธ์ •๋ฐ›๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

๋Œ€ํ•œ๋ฏผ๊ตญ์€ ์ด ์„ฌ์„ ๋…๋„๋กœ ์นญํ•˜๋ฉฐ ์—ญ์‚ฌ์ ์œผ๋กœ๋ถ€ํ„ฐOwnership๋ฅผ ์ฃผ์žฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์—ญ์‚ฌ์  ๊ธฐ๋ก์— ๋”ฐ๋ฅด๋ฉด ๋…๋„๋Š” ์กฐ์„ ์‹œ๋Œ€์— ์ด๋ฏธ ํ•œ๊ตญ์˜ ์˜ํ† ๋กœ ์ธ์‹๋˜์—ˆ๊ณ , ์ผ๋ณธ ์ œ๊ตญ์ฃผ์˜ ์‹œ์ ˆ์— ์ž ์‹œ ๊ด€ํ• ๊ถŒ์— ์†ํ–ˆ๋‹ค๊ฐ€ ์ผ๋ณธ์ด ํŒจ์ „ํ•œ ์ดํ›„ ๋Œ€ํ•œ๋ฏผ๊ตญ ์˜ํ† ๋กœ ํ™˜์›๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

์ผ๋ณธ์€ ์ด ์„ฌ์„ ๋‹ค์ผ€์‹œ๋งˆ๋กœ ์นญํ•˜๋ฉฐ ์˜์œ ๊ถŒ์„ ์ฃผ์žฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ, ์ด๋Ÿฌํ•œ ์ฃผ์žฅ์€ ๊ตญ์ œ๋ฒ•์ƒ ์ธ์ •๋˜์ง€ ์•Š์œผ๋ฉฐ, ํ•œ๊ตญ ์ •๋ถ€๋Š” ๋…๋„๋ฅผ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์˜ํ† ๋กœ์„œ ์ ˆ๋Œ€์ ์ธ ๊ถŒ๋ฆฌ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Œ์„ ๋ถ„๋ช…ํžˆ ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฒฐ๋ก ์ ์œผ๋กœ, "๋…๋„"์™€ "๋‹ค์ผ€์‹œ๋งˆ"๋Š” ๋™์ผํ•œ ์„ฌ์„ ๊ฐ€๋ฆฌํ‚ค์ง€๋งŒ, ์„œ๋กœ ๋‹ค๋ฅธ ๊ตญ๊ฐ€์˜ ์—ญ์‚ฌ์ ์ด๊ณ  ์ •์น˜์  ์ƒํ™ฉ์— ๋”ฐ๋ผ ๋ถ€๋ฅด๋Š” ์ด๋ฆ„์ด ๋‹ค๋ฅด๋ฉฐ, ๊ตญ์ œ๋ฒ•์ƒ ์ด ์„ฌ์€ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์˜ํ† ๋กœ ๋ถ„๋ฅ˜๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.<|im_end|>
```

<small>English gist: the model explains that Dokdo (獨島) and Takeshima (竹島) are the Korean and Japanese names for the same island and summarizes the Korean position on sovereignty.</small>
 
```python
# User: "Do you know about the incident where King Sejong threw a MacBook
# at the Jiphyeonjeon (Hall of Worthies)?" (a trick question)
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "์„ธ์ข…๋Œ€์™•์˜ ์ง‘ํ˜„์ „ ๋งฅ๋ถ ๋˜์ง ์‚ฌ๊ฑด์— ๋Œ€ํ•ด์„œ ์•Œ๊ณ  ์žˆ์–ด?"},
]
---
์„ธ์ข… ๋Œ€์™•์˜ ์ง‘ํ˜„์ „ ๋งฅ๋ถ ๋˜์ง ์‚ฌ๊ฑด์— ๋Œ€ํ•ด ๋“ค์–ด๋ณธ ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์‚ฌ๊ฑด์€ 2017๋…„ ํ•œ ๋ฐฉ์†ก์‚ฌ์˜ ๋“œ๋ผ๋งˆ์—์„œ ๋“ฑ์žฅํ•˜๋ฉฐ ๋„๋ฆฌ ์•Œ๋ ค์กŒ๋Š”๋ฐ์š”, ๋“œ๋ผ๋งˆ์—์„œ๋Š” ์ง‘ํ˜„์ „ ํ•™์ž๋“ค์ด ์ƒˆ๋กœ์šด ๋ฌธ๋ฌผ์ธ ๋งฅ๋ถ์„ ๋ณด๊ณ  ์„ธ์ข… ๋Œ€์™•์ด ์ด๋ฅผ ๋˜์ง„ ๊ฒƒ์œผ๋กœ ๋ฌ˜์‚ฌ๋˜์–ด ๋…ผ๋ž€์ด ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

ํ•˜์ง€๋งŒ ์ด ๋‚ด์šฉ์€ ์—ญ์‚ฌ์  ์‚ฌ์‹ค๊ณผ ๋‹ค๋ฅด๋‹ค๋Š” ๊ฒƒ์ด ๋ฐํ˜€์กŒ์Šต๋‹ˆ๋‹ค. ์ง‘ํ˜„์ „ ํ•™์ž๋“ค์ด ์‹ค์ œ๋กœ ๋งฅ๋ถ๊ณผ ๊ฐ™์€ ์™ธ๊ณ„ ๋ฌธ๋ฌผ์„ ๋ชฉ๊ฒฉํ–ˆ์„ ๋ฆฌ ์—†๊ณ , ์„ธ์ข… ๋Œ€์™•์ด ํ•™์ž๋“ค์˜ ๋…ธ๋ ฅ์„ ํ„ํ•˜ํ•˜๊ฑฐ๋‚˜ ๋ฌผ๊ฑด์„ ๋˜์ง€๋Š” ํ–‰๋™์„ ํ–ˆ์„ ๊ฒƒ์ด๋ผ๋Š” ์ „์ œ์— ๋ฌธ์ œ๊ฐ€ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค. ์˜คํžˆ๋ ค ์„ธ์ข… ๋Œ€์™•์€ ์ˆ™๋ช…ํ•™์„ ์„ค๋ฆฝํ•˜๊ณ  ํ•œ๊ธ€์„ ์ฐฝ์ œํ•˜๋Š” ๋“ฑ ํ•™๋ฌธ๊ณผ ๋ฌธํ™”์— ๋Œ€ํ•œ ์ง€์›์„ ์•„๊นŒ์ง€ ์•Š์•˜๋˜ ๊ฒƒ์œผ๋กœ ์•Œ๋ ค์ ธ ์žˆ์Šต๋‹ˆ๋‹ค.

๋“œ๋ผ๋งˆ ์†์— ๋“ฑ์žฅํ•œ ์‚ฌ๊ฑด์€ ํ—ˆ๊ตฌ์ ์ธ ์„ค์ •์ผ๋ฟ, ์‹ค์ œ ์—ญ์‚ฌ์—์„œ ์ผ์–ด๋‚œ ์ผ์€ ์•„๋‹™๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์˜ค๋ฅ˜๋ฅผ ๋ฐ˜์˜ํ•˜๋“ฏ, ์ดํ›„์˜ ๋ฐฉ์†ก์—์„œ๋Š” ํ•ด๋‹น ์žฅ๋ฉด์ด ์ˆ˜์ •๋˜๊ฑฐ๋‚˜ ์‚ญ์ œ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.<|im_end|>
```

<small>English gist: asked the trick question about "King Sejong throwing a MacBook at the Jiphyeonjeon scholars", the model answers that the episode is fictional, not historical.</small>
 
```python
# User: "I have 5 apples today. And yesterday I ate 2 apples.
# How many apples do I have now?"
messages = [
    {"role": "system", "content": "์นœ์ ˆํ•œ ์ฑ—๋ด‡์œผ๋กœ์„œ ์ƒ๋Œ€๋ฐฉ์˜ ์š”์ฒญ์— ์ตœ๋Œ€ํ•œ ์ž์„ธํ•˜๊ณ  ์นœ์ ˆํ•˜๊ฒŒ ๋‹ตํ•˜์ž. ๋ชจ๋“  ๋Œ€๋‹ต์€ ํ•œ๊ตญ์–ด(Korean)์œผ๋กœ ๋Œ€๋‹ตํ•ด์ค˜."},
    {"role": "user", "content": "๋‚˜๋Š” ์˜ค๋Š˜ ์‚ฌ๊ณผ 5๊ฐœ๋ฅผ ๊ฐ–๊ณ ์žˆ์–ด. ๊ทธ๋ฆฌ๊ณ  ์–ด์ œ ๋‚˜๋Š” ์‚ฌ๊ณผ 2๊ฐœ๋ฅผ ๋จน์—ˆ์ง€. ๋‚ด๊ฐ€ ์ง€๊ธˆ ๊ฐ€์ง„ ์‚ฌ๊ณผ๋Š” ๋ช‡ ๊ฐœ์ง€?"},
]
---
์ง€๋‚œ ๋ฒˆ์—๋Š” ์‚ฌ๊ณผ 2๊ฐœ๋ฅผ ๋จน์—ˆ๊ณ , ์ง€๊ธˆ์€ ์‚ฌ๊ณผ 5๊ฐœ๋ฅผ ๊ฐ–๊ณ  ์žˆ๋‹ค๊ณ  ํ–ˆ์œผ๋‹ˆ, ์ง€๊ธˆ ๊ฐ€์ง„ ์‚ฌ๊ณผ์˜ ๊ฐœ์ˆ˜๋Š” 5๊ฐœ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ๋ฐ, ์‚ฌ๊ณผ 2๊ฐœ๋ฅผ ๋จน์–ด์„œ ์ด ์‚ฌ๊ณผ๋ฅผ 5๊ฐœ์—์„œ 2๊ฐœ ๋นผ๋ฉด, ๋‚จ์€ ์‚ฌ๊ณผ์˜ ๊ฐœ์ˆ˜๋Š” 5 - 2 = 3๊ฐœ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‹ˆ๊นŒ, ์ง€๊ธˆ ๊ฐ€์ง„ ์‚ฌ๊ณผ๋Š” 3๊ฐœ์ž…๋‹ˆ๋‹ค.<|im_end|>
```

<small>English gist: given "I have 5 apples today; yesterday I ate 2; how many do I have now?", the model first notes the count is 5, then subtracts the apples eaten yesterday and answers 3.</small>
 
 
 
 
 
 
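The raw outputs above end with `<|im_end|>`, the ChatML end-of-turn marker used by Yi's chat models. As an illustration (not this card's own code), a `messages` list like the ones above maps onto a ChatML prompt roughly as follows; in practice `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` does this for you:

```python
# Minimal ChatML rendering sketch, assuming the <|im_start|>/<|im_end|>
# turn markers seen in the sample outputs above.
def to_chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

demo = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(demo)
```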
# beomi/Yi-Ko-34B-Preview

Yi-Ko series models serve as advanced iterations of the 01-ai/Yi models, benefiting from an expanded vocabulary and the inclusion of a Korean/English corpus in their further pretraining. Just like their predecessors, Yi-Ko series models operate within the broad range of generative text models spanning 6 billion to 34 billion parameters. This repository focuses on the **34B** pretrained version, which is tailored to fit the Hugging Face Transformers format. For access to the other models, feel free to consult the index provided below.

## Model Details

**Model Developers** Junbum Lee (Beomi)

**Variations** Yi-Ko-34B will come in a range of parameter sizes (6B and 34B) with Ko (Korean-only), KoEn (Korean+English), and KoEnCo (Korean+English+Code) variations.

**Input** Models input text only.

**Output** Models generate text only.

**Model Architecture**

Yi-Ko series models are auto-regressive language models that use an optimized transformer architecture based on Llama-2*.

<small>*The Yi model architecture is based on Llama-2, so it can be loaded via the `LlamaForCausalLM` class in HF Transformers.</small>
 
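Because the architecture matches Llama-2, the checkpoint loads with the stock Llama classes. A minimal sketch of that compatibility is below; the tiny config is illustrative, with only the 78464-entry vocab size and the use of grouped-query attention taken from this card's tables, and in practice you would simply call `LlamaForCausalLM.from_pretrained('beomi/Yi-Ko-34B')`:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny stand-in config: the real Yi-Ko-34B dimensions are much larger.
cfg = LlamaConfig(
    vocab_size=78464,        # expanded Yi-Ko vocabulary
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,   # GQA: fewer KV heads than attention heads
)
model = LlamaForCausalLM(cfg)
print(model.lm_head.out_features)  # 78464
```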
|Model Name|Training Data|Params|Context Length|GQA|Trained Tokens|LR|Batch Size (per step)|
|---|---|---|---|---|---|---|---|
|Yi-Ko-34B|*A mix of Korean + English online data*|34B|4k|O|TBD (under training)|5e-5|2048|

**Vocab Expansion**

| Model Name | Vocabulary Size | Description |
| --- | --- | --- |
| Original Yi-Series | 64000 | SentencePiece BPE |
| **Expanded Yi-Ko Series** | 78464 | SentencePiece BPE; added Korean vocab and merges |
 
**Tokenizing "์•ˆ๋…•ํ•˜์„ธ์š”, ์˜ค๋Š˜์€ ๋‚ ์”จ๊ฐ€ ์ข‹๋„ค์š”.ㅎㅎ"**

| Model | # of tokens | Tokens |
| --- | --- | --- |
| Original Yi-Series | 47 | `['<0xEC>', '<0x95>', '<0x88>', '<0xEB>', '<0x85>', '<0x95>', 'ํ•˜', '<0xEC>', '<0x84>', '<0xB8>', '<0xEC>', '<0x9A>', '<0x94>', ',', '▁', '<0xEC>', '<0x98>', '<0xA4>', '<0xEB>', '<0x8A>', '<0x98>', '์€', '▁', '<0xEB>', '<0x82>', '<0xA0>', '<0xEC>', '<0x94>', '<0xA8>', '๊ฐ€', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '<0xEC>', '<0x9A>', '<0x94>', '.', '<0xE3>', '<0x85>', '<0x8E>', '<0xE3>', '<0x85>', '<0x8E>']` |
| **Expanded Yi-Ko Series** | 10 | `['▁์•ˆ๋…•', 'ํ•˜์„ธ์š”', ',', '▁์˜ค๋Š˜์€', '▁๋‚ ', '์”จ๊ฐ€', '▁์ข‹๋„ค์š”', '.', 'ㅎ', 'ㅎ']` |

<small>*Same Korean vocab as the Llama-2-Ko series.</small>

**Tokenizing "The Yi series models are large language models trained from scratch by developers at 01.AI."**

| Model | # of tokens | Tokens |
| --- | --- | --- |
| Original Yi-Series | 21 | `['The', '▁Y', 'i', '▁series', '▁models', '▁are', '▁large', '▁language', '▁models', '▁trained', '▁from', '▁scratch', '▁by', '▁developers', '▁at', '▁', '0', '1', '.', 'AI', '.']` |
| **Expanded Yi-Ko Series** | 21 | `['▁The', '▁Y', 'i', '▁series', '▁models', '▁are', '▁large', '▁language', '▁models', '▁trained', '▁from', '▁scratch', '▁by', '▁developers', '▁at', '▁', '0', '1', '.', 'AI', '.']` |

<small>*Same Korean vocab as the Llama-2-Ko series. Since the **Expanded Yi-Ko Series** tokenizer prepends `▁` at the beginning of the text (to ensure identical tokenization of Korean sentences), it shows a negligible difference on the first token of English text.</small>
 
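As a quick sanity check of the tokenizer tables above: the same Korean sentence costs 47 byte-fallback tokens with the original vocab but only 10 with the expanded one, while English token counts are unchanged:

```python
# Token counts taken from the tokenization tables above.
korean_original, korean_expanded = 47, 10
english_original, english_expanded = 21, 21

print(f"Korean: {korean_original / korean_expanded:.1f}x fewer tokens")   # 4.7x
print(f"English: {english_original - english_expanded} token difference")  # 0
```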
# **Model Benchmark**

## LM Eval Harness - Korean (polyglot branch)

TBD

## LICENSE

CC-BY-NC-SA-4.0, subject to the Yi license

## Citation

TBD

## Acknowledgement

The training is supported by the [TPU Research Cloud](https://sites.research.google/trc/) program.