brildev7 committed on
Commit
3d5441e
•
1 Parent(s): 047f7c6

Update README.md

Files changed (1)
  1. README.md +55 -198
README.md CHANGED
@@ -9,209 +9,66 @@ tags:
  ---

  # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
  ## Model Details
-
  ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
+ Summarise Korean sentences concisely
  - **Developed by:** [Kang Seok Ju]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- '''
- import os
- from dataclasses import dataclass, field
- from typing import Optional
-
- import torch
-
- from transformers import AutoTokenizer, HfArgumentParser, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
- from datasets import load_dataset
- from peft import LoraConfig, PeftModel
- from transformers import BitsAndBytesConfig
- '''
+ - **Contact:** [brildev7@gmail.com]

  ## Training Details
-
  ### Training Data
  https://huggingface.co/datasets/raki-1203/ai_hub_summarization

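The new card links this corpus without showing how to read it. A minimal sketch of pulling it for inspection with the `datasets` library, assuming the default Hub configuration and a `train` split (the split and column names are assumptions, not stated in the card):

```python
# Illustrative only: inspect the summarization corpus referenced above.
from datasets import load_dataset

ds = load_dataset("raki-1203/ai_hub_summarization")  # assumes the default config loads as-is
print(ds)               # list the available splits and column names
print(ds["train"][0])   # peek at one record; the "train" split name is an assumption
```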
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
-
-
- ### Framework versions
-
- - PEFT 0.8.2
+ # Inference Examples
+ ```
+ import os
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import PeftModel
+
+ model_id = "google/gemma-7b"
+ peft_model_id = "brildev7/gemma_7b_summarization_ko_sft_qlora"
+
+ # 4-bit (NF4) quantization settings for loading the base model
+ quantization_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_compute_dtype=torch.float32,
+     bnb_4bit_quant_type="nf4"
+ )
+
+ # load the quantized base model, then attach the summarization LoRA adapter
+ model = AutoModelForCausalLM.from_pretrained(model_id,
+     quantization_config=quantization_config,
+     torch_dtype=torch.float32,
+     low_cpu_mem_usage=True,
+     attn_implementation="sdpa",
+     device_map="auto")
+ model = PeftModel.from_pretrained(model, peft_model_id)
+
+ tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
+ tokenizer.pad_token_id = tokenizer.eos_token_id
+
+ # example
+ prompt_template = "다음 글을 요약하세요.:{}\n요약:"  # "Summarize the following text.:{}\nSummary:"
+ passage = "๊ธฐํš์žฌ์ •๋ถ€๋Š” 20์ผ ์ด ๊ฐ™์€ ๋‚ด์šฉ์˜ '์ฃผ๋ฅ˜ ๋ฉดํ—ˆ ๋“ฑ์— ๊ด€ํ•œ ๋ฒ•๋ฅ  ์‹œํ–‰๋ น' ๊ฐœ์ •์•ˆ์„ ์ž…๋ฒ• ์˜ˆ๊ณ ํ–ˆ๋‹ค. ๊ฐœ์ •์•ˆ์—๋Š” ์ฃผ๋ฅ˜ ํŒ๋งค์—… ๋ฉดํ—ˆ ์ทจ์†Œ์˜ ์˜ˆ์™ธ์— ํ•ด๋‹นํ•˜๋Š” ์ฃผ๋ฅ˜์˜ ๋‹จ์ˆœ๊ฐ€๊ณตยท์กฐ์ž‘์˜ ๋ฒ”์œ„๋ฅผ ์ˆ ์ž” ๋“ฑ ๋นˆ ์šฉ๊ธฐ์— ์ฃผ๋ฅ˜๋ฅผ ๋‚˜๋ˆ  ๋‹ด์•„ ํŒ๋งคํ•˜๋Š” ๊ฒฝ์šฐ ๋“ฑ์ด ํฌํ•จ๋๋‹ค. ์‹๋‹นยท์ฃผ์  ๋“ฑ์—์„œ ์ฃผ๋ฅ˜๋ฅผ ํŒ๋งคํ•  ๋•Œ ์ˆ ์„ ์ž”์— ๋‚˜๋ˆ  ํŒ๋งคํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์˜๋ฏธ๋‹ค. ์ข…ํ•ฉ์ฃผ๋ฅ˜๋„๋งค์—…์ž๊ฐ€ ์ฃผ๋ฅ˜์ œ์กฐ์ž ๋“ฑ์ด ์ œ์กฐยทํŒ๋งคํ•˜๋Š” ๋น„์•Œ์ฝ”์˜ฌ ์Œ๋ฃŒ ๋˜๋Š” ๋ฌด์•Œ์ฝ”์˜ฌ ์Œ๋ฃŒ๋ฅผ ์ฃผ๋ฅ˜์™€ ํ•จ๊ป˜ ์Œ์‹์  ๋“ฑ์— ๊ณต๊ธ‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ์ฃผ๋ฅ˜ํŒ๋งค ์ „์—…์˜๋ฌด ๋ฉดํ—ˆ์š”๊ฑด๋„ ์™„ํ™”ํ–ˆ๋‹ค. ํ˜„์žฌ ์•Œ์ฝ”์˜ฌ ๋„์ˆ˜๊ฐ€ 0%์ธ ์Œ๋ฃŒ๋Š” '๋ฌด์•Œ์ฝ”์˜ฌ ์Œ๋ฃŒ'๋กœ, 0% ์ด์ƒ 1% ๋ฏธ๋งŒ์ธ ๊ฒƒ์€ '๋น„์•Œ์ฝ”์˜ฌ ์Œ๋ฃŒ'๋กœ ๊ตฌ๋ถ„๋œ๋‹ค. ํ˜„ํ–‰ ๊ทœ์ •์ƒ ๋ฌด์•Œ์ฝ”์˜ฌยท๋น„์•Œ์ฝ”์˜ฌ ์ฃผ๋ฅ˜๋Š” ์ฃผ๋ฅ˜ ์—…์ž๊ฐ€ ์œ ํ†ตํ•  ์ˆ˜ ์—†๋Š”๋ฐ ์ด ๊ทœ์ •์„ ์™„ํ™”ํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ๊ธฐ์žฌ๋ถ€๋Š” ๋‹ค์Œ ๋‹ฌ 29์ผ๊นŒ์ง€ ์˜๊ฒฌ ์ˆ˜๋ ด์„ ๊ฑฐ์ณ ์ด๋ฅด๋ฉด ๋‹ค์Œ ๋‹ฌ ๋ง๋ถ€ํ„ฐ ์‹œํ–‰ํ•  ์˜ˆ์ •์ด๋‹ค๏ผŽ"
+ prompt = prompt_template.format(passage)
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs,
+     max_new_tokens=512,
+     temperature=0.2,
+     top_p=0.95,
+     do_sample=True,
+     use_cache=False)
+ print(tokenizer.decode(outputs[0]))
+ - ๊ธฐํš์žฌ์ •๋ถ€๋Š” 20์ผ ์ฃผ๋ฅ˜ ํŒ๋งค์—… ๋ฉดํ—ˆ ์ทจ์†Œ์˜ ์˜ˆ์™ธ์— ํ•ด๋‹นํ•˜๋Š” ์ฃผ๋ฅ˜์˜ ๋‹จ์ˆœ๊ฐ€๊ณตยท์กฐ์ž‘์˜ ๋ฒ”์œ„๋ฅผ ์ˆ ์ž” ๋“ฑ ๋นˆ ์šฉ๊ธฐ์— ์ฃผ๋ฅ˜๋ฅผ ๋‚˜๋ˆ  ๋‹ด์•„ ํŒ๋งคํ•˜๋Š” ๊ฒฝ์šฐ ๋“ฑ์ด ํฌํ•จ๋œ '์ฃผ๋ฅ˜ ๋ฉดํ—ˆ ๋“ฑ์— ๊ด€ํ•œ ๋ฒ•๋ฅ  ์‹œํ–‰๋ น' ๊ฐœ์ •์•ˆ์„ ์ž…๋ฒ• ์˜ˆ๊ณ ํ–ˆ๋‹ค.
+
+ # example
+ prompt_template = "다음 글을 요약하세요.:{}\n요약:"  # "Summarize the following text.:{}\nSummary:"
+ passage = "์ง€๋‚œ 1์›” ์ผ๋ณธ ์˜ค์‚ฌ์นด ์šฐ๋ฉ”๋‹ค์˜ ๋ทฐํ‹ฐ์ƒต โ€˜์•ณ์ฝ”์Šค๋ฉ”โ€™์—์„œ ์ง„ํ–‰๋œ CJ์˜ฌ๋ฆฌ๋ธŒ์˜์˜ ๋ฉ”์ดํฌ์—… ๋ธŒ๋žœ๋“œ(PB) โ€˜๋ฐ”์ด์˜คํž ๋ณดโ€™์˜ ํŒ์—… ์Šคํ† ์–ด ํ˜„์žฅ. ์˜ค์‚ฌ์นด ์ตœ๋Œ€ ๊ทœ๋ชจ๋ฅผ ์ž๋ž‘ํ•˜๋Š” ์•ณ์ฝ”์Šค๋ฉ” ๋งค์žฅ ํ•œ ๊ฐ€์šด๋ฐ ๊พธ๋ฉฐ์ง„ ํŒ์—… ์Šคํ† ์–ด์—๋Š” ํ•œ๊ตญ์—์„œ ์ธ๊ธฐ ๋†’์€ ํ™”์žฅํ’ˆ์„ ์‹ค์ œ๋กœ ๊ฒฝํ—˜ํ•ด๋ณด๋ ค๋Š” ๊ณ ๊ฐ๋“ค๋กœ ๋ฐœ ๋””๋”œ ํ‹ˆ ์—†์ด ๋ถ์ ๊ฑฐ๋ ธ๋‹ค. ํƒ€์ด์™„ ๊ตญ์ ์ž์ด์ง€๋งŒ ์˜ค์‚ฌ์นด์—์„œ ๊ฑฐ์ฃผํ•˜๊ณ  ์žˆ๋‹ค๋Š” 32์‚ด ์ฟ ์ด์ž‰์”จ๋Š” ์ด๋‚  ํŒ์—… ์Šคํ† ์–ด๋ฅผ ์ฐพ์•„ ๋ฐ”์ด์˜คํž ๋ณด์˜ โ€˜ํƒ„ํƒ„ํฌ๋ฆผโ€™์„ ๊ตฌ๋งคํ–ˆ๋‹ค. ์‚ฌํšŒ๊ด€๊ณ„๋ง์„œ๋น„์Šค(SNS)์™€ ์œ ํŠœ๋ธŒ๋ฅผ ํ†ตํ•ด ํ•œ๊ตญ ํ™”์žฅํ’ˆ์ด ์ข‹๋‹ค๋Š” ํ‰์„ ๋“ค์–ด๋ณธ ํ„ฐ๋ผ ์ด๋ฒˆ ๊ธฐํšŒ์— ๊ตฌ๋งคํ•ด ์‚ฌ์šฉํ•ด๋ณด๊ธฐ๋กœ ๊ฒฐ์‹ฌํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค. ์ฟ ์ด์ž‰์”จ๋Š” ํ•œ๊ตญ ํ™”์žฅํ’ˆ์„ ์“ฐ๋ฉด ํ•œ๊ตญ ์—ฌ์„ฑ์ฒ˜๋Ÿผ ์˜ˆ๋ป์ง€์ง€ ์•Š์„๊นŒ ๊ธฐ๋Œ€๊ฐ€ ๋œ๋‹ค๊ณ  ๋งํ–ˆ๋‹ค. ์ด๋‚  ์•ณ์ฝ”์Šค๋ฉ”๋Š” ๋ฐ”์ด์˜คํž ๋ณด ํŒ์—… ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋ˆˆ์— ์ž˜ ๋„๋Š” ๋ฉ”์ธ ์ง„์—ด๋Œ€ ์ƒ๋‹น์ˆ˜๊ฐ€ ํ•œ๊ตญ ๋ธŒ๋žœ๋“œ ์ฐจ์ง€์˜€๋‹ค. ๋Œ€๋ถ€๋ถ„ ํ•œ๊ตญ์—์„œ๋„ ์ธ๊ธฐ๊ฐ€ ๋†’์€ ๋ธŒ๋žœ๋“œ๋“ค๋กœ, ์ž…๊ตฌ์—์„œ ๋ฐ”๋กœ ๋ณด์ด๋Š” ์ง„์—ด๋Œ€์—๋Š” โ€˜์›จ์ดํฌ๋ฉ”์ดํฌโ€™์™€ โ€˜ํ”ผ์น˜์”จโ€™, โ€˜์–ด๋ฎค์ฆˆโ€™๊ฐ€, ํ•ด์™ธ ๋ช…ํ’ˆ ๋ธŒ๋žœ๋“œ ์กด ์ •์ค‘์•™์—๋Š” โ€˜ํ—ค๋ผโ€™๊ฐ€ ์ž๋ฆฌํ•˜๊ณ  ์žˆ์—ˆ๋‹ค. ์ผ๋ณธ ๋‚ด K๋ทฐํ‹ฐ์˜ ์ธ๊ธฐ๊ฐ€ ์˜ˆ์‚ฌ๋กญ์ง€ ์•Š๋‹ค. โ€˜์ œ 3์ฐจ ํ•œ๋ฅ˜๋ถโ€™์ด๋ผ๊ณ ๊นŒ์ง€ ์ผ์ปฌ์–ด์ง€๋Š” ํ•œ๋ฅ˜์—ดํ’์„ ํƒ€๊ณ  ์ผ๋ณธ ๋‚ด K๋ทฐํ‹ฐ์˜ ์ž…์ง€๊ฐ€ ๋‚˜๋‚ ์ด ์น˜์†Ÿ๊ณ  ์žˆ๋‹ค. ๊ณผ๊ฑฐ์—๋Š” ์ผ๋ณธ ๋‚ด์—์„œ ํ•œ๊ตญ ๋ฌธํ™”๋ฅผ ์ข‹์•„ํ•˜๋Š” ์ผ๋ถ€ ์†Œ๋น„์ž๋“ค ์‚ฌ์ด์—์„œ๋งŒ ์œ ํ–‰ํ•˜๋Š” ์ˆ˜์ค€์ด์—ˆ๋‹ค๋ฉด, ์ง€๊ธˆ์€ ์ผ๋ณธ ๋ทฐํ‹ฐ ์‹œ์žฅ์— ํ•˜๋‚˜์˜ ์นดํ…Œ๊ณ ๋ฆฌ๋กœ K๋ทฐํ‹ฐ๊ฐ€ ์ž๋ฆฌ๋ฅผ ์žก์•˜๋‹ค๋Š” ํ‰๊ฐ€๋‹ค. 21์ผ ๋ฒ ์ธ์•ค๋“œ์ปดํผ๋‹ˆ์™€ ์œ ๋กœ๋ชจ๋‹ˆํ„ฐ์— ๋”ฐ๋ฅด๋ฉด K๋ทฐํ‹ฐ์˜ ์ผ๋ณธ ์ง€์—ญ๋ณ„ ์นจํˆฌ์œจ(ํŠน์ • ๊ธฐ๊ฐ„ ๋™์•ˆ ํŠน์ • ์ƒํ’ˆ ์†Œ๋น„ ๊ทœ๋ชจ ๋น„์ค‘)์€ 2017๋…„ 1%์—์„œ 2022๋…„ 4.9%๋กœ 5๋…„ ๋งŒ์— 5๋ฐฐ๊ฐ€ ์ฆ๊ฐ€ํ–ˆ๋‹ค. ์ตœ๊ทผ 3๋…„๊ฐ„ ์—ฐํ‰๊ท  ์„ฑ์žฅ๋ฅ ์€ 20%๊ฐ€ ๋„˜๋Š”๋‹ค. ์ง€๋‚œํ•ด์—๋Š” ์ผ๋ณธ ์ˆ˜์ž… ํ™”์žฅํ’ˆ ๊ตญ๊ฐ€๋ณ„ ๋น„์ค‘์—์„œ ํ•œ๊ตญ์ด ์ฒ˜์Œ์œผ๋กœ ํ”„๋ž‘์Šค๋ฅผ ์ œ์น˜๊ณ  1์œ„์— ์˜ค๋ฅด๊ธฐ๋„ ํ–ˆ๋‹ค. ์„œํšจ์ฃผ ๋ฒ ์ธ์•ค๋“œ์ปดํผ๋‹ˆ ํŒŒํŠธ๋„ˆ๋Š” ์ง€๊ธˆ๋ณด๋‹ค 3~4๋ฐฐ ์ด์ƒ ์„ฑ์žฅํ•  ์—ฌ๋ ฅ์ด ์ถฉ๋ถ„ํ•˜๋‹ค๊ณ  ๋งํ–ˆ๋‹ค. ์ผ๋ณธ ์—ฌ์„ฑ๋“ค์ด K๋ทฐํ‹ฐ์— ๋งค๋ฃŒ๋œ ์ด์œ ๋Š” ๋ฌด์—‡์ผ๊นŒ. ๊ฐ€์žฅ ํฐ ์ด์œ ๋กœ๋Š” โ€˜๋†’์€ ๊ฐ€์„ฑ๋น„(๊ฐ€๊ฒฉ ๋Œ€๋น„ ์„ฑ๋Šฅ)โ€™๊ฐ€ ๊ผฝํžŒ๋‹ค. ์—…๊ณ„์— ๋”ฐ๋ฅด๋ฉด ์‹ค์ œ ์ผ๋ณธ์—์„œ ๋งŽ์ด ํŒ๋งค๋˜๋Š” ํ•œ๊ตญ ํ™”์žฅํ’ˆ ๋ธŒ๋žœ๋“œ์˜ ๊ธฐ์ดˆ์ œํ’ˆ๋“ค์€ ์ผ๋ณธ ๋ธŒ๋žœ๋“œ์— ๋น„ํ•ด ์ œํ’ˆ ๊ฐ€๊ฒฉ์ด 10~20% ๊ฐ€๋Ÿ‰ ์ €๋ ดํ•œ ํŽธ์ด๋‹ค. ์ด๋Š” ํ•œ๊ตญ์ฝœ๋งˆ์™€ ์ฝ”์Šค๋งฅ์Šค ๊ฐ™์€ ๊ตญ๋‚ด ํ™”์žฅํ’ˆ OEM(์ฃผ๋ฌธ์ž ์ƒํ‘œ ๋ถ€์ฐฉ ์ƒ์‚ฐ)ยทODM(์ฃผ๋ฌธ์ž ๊ฐœ๋ฐœ์ƒ์‚ฐ) ์ œ์กฐ์‚ฌ๋“ค์˜ ์„ฑ์žฅ ๋•์ด ํฌ๋‹ค. ์ด๋“ค์˜ ๊ธฐ์ˆ ๋ ฅ์€ ์„ธ๊ณ„ ์ตœ๊ณ  ์ˆ˜์ค€์œผ๋กœ, ์„ธ๊ณ„ ์ตœ๋Œ€ ํ™”์žฅํ’ˆ ๊ธฐ์—…์ธ ๋กœ๋ ˆ์•Œ๋„ ๊ณ ๊ฐ์‚ฌ์ผ ์ •๋„๋‹ค. ์ด๋“ค์€ ๋‹จ์ˆœ ์ œํ’ˆ ์ œ์กฐ๋ฅผ ๋„˜์–ด ์‹ ์ œํ’ˆ์„ ๊ฐœ๋ฐœํ•ด ๋ธŒ๋žœ๋“œ์— ๋จผ์ € ์ œ์•ˆํ•˜๊ณ  ๋˜ ํ•„์š”์‹œ ๋งˆ์ผ€ํŒ…๊นŒ์ง€ ์ง€์›ํ•ด ๋ธŒ๋žœ๋“œ๋ฅผ ํ‚ค์šฐ๋Š” ์„œ๋น„์Šค๋ฅผ ์ œ๊ณตํ•˜๊ณ  ์žˆ๋‹ค. ํ•œ๊ตญ ๋ทฐํ‹ฐ ๋ธŒ๋žœ๋“œ ๋Œ€๋ถ€๋ถ„์ด ์ด๋“ค์„ ํ†ตํ•ด ์ œํ’ˆ์„ ๋งŒ๋“ค๊ณ  ์žˆ์–ด ์ค‘์†Œ ๊ทœ๋ชจ K๋ทฐํ‹ฐ ๋ธŒ๋žœ๋“œ๋„ ํ’ˆ์งˆ์ด ๋ณด์žฅ๋œ๋‹ค๋Š” ์–˜๊ธฐ๋‹ค. ๋˜ K๋ทฐํ‹ฐ ์ œํ’ˆ์˜ ๊ฐ•์ ์œผ๋กœ๋Š” โ–ณ๋…ํŠนํ•˜๊ณ  ํŠธ๋ Œ๋””ํ•œ ์ปจ์…‰ โ–ณ๋ฐœ๋น ๋ฅธ ์‹ ์ œํ’ˆ ์ถœ์‹œ โ–ณ์˜ˆ์œ ํŒจํ‚ค์ง€ ๋“ฑ์ด ๊ฑฐ๋ก ๋œ๋‹ค. ์ด๋ฅผ ๋ฐฉ์ฆํ•˜๋“ฏ ์ตœ๊ทผ ์ผ๋ณธ์—์„  ์œ„์˜ ๊ฐ•์ ๋“ค์„ ๊ฐ–์ถ˜ ํ•œ๊ตญ์˜ ์‹ ์ง„ ๋ฉ”์ดํฌ์—… ๋ธŒ๋žœ๋“œ๋“ค์ด ์ธ๊ธฐ๋‹ค. 
์‹ค์ œ๋กœ ์ผ๋ณธ ๋‚ด ํŠธ์œ„ํ„ฐ์™€ ์œ ํŠœ๋ธŒ ๋“ฑ SNS์—์„œ๋Š” ์ˆ˜์‹ญ~์ˆ˜๋ฐฑ๋งŒ ํŒ”๋กœ์›Œ๋ฅผ ๋ณด์œ ํ•œ ํ˜„์ง€ ์ธํ”Œ๋ฃจ์–ธ์„œ๋“ค๋„ ์ผ๋ช… โ€˜๋‚ด๋ˆ๋‚ด์‚ฐโ€™(๋‚ด ๋ˆ ์ฃผ๊ณ  ๋‚ด๊ฐ€ ์‚ฐ ๋ฌผ๊ฑด) ์˜์ƒ์—์„œ ์ž๋ฐœ์ ์œผ๋กœ K๋ทฐํ‹ฐ ๋ฉ”์ดํฌ์—… ๋ธŒ๋žœ๋“œ ์ œํ’ˆ์„ ์†Œ๊ฐœํ•˜๊ณ  ์žˆ๋‹ค. ์ง€๋‚œ 1์›” ์ผ๋ณธ ์˜ค์‚ฌ์นด์— ์†Œ์žฌํ•œ ๋ทฐํ‹ฐ ๋žญํ‚น์ƒต โ€˜์•ณ์ฝ”์Šค๋ฉ” ์šฐ๋ฉ”๋‹ค์ โ€™์—์„œ ์ผ๋ณธ ์—ฌ์„ฑ๋“ค์ด ํ•œ๊ตญ ์ฝ”์Šค๋ฉ”ํ‹ฑ ๋ธŒ๋žœ๋“œ โ€˜๋ผ์นด(Laka)โ€™์˜ ์ œํ’ˆ์„ ์‚ดํŽด๋ณด๊ณ  ์žˆ๋Š” ๋ชจ์Šต. [๊น€ํšจํ˜œ ๊ธฐ์ž] ๋Œ€ํ‘œ์ ์ธ ์˜ˆ๊ฐ€ โ€˜๋ผ์นดโ€™๋‹ค. ํ•œ๊ตญ๋ณด๋‹ค ์ผ๋ณธ์—์„œ ๋” ์œ ๋ช…ํ•œ ๋ผ์นด๋Š” 100๋งŒ ๊ตฌ๋…์ž๋ฅผ ๋ณด์œ ํ•˜๊ณ  ์žˆ๋Š” ๋ฉ”์ดํฌ์—… ์•„ํ‹ฐ์ŠคํŠธ์ด์ž ์œ ํŠœ๋ฒ„ โ€˜ํžˆ๋กœโ€™(์˜ค๋‹ค๊ธฐ๋ฆฌ ํžˆ๋กœ)๊ฐ€ ์˜์ƒ์—์„œ ์ œํ’ˆ์„ ์ถ”์ฒœํ•ด ํ™๋ณด ํšจ๊ณผ๋ฅผ ํ†กํ†กํžˆ ๋ดค๋‹ค. ์ด๋ฏผ๋ฏธ ๋ผ์นด ๋Œ€ํ‘œ๋Š” ์ผ๋ณธ์—์„œ ํŠน์ • ์ œํ’ˆ์ด ๊ฐ‘์ž๊ธฐ ํ•˜๋ฃจ์— ์ˆ˜์ฒœ๊ฐœ๊ฐ€ ํŒ”๋ ค ๋ฌด์Šจ ์ผ์ธ๊ฐ€ ๋ดค๋Š”๋ฐ, ํ˜„์ง€ ์œ ๋ช… ์œ ํŠœ๋ฒ„๊ฐ€ ์ถ”์ฒœํ•œ ์˜์ƒ์ด ์˜ฌ๋ผ์™”๋”๋ผ๋ฉฐ ํ˜‘์ฐฌ์ด๋‚˜ ๊ด‘๊ณ ๊ฐ€ ์•„๋‹ˆ์–ด์„œ ๋” ๋†€๋ž๋‹ค๊ณ  ๋งํ–ˆ๋‹ค. ์ด์— ์ง€๋‚œ 2020๋…„ ์ฒ˜์Œ ์ผ๋ณธ์— ์ง„์ถœํ•œ ๋ผ์นด๋Š” ์˜ฌํ•ด 1์›” ๋ง ์ผ๋ณธ ์ „์—ญ ์•ฝ 350์—ฌ๊ฐœ ๋งค์žฅ์— ์ž…์ ํ•˜๋Š” ์„ฑ๊ณผ๋ฅผ ์˜ฌ๋ ธ๋‹ค. 2021๋…„ 47์–ต์›์— ๋ถˆ๊ณผํ–ˆ๋˜ ๋ผ์นด์˜ ๋งค์ถœ๋„ ์ง€๋‚œํ•ด 4๋ฐฐ๊ฐ€ ๋„˜๊ฒŒ ์ƒ์Šนํ•ด 200์–ต์›์— ์œก๋ฐ•ํ•œ๋‹ค. ์ผ๋ณธ ์‹œ์žฅ์—์„œ ๋‘๊ฐ์„ ๋ณด์ด๋Š” ๊ตญ๋‚ด ํ™”์žฅํ’ˆ ๋ธŒ๋žœ๋“œ๋“ค์ด ๋Š˜๋ฉด์„œ ์ƒˆ๋กญ๊ฒŒ ์ง„์ถœ์„ ํƒ€์ง„ํ•˜๊ฑฐ๋‚˜ ์ค€๋น„ํ•˜๊ณ  ์žˆ๋Š” ์—…์ฒด๋“ค๋„ ๋Š˜๊ณ  ์žˆ๋‹ค. ๊ทธ๋™์•ˆ ํ•œ๊ตญ ํ™”์žฅํ’ˆ์˜ ๊ฐ€์žฅ ํฐ ์‹œ์žฅ์ด์—ˆ๋˜ ์ค‘๊ตญ์ด ๊ฒฝ๊ธฐ ์นจ์ฒด ๋ฐ ์ •์น˜์  ์ด์Šˆ ๋“ฑ์œผ๋กœ ์ชผ๊ทธ๋ผ๋“ค๊ณ  ์žˆ๋Š” ์ƒํ™ฉ์—์„œ ์ผ๋ณธ์ด ์ด๋ฅผ ๋Œ€์ฒดํ•  ์ƒˆ๋กœ์šด ์‹œ์žฅ์œผ๋กœ ๋ถ€์ƒํ•œ ๊ฒƒ์ด๋‹ค. ์ผ๋ณธ ํ™”์žฅํ’ˆ ํŒ๋งค ์ฑ„๋„๋“ค๋„ K๋ทฐํ‹ฐ ์œ ์น˜์— ์ ๊ทน์ ์ด๋‹ค. ์•ณ์ฝ”์Šค๋ฉ”์˜ ๊ฒฝ์šฐ ๊ฑฐ์˜ ๋งค๋‹ฌ K๋ทฐํ‹ฐ ํŒ์—…์ด ์—ด๋ฆฌ๊ณ  ์žˆ๋Š” ์ˆ˜์ค€์œผ๋กœ, ์˜ค๋Š” 5์›”์—๋Š” ๋„์ฟ„์ ์—์„œ K๋ทฐํ‹ฐ ํŽ˜์Šคํ‹ฐ๋ฒŒ๋„ ์—ด ๊ณ„ํš์ด๋‹ค. ๋กœํ”„ํŠธ์™€ ํ”„๋ผ์ž ๋“ฑ๋„ K๋ทฐํ‹ฐ ์œ ์น˜ ๊ฒฝ์Ÿ์ด ๋œจ๊ฒ๋‹ค. CJ์˜ฌ๋ฆฌ๋ธŒ์˜ ๊ด€๊ณ„์ž๋Š” ํ•œ๊ตญ ํ™”์žฅํ’ˆ์— ๋Œ€ํ•œ ๋ฐ˜์‘์ด ์ข‹๊ณ  ํŠนํžˆ ์˜ฌ๋ฆฌ๋ธŒ์˜์—์„œ ์ธ๊ธฐ ์žˆ๋Š” ๋ธŒ๋žœ๋“œ์— ๋Œ€ํ•œ ์ˆ˜์š”๊ฐ€ ๋†’๋‹ค ๋ณด๋‹ˆ ํ”Œ๋žซํผ์—์„œ ๋จผ์ € ํŒ์—… ์š”์ฒญ์ด ์™”๋‹ค๋ฉฐ ์•ž์œผ๋กœ๋„ ์ผ๋ณธ ์‹œ์žฅ ์œ ํ†ต์— ๋”์šฑ ์ ๊ทน์ ์œผ๋กœ ๋‚˜์„œ๋ ค ํ•œ๋‹ค๊ณ  ์ „ํ–ˆ๋‹ค."
+ prompt = prompt_template.format(passage)
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs,
+     max_new_tokens=512,
+     temperature=0.2,
+     top_p=0.95,
+     do_sample=True,
+     use_cache=False)
+ print(tokenizer.decode(outputs[0]))
+ - ์ผ๋ณธ ๋‚ด K๋ทฐํ‹ฐ์˜ ์ธ๊ธฐ๊ฐ€ ์˜ˆ์‚ฌ๋กญ์ง€ ์•Š์€ ๊ฐ€์šด๋ฐ, ์ผ๋ณธ ๋‚ด์—์„œ ํ•œ๊ตญ ๋ฌธํ™”๋ฅผ ์ข‹์•„ํ•˜๋Š” ์ผ๋ถ€ ์†Œ๋น„์ž๋“ค ์‚ฌ์ด์—์„œ๋งŒ ์œ ํ–‰ํ•˜๋Š” ์ˆ˜์ค€์ด์—ˆ๋˜ K๋ทฐํ‹ฐ๊ฐ€ ์ง€๊ธˆ์€ ์ผ๋ณธ ๋ทฐํ‹ฐ ์‹œ์žฅ์— ํ•˜๋‚˜์˜ ์นดํ…Œ๊ณ ๋ฆฌ๋กœ ์ž๋ฆฌ ์žก์•˜๋‹ค๋Š” ํ‰๊ฐ€๋ฅผ ๋ฐ›๊ณ  ์žˆ๋‹ค.
+ ```
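Both examples above repeat the same tokenize–generate–decode steps, so once the model and tokenizer are loaded they can be wrapped in a small helper. A minimal sketch reusing only the objects and generation settings from the card; the function name and the final split on the "요약:" marker are illustrative assumptions, not part of the original example:

```python
# Illustrative helper around the example above; assumes `model`, `tokenizer`
# and `prompt_template` are already defined exactly as in the card.
def summarize(passage: str, max_new_tokens: int = 512) -> str:
    prompt = prompt_template.format(passage)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs,
                             max_new_tokens=max_new_tokens,
                             temperature=0.2,
                             top_p=0.95,
                             do_sample=True,
                             use_cache=False)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # keep only what was generated after the "요약:" (summary) marker in the prompt
    return text.split("요약:")[-1].strip()

# usage: print(summarize("<Korean passage to summarize>"))
```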