---
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- llama-2-chat
library_name: peft
---
# komt : korean multi task instruction tuning model
![multi task instruction tuning.jpg](https://github.com/davidkim205/komt/assets/16680469/c7f6ade7-247e-4b62-a94f-47e19abea68e)

Recently, following the success of ChatGPT, numerous large language models have emerged in an attempt to catch up with its capabilities.
However, when it comes to Korean, many of these models still struggle to provide accurate answers or to generate fluent Korean text.
This work addresses these challenges by introducing a multi-task instruction technique that leverages supervised datasets from a variety of tasks to create training data for large language models (LLMs).

## Model Details

* **Model Developers** : davidkim (Changyeon Kim)
* **Repository** : https://github.com/davidkim205/komt
* **Model Architecture** : komt-mistral-7b-v1-dpo is a fine-tuned version of komt-mistral-7b-v1 (original model: Mistral-7B-Instruct-v0.1).


## Dataset
* maywell/ko_Ultrafeedback_binarized
  https://huggingface.co/datasets/maywell/ko_Ultrafeedback_binarized

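The preference data can be inspected quickly with the `datasets` library. The sketch below is only for orientation; the split name and the `prompt`/`chosen`/`rejected` column names are assumptions based on the usual Ultrafeedback-binarized layout, so check the dataset card for the actual schema.

```
from datasets import load_dataset

# Minimal sketch: inspect the DPO preference data used for this model.
# The split name and the column names ("prompt", "chosen", "rejected") are
# assumptions; verify them against the dataset card.
ds = load_dataset("maywell/ko_Ultrafeedback_binarized", split="train")
print(ds)  # prints the actual columns and number of rows

example = ds[0]
for key in ("prompt", "chosen", "rejected"):
    if key in example:
        print(key, ":", str(example[key])[:120])
```
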
## Hardware and Software
- NVIDIA driver : 535.54.03
- CUDA Version: 12.2

## Training
Refer to https://github.com/davidkim205/komt for the full training code and configuration; a minimal DPO sketch follows below.

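For orientation only, a rough sketch of a LoRA + DPO setup on this dataset using `trl` (0.7.x-style arguments) might look as follows. Every hyperparameter value here is an illustrative assumption, not the configuration actually used for komt-mistral-7b-v1-dpo; recent `trl` releases replace `TrainingArguments`/`beta`/`tokenizer` with a `DPOConfig`.

```
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# SFT model used as the DPO starting point.
base = "davidkim205/komt-mistral-7b-v1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

# Preference pairs; prompt/chosen/rejected columns are assumed.
train_dataset = load_dataset("maywell/ko_Ultrafeedback_binarized", split="train")

# Illustrative LoRA settings, not the ones used for the released adapter.
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="komt-mistral-7b-v1-dpo",   # illustrative output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,
    remove_unused_columns=False,           # required by the DPO data collator
)

# With a peft_config and ref_model=None, trl reuses the frozen base weights
# as the implicit reference model.
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
trainer.save_model()
```
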
## Prompt template: Mistral
```
<s>[INST] {prompt} [/INST]</s>
```

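For reference, the template can be filled with a small helper like the one below (a convenience sketch, not part of the komt codebase). The leading `<s>` token is added by the tokenizer, which is why the Usage example below only writes out the `[INST]` markers.

```
# Minimal sketch: build a single-turn Mistral-style prompt.
# The tokenizer adds the leading <s> automatically, so only the [INST] markers
# are written here (this matches the gen() helper in the Usage section).
def build_prompt(user_message: str) -> str:
    return f"[INST] {user_message} [/INST]"

print(build_prompt("μ•ˆλ…•ν•˜μ„Έμš”"))  # [INST] μ•ˆλ…•ν•˜μ„Έμš” [/INST]
```
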
## Usage
```
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel, PeftConfig
from transformers import TextStreamer, GenerationConfig


model = 'davidkim205/komt-mistral-7b-v1'
peft_model_name = 'davidkim205/komt-mistral-7b-v1-dpo'
config = PeftConfig.from_pretrained(peft_model_name)

# Load the base model in 4-bit (NF4) to reduce GPU memory usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
config.base_model_name_or_path = model
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map="auto")
# Attach the DPO LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
streamer = TextStreamer(tokenizer)  # streams tokens to stdout during generation

def gen(x):
    generation_config = GenerationConfig(
        temperature=0.8,
        top_p=0.8,
        top_k=100,
        max_new_tokens=1024,
        early_stopping=True,
        do_sample=True,
    )
    # Wrap the user input in the Mistral [INST] ... [/INST] template.
    q = f"[INST]{x} [/INST]"
    gened = model.generate(
        **tokenizer(
            q,
            return_tensors='pt',
            return_token_type_ids=False
        ).to('cuda'),
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        streamer=streamer,
    )
    result_str = tokenizer.decode(gened[0])

    # Return only the text generated after the [/INST] tag.
    start_tag = "[/INST]"
    start_index = result_str.find(start_tag)

    if start_index != -1:
        result_str = result_str[start_index + len(start_tag):].strip()
    return result_str

result = gen('μ œμ£Όλ„λ₯Ό 1λ°•2일둜 혼자 μ—¬ν–‰ν•˜λ €κ³  ν•˜λŠ”λ° μ—¬ν–‰ μ½”μŠ€λ₯Ό λ§Œλ“€μ–΄μ€˜')

print('##########')
print(result)
```
output
```
<s> [INST]μ œμ£Όλ„λ₯Ό 1λ°•2일둜 혼자 μ—¬ν–‰ν•˜λ €κ³  ν•˜λŠ”λ° μ—¬ν–‰ μ½”μŠ€λ₯Ό λ§Œλ“€μ–΄μ€˜ [/INST]μ œμ£Όλ„μ˜ μ•„λ¦„λ‹€μš΄ μžμ—° κ²½κ΄€κ³Ό λ¬Έν™”μœ μ‚°μ„ 느끼고 μ‹Άλ‹€λ©΄ λ‹€μŒκ³Ό 같은 μ½”μŠ€λ₯Ό μΆ”μ²œν•œλ‹€.

첫 번째 μ½”μŠ€λŠ” μ•„λ¦„λ‹€μš΄ ν•΄μ•ˆ κ²½κ΄€κ³Ό ν•΄μ•ˆμ‚¬κ΅¬λ₯Ό κ°μƒν•˜κΈ° μœ„ν•΄ 제주 μ„œλΆ€ ν•΄μ•ˆμœΌλ‘œ μ΄λ™ν•˜λŠ” 것이닀. 제주 μ‹œλ‚΄μ—μ„œ μ™Όμͺ½ λ°©ν–₯으둜 νƒλ‚˜λ©΄ ν•œλ¦Όν•΄μˆ˜μš•μž₯, μ„±μ‚°ν•΄μˆ˜μš•μž₯, λ΄‰λ™ν•΄μˆ˜μš•μž₯ λ“± 유λͺ…ν•œ ν•΄μˆ˜μš•μž₯을 κ²½μœ ν•  수 μžˆλ‹€. 이 지역은 맑은 바닀와 넓은 ν•΄μ•ˆμ—μ„œ ν•΄μˆ˜μš•μ„ 즐길 수 있으며, ν•΄μˆ˜μš•μž₯ μ£Όλ³€μ—λŠ” λ§Žμ€ μŒμ‹μ μ΄ μžˆμ–΄ 배식을 즐길 수 μžˆλ‹€. μ„œμͺ½ ν•΄μ•ˆμœΌλ‘œ μ΄λ™ν•˜λŠ” λ™μ•ˆ 제주 λŒ€ν‘œ μ‚¬κ³„μ ˆ 맛집인 ν—ˆλΈŒ μˆ˜ν”„ 및 μ†ŒλΌλΉ„ λ“± λ§›μžˆλŠ” μŒμ‹μ„ 맛볼 수 μžˆλ‹€. μ„œλΆ€ ν•΄μ•ˆμ„ λŒμ•„ λ‹€μ‹œ 제주 μ‹œλ‚΄λ‘œ λŒμ•„μ˜€λŠ” λ™μ•ˆ 제주 νŠΉμ‚°ν’ˆ μ‹œμž₯μ—μ„œ 제주 νŠΉμ‚°ν’ˆμ„ μ‚΄ 수 μžˆλ‹€.

두 번째 μ½”μŠ€λŠ” 동뢀 ν•΄μ•ˆμ„ λŒμ•„λ³΄λŠ” 것이닀. 제주 μ‹œλ‚΄μ—μ„œ 였λ₯Έμͺ½ λ°©ν–₯으둜 νƒλ‚˜λ©΄ μ•„μ΄μŠ€ν¬λ¦Ό 거리인 ν•œλ¦Όν•΄μˆ˜μš•μž₯, μ„±μ‚°ν•΄μˆ˜μš•μž₯, λ΄‰λ™ν•΄μˆ˜μš•μž₯ λ“± λ‹€μ‹œ ν•œ 번 유λͺ…ν•œ ν•΄μˆ˜μš•μž₯을 κ²½μœ ν•  수 μžˆλ‹€. 이 지역은 ν•΄μˆ˜μš•μž₯ μ£Όλ³€μ—λŠ” λ§Žμ€ μŒμ‹μ μ΄ μžˆμ–΄ 배식을 즐길 수 μžˆλ‹€. 동뢀 ν•΄μ•ˆμ„ λŒμ•„ λ‹€μ‹œ 제주 μ‹œλ‚΄λ‘œ λŒμ•„μ˜€λŠ” λ™μ•ˆ 제주 νŠΉμ‚°ν’ˆ μ‹œμž₯μ—μ„œ 제주 νŠΉμ‚°ν’ˆμ„ μ‚΄ 수 μžˆλ‹€. 이 μ§€μ—­μ—λŠ” λ§Žμ€ μŒμ‹μ μ΄ μžˆμ–΄ λ§›μžˆλŠ” μŒμ‹μ„ 맛볼 수 μžˆλ‹€.

μ„Έ 번째 μ½”μŠ€λŠ” 제주 λ‚¨λΆ€λ‘œ μ΄λ™ν•˜λŠ” 것이닀. 제주 μ‹œλ‚΄μ—μ„œ 였λ₯Έμͺ½ λ°©ν–₯으둜 νƒλ‚˜λ©΄ 제주 λ‚¨λΆ€λ‘œ 이동할 수 μžˆλ‹€. 이 지역은 ν•œλΌμ‚° ꡭ립곡원이 μœ„μΉ˜ν•΄ μžˆμ–΄ μžμ—° 경관을 감상할 수 μžˆλ‹€. ν•œλΌμ‚° ꡭ립곡원 λ‚΄μ—λŠ” λ‹€μ–‘ν•œ μžμ—° κ²½κ΄€κ³Ό μ‚°μ•… 경둜λ₯Ό 즐길 수 μžˆλŠ” 탐방 μ½”μŠ€κ°€ μžˆλ‹€. λ˜ν•œ, 제주 λ‚¨λΆ€λŠ” λ§Žμ€ ν•΄μˆ˜μš•μž₯κ³Ό 골프μž₯이 μœ„μΉ˜ν•΄ μžˆμ–΄ ν•΄μˆ˜μš•κ³Ό 골프λ₯Ό 즐길 수 μžˆλ‹€. λ‚¨λΆ€λ‘œ μ΄λ™ν•˜λŠ” λ™μ•ˆ 제주 νŠΉμ‚°ν’ˆ μ‹œμž₯μ—μ„œ 제주 νŠΉμ‚°ν’ˆμ„ μ‚΄ 수 μžˆλ‹€.

```
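If you prefer to serve the model without loading the adapter at runtime, the LoRA weights can be merged into an unquantized copy of the base model using the standard peft pattern below. This is a generic sketch, not a step from the komt repository, and the output directory name is illustrative.

```
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use an unquantized (fp16) base here; merging into a 4-bit model is problematic.
base = AutoModelForCausalLM.from_pretrained('davidkim205/komt-mistral-7b-v1',
                                            torch_dtype=torch.float16, device_map='auto')
model = PeftModel.from_pretrained(base, 'davidkim205/komt-mistral-7b-v1-dpo')
merged = model.merge_and_unload()  # folds the LoRA weights into the base weights

merged.save_pretrained('komt-mistral-7b-v1-dpo-merged')  # illustrative output directory
AutoTokenizer.from_pretrained('davidkim205/komt-mistral-7b-v1').save_pretrained('komt-mistral-7b-v1-dpo-merged')
```
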
## Evaluation
For objective model evaluation, we initially used EleutherAI's lm-evaluation-harness but obtained unsatisfactory results. Consequently, we conducted evaluations using ChatGPT as a judge, as described in [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259.pdf) and [Three Ways of Using Large Language Models to Evaluate Chat](https://arxiv.org/pdf/2308.06502.pdf).


| model                                     | score   | average(0~5) | percentage |
|-------------------------------------------|---------|--------------|------------|
| gpt-3.5-turbo(close)                      | 147     | 3.97         | 79.45%     |
| naver Cue(close)                          | 140     | 3.78         | 75.67%     |
| clova X(close)                            | 136     | 3.67         | 73.51%     |
| WizardLM-13B-V1.2(open)                   | 96      | 2.59         | 51.89%     |
| Llama-2-7b-chat-hf(open)                  | 67      | 1.81         | 36.21%     |
| Llama-2-13b-chat-hf(open)                 | 73      | 1.91         | 38.37%     |
| nlpai-lab/kullm-polyglot-12.8b-v2(open)   | 70      | 1.89         | 37.83%     |
| kfkas/Llama-2-ko-7b-Chat(open)            | 96      | 2.59         | 51.89%     |
| beomi/KoAlpaca-Polyglot-12.8B(open)       | 100     | 2.70         | 54.05%     |
| **komt-llama2-7b-v1 (open)(ours)**        | **117** | **3.16**     | **63.24%** |
| **komt-llama2-13b-v1 (open)(ours)**       | **129** | **3.48**     | **69.72%** |
| **komt-llama-30b-v1 (open)(ours)**        | **129** | **3.16**     | **63.24%** |
| **komt-mistral-7b-v1 (open)(ours)**       | **131** | **3.54**     | **70.81%** |
| **komt-mistral-7b-v1-dpo (open)(ours)**   | **142** | **3.83**     | **76.75%** |
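
To make the table easier to read: each answer receives a 0~5 rating from the judge, `score` is the sum of those ratings, `average` divides the sum by the number of questions, and `percentage` expresses the average relative to the maximum rating of 5. The snippet below only illustrates that arithmetic with made-up ratings; it is not the actual evaluation script.

```
# Illustration of the table's arithmetic with hypothetical 0~5 judge ratings,
# one rating per test question.
judge_scores = [5, 4, 3, 4, 5, 2]

score = sum(judge_scores)
average = score / len(judge_scores)
percentage = average / 5 * 100

print(f"score={score}, average={average:.2f}, percentage={percentage:.2f}%")
```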