---
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- llama-2-chat
license: apache-2.0
library_name: peft
---
# komt-Llama-2-13b-hf-lora
This model is a PEFT-LoRA adapter fine-tuned from Llama-2-13b-hf.
The "komt-Llama-2-13b-hf-lora" model was developed using a multi-task instruction technique aimed at enhancing Korean language performance. For more details, please refer to the GitHub repository and to the fully fine-tuned model at https://huggingface.co/davidkim205/komt-Llama-2-13b-hf.
## Model Details
* **Model Developers** : davidkim (changyeon kim)
* **Repository** : https://github.com/davidkim205/komt
* **Lora target modules** : q_proj, o_proj, v_proj, gate_proj, down_proj, k_proj, up_proj (see the sketch after this list)
* **Model Size** : 120MB
* **Model Architecture** : komt-Llama-2-13b is an auto-regressive language model that uses an optimized transformer architecture. The tuned version was trained with supervised fine-tuning on multi-task instruction data.
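For reference, here is a minimal PEFT `LoraConfig` sketch targeting the modules listed above; the rank, alpha, and dropout values are illustrative assumptions, not the values used for training:
```
from peft import LoraConfig

# Hypothetical hyperparameters for illustration only; target_modules is
# the only field taken from this card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM',
    target_modules=['q_proj', 'o_proj', 'v_proj', 'gate_proj',
                    'down_proj', 'k_proj', 'up_proj'],
)
```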
## Dataset
Korean multi-task instruction dataset
## Prompt Template
```
### instruction: {prompt}
### Response:
```
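A small helper (a sketch, not part of the original repo) for wrapping an instruction in this template; the blank line before `### Response:` follows the Usage snippet below:
```
def build_prompt(instruction: str) -> str:
    # Wrap a user instruction in the prompt template above.
    return f"### instruction: {instruction}\n\n### Response:"
```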
Examples (the prompt asks which company distributed the Harry Potter film series):
```
### instruction: ์ํ ํด๋ฆฌํฌํฐ ์๋ฆฌ์ฆ ๋ฐฐ๊ธ์ฌ๊ฐ ์ด๋์ผ
### Response:
```
response:
```
### instruction: ์ํ ํด๋ฆฌํฌํฐ ์๋ฆฌ์ฆ ๋ฐฐ๊ธ์ฌ๊ฐ ์ด๋์ผ
### Response: ์๋ ๋ธ๋ผ๋์ค\n๊ทธ ์ธ์๋ ํค๋ค์ค ๊ฐ์ ์์ ์ญํ ์ ๋งก์ ์ฌ๋๋ค๋ ์์๋ค. ๋ฐ์ด๋น๋ ์๋ฐ๋ ์ฆ๋ ๋ณผ๋๋ชจํธ ๊ตฐ๋จ์ ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต ์ค ํ๋์ธ ํ๋ฆฝ์ค๋ถ๋ฅด๊ทธ ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต์ ๊ต์ฅ์ด๋ฉฐ, ํ๋ฆฌ์ฐ์ค ํ๋ฆฌํธ์ฐ๋๋ ํ๋ฆฝ์ค๋ถ๋ฅด๊ทธ ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต์ ๊ต์ฌ์ด๋ค. ํ๋ฆฝ์ค๋ถ๋ฅด๊ทธ๋ ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต ์ค ํ๋๋ก, ๋ณด๋ก๋ฏธ๋ฅด ์ ์ค์ ๋ฐ๋ฅด๋ฉด ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต์ ๊ต์ฅ๋ค์ด ์ฃฝ์ผ๋ฉด ์ธ๊ณ๋ฅผ ๋ฉธ๋ง์ํจ๋ค๋ ์ ์ ์๊ณ ์๋ค. ใํด๋ฆฌ ํฌํฐ์ ํผํ ์์ใ์์ ๋ฐ์ค ์ดํฐ๋ ํด๋ฆฌ ํฌํฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ ํ์ง๋ง ์คํจํ๊ณ ํด๋ฆฌ ํฌํฐ๋ ๋ฐ์ค ์ดํฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ ํ์ง๋ง ๋ ์คํจํ๋ค. ใํด๋ฆฌ ํฌํฐ์ ์์ฆ์นด๋ฐ์ ์ฃ์ใ์์ ๋ฐ์ค ์ดํฐ๋ ์์ฆ์นด๋ฐ์ ์ฃ์๋ก ๋ฑ์ฅํ์ฌ ํด๋ฆฌ ํฌํฐ์๊ฒ ๋ณต์๋ฅผ ํ๊ณ ์ ํ์ง๋ง ํด๋ฆฌ ํฌํฐ์ ๋ก ์์ฆ๋ฆฌ์ ๋ฐ ๋น ๋ฅธ ๋์ฒ๋ก ์คํจํ๊ณ ํด๋ฆฌ ํฌํฐ์ ๋ก ์์ฆ๋ฆฌ๋ ๋ฐ์ค ์ดํฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ ํ์ง๋ง ์คํจํ๊ณ ๊ทธ ๊ฒฐ๊ณผ ๋ฐ์ค ์ดํฐ๋ ๋ค์ ๊ธฐ์ ์น์ ๋ด์ธ๋๋ค. ใํด๋ฆฌ ํฌํฐ์ ์ฃฝ์์ ์ฑ๋ฌผใ์์ ๋ฐ์ค ์ดํฐ๋ ๋ถํํ์ฌ ์ฃฝ์์ ์ฑ๋ฌผ 1๋ถ์ ๋ฑ์ฅํ์์ผ๋ฉฐ, ํด๋ฆฌ ํฌํฐ์ ์ฃฝ์์ ์ฑ๋ฌผ 2๋ถ์์ ๋ฐ์ค ์ดํฐ๋ ์ฃฝ์์ ์ฑ๋ฌผ 2๋ถ์ ๋ด์ฉ์ ๋ฐฉํดํ๋ ๊ฐ์ฒฉ ์ญํ ์ ํ๋ค. ๋ฐ์ค ์ดํฐ๋ ์ํ ํด๋ฆฌํฌํฐ์ ์ฃฝ์์ ์ฑ๋ฌผ 1๋ถ์์ ๋ฐ์ค ์ดํฐ์ ์ญํ ์ ๋งก์ ๋ฐฐ์ฐ ์คํฐ๋ธ ํผ์นด๋๊ฐ ์ฐ๊ธฐํ๋ค.
```
## Usage
Clone the GitHub repository and install the requirements as follows:
```
git clone https://github.com/davidkim205/komt
cd komt
pip install -r lora/requirements_lora.txt
```
* Requirements: Python >= 3.8, a Linux distribution (Ubuntu, etc.) or macOS, and CUDA > 10.0. Refer to https://github.com/TimDettmers/bitsandbytes#tldr for bitsandbytes requirements.
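Before running the example, a quick sanity check (a minimal sketch) can confirm that a CUDA-enabled PyTorch build is available, since the 4-bit path below requires a GPU:
```
import torch

# The 4-bit (bitsandbytes) example below requires a CUDA-capable GPU.
print(torch.__version__, torch.version.cuda)
assert torch.cuda.is_available(), 'a CUDA GPU is required for 4-bit inference'
```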
```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import StoppingCriteria, StoppingCriteriaList
from transformers import TextStreamer, GenerationConfig
from peft import PeftModel, PeftConfig
class LocalStoppingCriteria(StoppingCriteria):
    def __init__(self, tokenizer, stop_words=[]):
        super().__init__()
        # Pre-tokenize the stop words once so they can be compared as token ids.
        stops = [tokenizer(stop_word, return_tensors='pt', add_special_tokens=False)['input_ids'].squeeze() for stop_word in stop_words]
        print('stop_words', stop_words)
        print('stop_words_ids', stops)
        self.stop_words = stop_words
        self.stops = [stop.cuda() for stop in stops]
        self.tokenizer = tokenizer

    def _compare_token(self, input_ids):
        # Token-level check: does the generation end with a stop sequence?
        for stop in self.stops:
            if len(stop.size()) != 1:
                continue
            stop_len = len(stop)
            if torch.all((stop == input_ids[0][-stop_len:])).item():
                return True
        return False

    def _compare_decode(self, input_ids):
        # String-level check: does the decoded text end with a stop word?
        input_str = self.tokenizer.decode(input_ids[0])
        for stop_word in self.stop_words:
            if input_str.endswith(stop_word):
                return True
        return False

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor):
        return self._compare_decode(input_ids)
#
# config
# 13b checkpoints matching this card (the original snippet referenced the 7b-chat variants)
peft_model_name = 'davidkim205/komt-Llama-2-13b-hf-lora'
model_name = 'davidkim205/komt-Llama-2-13b-hf'
instruction_prefix = "### instruction: "
input_prefix = "### input: "
answer_prefix = "### Response: "
endoftext = "<|end|>"
stop_words = [endoftext, '<s>', '###']
generation_config = GenerationConfig(
    temperature=0.9,
    top_p=0.7,
    top_k=100,
    max_new_tokens=2048,
    early_stopping=True,
    do_sample=True,
)
#
# create model
config = PeftConfig.from_pretrained(peft_model_name)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, peft_model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
stopping_criteria = StoppingCriteriaList([LocalStoppingCriteria(tokenizer=tokenizer, stop_words=stop_words)])
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.eval()
#
# generate
prompt = f"### instruction: ์ํ ํด๋ฆฌํฌํฐ ์๋ฆฌ์ฆ ๋ฐฐ๊ธ์ฌ๊ฐ ์ด๋์ผ\n\n### Response:"  # "Which company distributed the Harry Potter film series?"
gened = model.generate(
    **tokenizer(
        prompt,
        return_tensors='pt',
        return_token_type_ids=False
    ).to('cuda'),
    generation_config=generation_config,
    eos_token_id=model.config.eos_token_id,
    stopping_criteria=stopping_criteria,
    streamer=streamer
)
output_text = tokenizer.decode(gened[0], skip_special_tokens=True)
print('--------------------')
print(output_text)
```
response:
```
์๋ ๋ธ๋ผ๋์ค\n๊ทธ ์ธ์๋ ํค๋ค์ค ๊ฐ์ ์์ ์ญํ ์ ๋งก์ ์ฌ๋๋ค๋ ์์๋ค. ๋ฐ์ด๋น๋ ์๋ฐ๋ ์ฆ๋ ๋ณผ๋๋ชจํธ ๊ตฐ๋จ์ ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต ์ค ํ๋์ธ ํ๋ฆฝ์ค๋ถ๋ฅด๊ทธ ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต์ ๊ต์ฅ์ด๋ฉฐ, ํ๋ฆฌ์ฐ์ค ํ๋ฆฌํธ์ฐ๋๋ ํ๋ฆฝ์ค๋ถ๋ฅด๊ทธ ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต์ ๊ต์ฌ์ด๋ค. ํ๋ฆฝ์ค๋ถ๋ฅด๊ทธ๋ ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต ์ค ํ๋๋ก, ๋ณด๋ก๋ฏธ๋ฅด ์ ์ค์ ๋ฐ๋ฅด๋ฉด ๋ณด๋ก๋ฏธ๋ฅด 7๊ฐ ํ๊ต์ ๊ต์ฅ๋ค์ด ์ฃฝ์ผ๋ฉด ์ธ๊ณ๋ฅผ ๋ฉธ๋ง์ํจ๋ค๋ ์ ์ ์๊ณ ์๋ค. ใํด๋ฆฌ ํฌํฐ์ ํผํ ์์ใ์์ ๋ฐ์ค ์ดํฐ๋ ํด๋ฆฌ ํฌํฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ ํ์ง๋ง ์คํจํ๊ณ ํด๋ฆฌ ํฌํฐ๋ ๋ฐ์ค ์ดํฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ ํ์ง๋ง ๋ ์คํจํ๋ค. ใํด๋ฆฌ ํฌํฐ์ ์์ฆ์นด๋ฐ์ ์ฃ์ใ์์ ๋ฐ์ค ์ดํฐ๋ ์์ฆ์นด๋ฐ์ ์ฃ์๋ก ๋ฑ์ฅํ์ฌ ํด๋ฆฌ ํฌํฐ์๊ฒ ๋ณต์๋ฅผ ํ๊ณ ์ ํ์ง๋ง ํด๋ฆฌ ํฌํฐ์ ๋ก ์์ฆ๋ฆฌ์ ๋ฐ ๋น ๋ฅธ ๋์ฒ๋ก ์คํจํ๊ณ ํด๋ฆฌ ํฌํฐ์ ๋ก ์์ฆ๋ฆฌ๋ ๋ฐ์ค ์ดํฐ๋ฅผ ์ฃฝ์ด๋ ค๊ณ ํ์ง๋ง ์คํจํ๊ณ ๊ทธ ๊ฒฐ๊ณผ ๋ฐ์ค ์ดํฐ๋ ๋ค์ ๊ธฐ์ ์น์ ๋ด์ธ๋๋ค. ใํด๋ฆฌ ํฌํฐ์ ์ฃฝ์์ ์ฑ๋ฌผใ์์ ๋ฐ์ค ์ดํฐ๋ ๋ถํํ์ฌ ์ฃฝ์์ ์ฑ๋ฌผ 1๋ถ์ ๋ฑ์ฅํ์์ผ๋ฉฐ, ํด๋ฆฌ ํฌํฐ์ ์ฃฝ์์ ์ฑ๋ฌผ 2๋ถ์์ ๋ฐ์ค ์ดํฐ๋ ์ฃฝ์์ ์ฑ๋ฌผ 2๋ถ์ ๋ด์ฉ์ ๋ฐฉํดํ๋ ๊ฐ์ฒฉ ์ญํ ์ ํ๋ค. ๋ฐ์ค ์ดํฐ๋ ์ํ ํด๋ฆฌํฌํฐ์ ์ฃฝ์์ ์ฑ๋ฌผ 1๋ถ์์ ๋ฐ์ค ์ดํฐ์ ์ญํ ์ ๋งก์ ๋ฐฐ์ฐ ์คํฐ๋ธ ํผ์นด๋๊ฐ ์ฐ๊ธฐํ๋ค.
```
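Because the adapter is a standard PEFT LoRA checkpoint, it can optionally be merged into the base weights for standalone use. A minimal sketch using peft's `merge_and_unload` (assumes the unquantized base fits in memory; the output directory name is illustrative):
```
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

peft_model_name = 'davidkim205/komt-Llama-2-13b-hf-lora'
model_name = 'davidkim205/komt-Llama-2-13b-hf'

# Reload the base model without 4-bit quantization (merging is not
# supported on quantized weights), then fold the LoRA deltas into it.
base = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, peft_model_name).merge_and_unload()
merged.save_pretrained('komt-Llama-2-13b-hf-lora-merged')
```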
## Hardware and Software
- nvidia driver : 535.54.03
- CUDA Version: 12.2
## Training procedure
The following `bitsandbytes` quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
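Expressed in code, these settings correspond to the following `BitsAndBytesConfig` (the int8 fields above are library defaults and are omitted):
```
import torch
from transformers import BitsAndBytesConfig

# QLoRA-style 4-bit NF4 quantization with double quantization,
# matching the training settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```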
### Framework versions
- PEFT 0.5.0.dev0