|
---
library_name: peft
base_model: sysong11/dapt-kogpt
---
|
|
|
# Model Card for sysong11/dapt-kogpt-orca-sum-adapter
|
|
|
This repo contains a low-rank (LoRA) adapter for [domain-adapted KoGPT](https://huggingface.co/sysong11/dapt-kogpt), fine-tuned on [a small supervised summarization dataset](https://huggingface.co/datasets/sysong11/sum_train_rev).
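For reference, here is a minimal sketch of how a LoRA adapter like this one is typically attached to the base model for supervised fine-tuning with `peft`. The rank, alpha, dropout, and target modules below are illustrative assumptions, not the settings actually used to train this adapter:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("sysong11/dapt-kogpt", device_map="auto")

# Hypothetical hyperparameters, for illustration only.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```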
|
|
|
## How to Get Started with the Model |
|
|
|
```python
import json
from random import randrange

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the domain-adapted KoGPT base model.
base_model = AutoModelForCausalLM.from_pretrained(
    "sysong11/dapt-kogpt", torch_dtype="auto", device_map="auto"
)

# Attach the LoRA adapter and load the tokenizer shipped with it.
lora_path = "sysong11/dapt-kogpt-orca-sum-adapter"
model = PeftModel.from_pretrained(base_model, lora_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(lora_path)

# The test split is JSON Lines: one JSON object per line.
test_data = []
with open("./datasets/test.json", "rb") as f:
    for line in f:
        test_data.append(json.loads(line))

# ChatML-style template matching the fine-tuning format.
prompt_template = """\
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

# Korean task instruction: "Q: Summarize the following document, Context: ..."
msg = "Q:다음 문서를 요약 하세요, Context:{context}"

# Pick a random test example.
ix = randrange(len(test_data))
print(ix)
datapoint = test_data[ix]
ref = datapoint["summary_text"]  # reference summary, for manual comparison

# System prompt kept verbatim (including its original wording) from the
# Orca-style prompts the adapter was tuned with.
system_prompt = (
    "You are an AI assistant. User will you give you a task. "
    "Your goal is to complete the task as faithfully as you can."
)

tokens = tokenizer.encode(
    prompt_template.format(
        system_prompt=system_prompt,
        prompt=msg.format(context=datapoint["original_text"]),
    ),
    return_tensors="pt",
).to(device="cuda", non_blocking=True)

# Greedy decoding (do_sample=False, so sampling parameters such as
# temperature would be ignored), stopping at <|im_end|> (id 63999).
gen_tokens = model.generate(
    input_ids=tokens,
    do_sample=False,
    max_length=1024,
    pad_token_id=63999,
    eos_token_id=63999,
)

# Split the output into the echoed prompt and the newly generated summary.
prompt_len = tokens.shape[1]
inputs = tokenizer.batch_decode([gen_tokens[0][:prompt_len]])[0]
generated = tokenizer.batch_decode([gen_tokens[0][prompt_len:]])[0].replace(
    "<|im_end|>", ""
)
print(inputs)
print("generated:")
print(generated)
```
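To serve the model without a runtime `peft` dependency, the adapter weights can be folded into the base model with `merge_and_unload` (a standard `PeftModel` method for LoRA adapters; the save path below is only an example):

```python
# Fold the LoRA weights into the base model and drop the adapter wrappers.
merged = model.merge_and_unload()

# Save as a plain transformers checkpoint (path is illustrative).
merged.save_pretrained("./dapt-kogpt-sum-merged")
tokenizer.save_pretrained("./dapt-kogpt-sum-merged")
```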
|
|
|
### Framework versions |
|
|
|
- PEFT 0.7.1 |