---
library_name: transformers
tags:
- llama-factory
license: apache-2.0
---

## Model
- base model: [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
- parent model: [youjunhyeok/llama3-8b-ko-sft-v1](https://huggingface.co/youjunhyeok/llama3-8b-ko-sft-v1)

## Dataset
- [youjunhyeok/ko-orca-pair-and-ultrafeedback-dpo](https://huggingface.co/datasets/youjunhyeok/ko-orca-pair-and-ultrafeedback-dpo)

## Load Model

Use the following Python code to load the model:

```python3
from transformers import AutoTokenizer, AutoModelForCausalLM

path = 'youjunhyeok/llama3-8b-ko-sft-dpo-v1'

model = AutoModelForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)
```

## Chat

```python3
def chat(message):
    messages = [
        # System prompt (Korean): "You are an AI assistant. Please give kind and accurate answers."
        {"role": "system", "content": "당신은 인공지능 어시스턴트입니다. 친절하고 정확한 답변을 해주세요."},
        {"role": "user", "content": message},
    ]

    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]

    outputs = model.generate(
        input_ids,
        max_new_tokens=2048,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.3,
        top_p=0.9,
    )
    response = outputs[0][input_ids.shape[-1]:]
    print(tokenizer.decode(response, skip_special_tokens=True))

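# Korean prompt: "I don't feel like going to the gym. Tell me about three bodyweight exercises I can do at home."
chat('헬스장 가기는 싫은데 집에서 할만한 맨몸 운동 3개 정도 알려줘')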
```
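`apply_chat_template` renders `messages` into one prompt string using the chat template stored with the tokenizer, and `chat` then keeps only the newly generated tokens, stopping at either the EOS token or `<|eot_id|>` (the token a Llama 3 model emits to end an assistant turn). The sketch below reproduces both steps in plain Python so the prompt format is visible. It assumes the tokenizer carries the standard Llama 3 Instruct template (consistent with `template: llama3` in the training config); check `tokenizer.chat_template` before relying on it. The helpers `render_llama3_prompt` and `extract_reply` are hypothetical and not part of `transformers`.

```python
from typing import Dict, List, Sequence

# Special tokens of the standard Llama 3 Instruct chat format.
# Assumption: this model's tokenizer keeps that template unchanged.
BOS = "<|begin_of_text|>"
START_HEADER = "<|start_header_id|>"
END_HEADER = "<|end_header_id|>"
EOT = "<|eot_id|>"


def render_llama3_prompt(messages: List[Dict[str, str]],
                         add_generation_prompt: bool = True) -> str:
    """Hypothetical helper: render messages as a Llama 3 prompt string."""
    prompt = ""
    for i, message in enumerate(messages):
        if i == 0:
            prompt += BOS  # BOS is emitted once, before the first turn
        # Each turn: role header, blank line, trimmed content, end-of-turn token.
        prompt += f"{START_HEADER}{message['role']}{END_HEADER}\n\n"
        prompt += message["content"].strip() + EOT
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        prompt += f"{START_HEADER}assistant{END_HEADER}\n\n"
    return prompt


def extract_reply(output_ids: Sequence[int], prompt_len: int,
                  stop_ids: Sequence[int]) -> List[int]:
    """Hypothetical helper: drop the prompt tokens, cut at the first terminator."""
    # Same slice as outputs[0][input_ids.shape[-1]:] in chat().
    reply = list(output_ids[prompt_len:])
    for i, token_id in enumerate(reply):
        if token_id in stop_ids:
            return reply[:i]
    return reply


messages = [
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "Hello"},
]
print(render_llama3_prompt(messages))
```

For these two messages the rendered prompt starts with `<|begin_of_text|><|start_header_id|>system<|end_header_id|>` and ends with an open `assistant` header. That trailing header is what `add_generation_prompt=True` adds; without it the model is not cued to answer as the assistant.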

## Output
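
Sample response to the prompt above, translated from Korean:

```
1. Squat: The squat is an effective bodyweight exercise that helps strengthen the leg muscles and improve overall fitness. To perform a squat, spread your feet shoulder-width apart, straighten your legs, and pull your hips back. Then straighten your legs and place your feet on the floor. Repeat this process 12-15 times.

2. Push-up: The push-up is another effective bodyweight exercise that helps strengthen the upper-body muscles and improve overall fitness. To perform a push-up, spread your arms shoulder-width apart and place your legs at hip height. Then lower your arms until they touch the floor and lift back up to the starting position. Repeat this process 12-15 times.

3. Plank: The plank is another effective bodyweight exercise that helps strengthen the core muscles and improve overall fitness. To perform a plank, spread your arms shoulder-width apart and place your legs at hip height. Then straighten your body and use your core muscles to support it. Hold this position for 30-60 seconds, then return to the starting position. Repeat this process 2-3 times.
```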
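The reply above is sampled (`do_sample=True`, `temperature=0.3`, `top_p=0.9`), so reruns produce different text. To show roughly what those two settings do to the next-token distribution, the sketch below applies temperature scaling first and then nucleus (top-p) filtering to a toy list of logits. It is a simplified re-implementation for intuition, not the code path inside `transformers`; the function name `temperature_top_p` and the toy logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_top_p(logits: List[float], temperature: float, top_p: float) -> List[float]:
    """Hypothetical helper: return the renormalized sampling distribution."""
    # 1) Temperature < 1 sharpens the distribution toward the most likely tokens.
    probs = softmax([x / temperature for x in logits])
    # 2) Nucleus filtering: keep the most likely tokens until their
    #    cumulative probability reaches top_p (at least one is always kept).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # 3) Zero out the dropped tokens and renormalize the survivors.
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]


toy_logits = [2.0, 1.5, 0.5, -1.0]
print(temperature_top_p(toy_logits, temperature=1.0, top_p=0.9))
print(temperature_top_p(toy_logits, temperature=0.3, top_p=0.9))
```

With these toy logits, `temperature=1.0` keeps three tokens inside the 0.9 nucleus, while `temperature=0.3` concentrates enough mass on the top token that only two survive. A low temperature therefore makes replies more conservative without making them deterministic.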

## Benchmark (KOR)

```
# alias
A = youjunhyeok/llama3-8b-ko-sft-dpo-v1
B = youjunhyeok/llama3-8b-ko-sft-v1
C = meta-llama/Meta-Llama-3-8B
D = chihoonlee10/T3Q-ko-solar-dpo-v7.0 (ranked 1st on the Ko leaderboard as of 2024-05-24)
```

| Benchmark (macro_f1)      |   A  |   B  |   C  |   D  |
|---------------------------|:----:|:----:|:----:|:----:|
| kobest_boolq (0-shot)     | 79.0 | 84.7  | 38.2 | 34.1 |
| kobest_boolq (5-shot)     | 86.7 | 85.4 | 83.8 | 93.1 |
| kobest_copa (0-shot)      | 60.3 | 60.6 | 63.1 | 81.0 |
| kobest_copa (5-shot)      | 67.7 | 67.2 | 69.1 | 91.0 |
| kobest_hellaswag (0-shot) | 43.1 | 40.0 | 42.1 | 55.1 |
| kobest_hellaswag (5-shot) | 43.3 | 42.4 | 44.2 | 55.2 |
| kobest_sentineg (0-shot)  | 65.1 | 52.1 | 51.5 | 82.7 |
| kobest_sentineg (5-shot)  | 92.1 | 89.4 | 94.7 | 91.4 |
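The KoBEST scores above are macro F1 values (shown on a 0-100 scale). Macro F1 computes an F1 score for each class and then takes their unweighted mean, so a model that keeps predicting one label is penalized even when its accuracy looks acceptable. A minimal pure-Python sketch of the metric, using hypothetical labels:

```python
from typing import List


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical binary task where a model always answers 1.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1]
print(round(macro_f1(y_true, y_pred) * 100, 1))
```

In this example accuracy is 50%, but macro F1 is about 33.3 because the class that is never predicted contributes an F1 of 0.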

## Benchmark (ENG)

```
# alias
A = youjunhyeok/llama3-8b-ko-sft-dpo-v1
B = youjunhyeok/llama3-8b-ko-sft-v1
C = meta-llama/Meta-Llama-3-8B
```

| Task          |     A |     B |     C |
|:--------------|------:|------:|------:|
| openbookqa    | 0.340 | 0.342 | 0.338 |
| hellaswag     | 0.558 | 0.555 | 0.576 |
| boolq         | 0.827 | 0.824 | 0.831 |
| arc_easy      | 0.764 | 0.758 | 0.815 |
| arc_challenge | 0.476 | 0.464 | 0.529 |

## LLaMA-Factory `trainer_config.yaml`
`{data_dir}`, `{dataset_name}`, and `{output_dir}` are placeholders; replace them with your own values.
```yaml
cutoff_len: 1024
dataset: {dataset_name}
dataset_dir: {data_dir}
ddp_timeout: 180000000
do_train: true
eval_steps: 500
eval_strategy: steps
finetuning_type: lora
flash_attn: auto
fp16: true
gradient_accumulation_steps: 4
include_num_input_tokens_seen: true
learning_rate: 5.0e-06
logging_steps: 10
lora_alpha: 16
lora_dropout: 0.05
lora_rank: 8
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 15000
model_name_or_path: youjunhyeok/llama3-8b-ko-sft-v1
num_train_epochs: 1.0
optim: adamw_torch
output_dir: {output_dir}
packing: false
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
plot_loss: true
pref_beta: 0.1
pref_ftx: 0
pref_loss: sigmoid
preprocessing_num_workers: 16
report_to: none
resize_vocab: true
save_steps: 500
stage: dpo
template: llama3
val_size: 0.1
warmup_steps: 100
```
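This config trains LoRA adapters (`finetuning_type: lora`) with DPO (`stage: dpo`), using the sigmoid preference loss (`pref_loss: sigmoid`) with `pref_beta: 0.1`; `pref_ftx: 0` means no supervised fine-tuning term is added to the loss. The learning rate warms up over `warmup_steps: 100` to `learning_rate: 5.0e-06` and then follows a cosine decay. The sketch below writes out the standard sigmoid DPO loss for one preference pair and the warmup-plus-cosine schedule. It follows the published DPO formula and the shape of the `transformers` cosine-with-warmup schedule rather than LLaMA-Factory's own source; the function names, the log-probability values, and `total_steps=1000` are hypothetical (the real step count depends on dataset size and the number of GPUs).

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dpo_sigmoid_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float,
                     beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) pair.

    Each *_logp is the summed log-probability of a full response given the prompt,
    under the trained policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin, scaled by beta (pref_beta in the config).
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


def cosine_lr(step: int, total_steps: int, peak_lr: float = 5.0e-6,
              warmup_steps: int = 100) -> float:
    """Linear warmup to peak_lr, then a half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(dpo_sigmoid_loss(-120.0, -135.0, -120.0, -135.0))  # policy == reference: log(2) ≈ 0.693
print(dpo_sigmoid_loss(-110.0, -145.0, -120.0, -135.0))  # chosen up, rejected down: ≈ 0.127
print([cosine_lr(s, total_steps=1000) for s in (0, 50, 100, 550, 1000)])
```

The loss starts at log(2) when the policy matches the reference and falls as the policy raises the chosen response's likelihood relative to the rejected one; a small `beta` such as 0.1 keeps that reward margin growing gently, which limits drift from the reference model.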