---
library_name: transformers
datasets:
- llm-jp/hh-rlhf-12k-ja
- ayayana/hh-rlhf-12k-ja-s300
- kinokokoro/ichikara-instruction-003
base_model:
- llm-jp/llm-jp-3-13b
- ayayana/llm-jp-3-13b-ayanatest_lora
license: llama3.1
language:
- ja
---

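# Model Card for ayayana/llm-jp-3-13b-dpo_ayana10
- Model ID: ayayana/llm-jp-3-13b-dpo_ayana10
- LoRA adapter
- This model inherits the rights (license terms) of the base model. Note that the datasets used for training have only been checked for research-only use.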

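## Model Details
- This model was used for the final assignment of the Matsuo Lab LLM course 2024.
- It was built as an exercise in DPO. The base model (llm-jp-3-13b) was combined with a LoRA adapter trained with my own SFT (llm-jp-3-13b-ayanatest_lora), and DPO was then run on the result.
- Base model: llm-jp/llm-jp-3-13b
- SFT LoRA adapter model: https://huggingface.co/ayayana/llm-jp-3-13b-ayanatest_lora
  - A LoRA model obtained by SFT on the ichikara003-001-1 dataset, starting from llm-jp/llm-jp-3-13b.
- Dataset used for DPO: https://huggingface.co/datasets/ayayana/hh-rlhf-12k-ja-s300
  - 300 examples randomly sampled from hh-rlhf-ja. Only columns with 70 to 512 characters were targeted, and only the first turn of each conversation was stored and used as the prompt.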

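### Results/Summary
- ELYZA-tasks-100-TV score: 2.85
- The score is almost unchanged from the 2.83 obtained by inference with the LoRA adapter model (ayanatest_lora) alone.
- On visual inspection, the Japanese did seem to have become more fluent. When judged with Gemini 1.5 chat, the score was around 3.1.
- The more Japanese SFT data and combined adapters were added, the lower the score became. In the end, the first adapter, trained on a small but high-quality dataset, gave the best result.
- With 5 epochs, the results look somewhat overfitted.
- Inference used the sample code "Model_Inference_Template_DPO_20241207.ipynb" with max_new_tokens=1024.
- Inference took about 30 minutes on an A100.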
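
When reading the training-loss log in this card, it can help to relate the loss to the preference margin it implies. The sketch below assumes the standard sigmoid DPO loss, -log(sigmoid(margin)), where the margin is beta times the gap between the chosen and rejected log-ratios. The function names are illustrative and are not part of the training code, and the logged value is a batch average, so the recovered margin is only an effective one.

```python
import math


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-example sigmoid DPO loss, -log(sigmoid(margin)), computed stably."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def margin_for_loss(loss: float) -> float:
    """Invert the loss: the margin at which the per-example loss equals `loss`."""
    return -math.log(math.expm1(loss))


# Loss of a policy identical to the reference model (margin 0): ln 2, about 0.693
print(round(dpo_sigmoid_loss(0.0), 4))
# Effective margins implied by the first and last logged losses
print(round(margin_for_loss(0.7139), 3))
print(round(margin_for_loss(0.0007), 2))
```

Under this assumption the first logged losses sit close to ln 2, as expected before the policy has moved away from the reference, while a loss of 0.0007 corresponds to a margin above 7. That is in line with the overfitting impression noted in the list above.
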
  

## Step / Training Loss
Progress: 90/90 steps in 20:47, Epoch 4/5

| Step | Training Loss |
|-----:|--------------:|
| 10 | 0.713900 |
| 20 | 0.683300 |
| 30 | 0.322100 |
| 40 | 0.180100 |
| 50 | 0.052900 |
| 60 | 0.017600 |
| 70 | 0.004400 |
| 80 | 0.003100 |
| 90 | 0.000700 |

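
The 90 steps in the log are consistent with the settings used in this card (300 preference pairs, per-device batch size 1, gradient accumulation 16, 5 epochs). A minimal sketch of that arithmetic follows, assuming a single GPU and a trainer that counts only complete accumulation cycles in each epoch. The helper name is hypothetical.

```python
def optimizer_steps(num_examples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Number of optimizer updates, counting only full accumulation cycles per epoch."""
    batches_per_epoch = -(-num_examples // batch_size)  # ceiling division
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


steps = optimizer_steps(num_examples=300, batch_size=1, grad_accum=16, epochs=5)
examples_seen = steps * 1 * 16  # batch size 1, 16 accumulated batches per step
passes = examples_seen / 300    # passes over the 300 pairs

print(steps)   # 90
print(passes)  # 4.8
```

Under these assumptions the model sees about 4.8 passes over the data, which would match the "Epoch 4/5" shown in the log.
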
### Model Description
- Training was done by customizing the sample code "DPOtemplate_20241207.ipynb" with the DPO tuning settings shown below.
- Training took less than 30 minutes on an A100.

### DPO training and inference code

```python
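import gc
import json
import random

import torch
from datasets import Dataset, load_dataset
from trl import DPOConfig, DPOTrainer

# NOTE: `model`, `tokenizer`, `peft_config`, and `HF_TOKEN` are assumed to be
# defined earlier in the notebook; their setup is not shown in this card.

# Base model and the trained LoRA adapter.
base_model_id = "llm-jp/llm-jp-3-13b"
adapter_id = "ayayana/llm-jp-3-13b-ayanatest_lora"  # model to run DPO on (your fine-tuned model; here an adapter only)
new_model_id = "llm-jp-3-13b-dpo_ayana10"  # name to give the DPO-trained model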


## Build the dataset for DPO training
# Prepare the dataset
print("Preparing dataset...")
dataset = load_dataset("llm-jp/hh-rlhf-12k-ja")
train_data = dataset["train"]


# First pass (preview only; `filtered_data` is rebuilt from scratch further below):
# keep pairs whose chosen and rejected texts are 140 to 1024 characters long
filtered_data = []
for item in train_data:
   chosen = item['chosen']
   rejected = item['rejected']
   # Get the first human utterance from `conversations`
   conversations = item['conversations']
   if conversations and len(conversations) > 0:
       # Take the first message sent by the human
       first_human_msg = next((conv['value'] for conv in conversations if conv['from'] == 'human'), "")
       prompt = first_human_msg
   else:
       prompt = ""


   if 140 <= len(chosen) <= 1024 and 140 <= len(rejected) <= 1024:
       filtered_data.append({
           'prompt': prompt,     # use the first human utterance as the prompt
           'chosen': chosen,
           'rejected': rejected
       })


print(f"Original data size: {len(train_data)}")
print(f"Filtered data size: {len(filtered_data)}")


# Print one example to check the data
print("\nExample data:")
print(f"Prompt: {filtered_data[0]['prompt']}")
print(f"Chosen: {filtered_data[0]['chosen'][:210]}...")
print(f"Rejected: {filtered_data[0]['rejected'][:210]}...")

## Build the dataset (this pass replaces the list built above and applies stricter checks)
filtered_data = []
for item in train_data:
   chosen = item['chosen']
   rejected = item['rejected']


   # Get the conversation
   conversations = item['conversations']
   if conversations and len(conversations) > 0:
       first_human_msg = next((conv['value'] for conv in conversations if conv['from'] == 'human'), "")
       prompt = first_human_msg.strip()
   else:
       continue


   # Validate the content
   if (140 <= len(chosen) <= 3000 and
       140 <= len(rejected) <= 3000 and
       len(prompt.strip()) > 0 and
       chosen != rejected and
       not all(c.isspace() for c in chosen) and
       not all(c.isspace() for c in rejected)):


       filtered_data.append({
           'prompt': prompt,
           'chosen': chosen.strip(),
           'rejected': rejected.strip()
       })


# Check data quality once, outside the loop
print("\nData quality check:")
print(f"Total examples: {len(filtered_data)}")
print(f"Empty prompts: {sum(1 for x in filtered_data if not x['prompt'])}")
print(f"Average chosen length: {sum(len(x['chosen']) for x in filtered_data)/len(filtered_data)}")
print(f"Average rejected length: {sum(len(x['rejected']) for x in filtered_data)/len(filtered_data)}")


# Show a sample record
if filtered_data:
   print("\nSample data:")
   print(f"Prompt: {filtered_data[0]['prompt']}")
   print(f"Chosen: {filtered_data[0]['chosen'][:120]}...")
   print(f"Rejected: {filtered_data[0]['rejected'][:120]}...")


# Randomly sample 300 examples
if len(filtered_data) > 300:
   sampled_data = random.sample(filtered_data, 300)
else:
   sampled_data = filtered_data


print(f"Final sampled data size: {len(sampled_data)}")


# Convert to the record format used for DPO
dpo_datasets = []
for item in sampled_data:
   dpo_item = {
       "prompt": item['prompt'],  # use the original prompt
       "chosen": item['chosen'],
       "rejected": item['rejected']
   }
   dpo_datasets.append(dpo_item)


# Print one example to check the data
print("\nExample data:")
print(f"Prompt: {dpo_datasets[0]['prompt']}")
print(f"Chosen: {dpo_datasets[0]['chosen'][:200]}...")
print(f"Rejected: {dpo_datasets[0]['rejected'][:200]}...")


# Save to a JSON file
json_file_path = "dpo_dataset.json"
with open(json_file_path, "w", encoding="utf-8") as f:
   json.dump(dpo_datasets, f, indent=4, ensure_ascii=False)


print(f"\nData saved to {json_file_path}")


## DPO training
# Convert the list to a Hugging Face Dataset
dpo_datasets = Dataset.from_list(dpo_datasets)


# Free memory
torch.cuda.empty_cache()
gc.collect()
if torch.cuda.is_available():
   torch.cuda.synchronize()  # wait for pending GPU work to finish
   torch.cuda.empty_cache()  # clear the cache once more


# DPO training configuration
training_args = DPOConfig(
   output_dir=new_model_id,
   per_device_train_batch_size=1,
   # per_device_eval_batch_size=1,
   per_device_eval_batch_size=2,  # to save memory
   # gradient_accumulation_steps=32,  # to save memory
   gradient_accumulation_steps=16,  # 16 was used here; raising it to 32 is an option (needs tuning)
   optim="paged_adamw_8bit",
   num_train_epochs=5,
   logging_steps=10,
   save_steps=10,
   save_total_limit=1,
   max_steps=-1,
   learning_rate=2e-4,
   fp16=False,
   bf16=True,
   max_grad_norm=0.3,
   dataloader_num_workers=0,
   report_to="none",
   gradient_checkpointing=True  # enable gradient checkpointing
)


# Initialize DPO trainer
dpo_trainer = DPOTrainer(
   model,
   args=training_args,
   train_dataset=dpo_datasets,
   tokenizer=tokenizer,
   peft_config=peft_config,
)


# Start training
model.config.use_cache = False
dpo_trainer.train()


# Define the adapter name up front
adapter_model_id = new_model_id + "+lora_adp10"


# Save as a LoRA adapter
model.save_pretrained(
   adapter_model_id,
   push_to_hub=True,
   token=HF_TOKEN,
   private=True
)


```
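
The 300-example draw above uses `random.sample` without a seed, so each run of the notebook selects a different subset. If a repeatable subset is wanted, one option is a dedicated seeded generator, sketched below. The seed value 42 and the helper name are assumptions for illustration, not what was used to build the published dataset.

```python
import random


def sample_pairs(pairs: list, k: int = 300, seed: int = 42) -> list:
    """Return up to `k` items drawn without replacement using a private seeded RNG."""
    if len(pairs) <= k:
        return list(pairs)
    rng = random.Random(seed)  # private generator; the global random state is untouched
    return rng.sample(pairs, k)


# Toy stand-in for `filtered_data`
toy = [{"prompt": f"p{i}", "chosen": f"c{i}", "rejected": f"r{i}"} for i in range(1000)]
first = sample_pairs(toy)
second = sample_pairs(toy)
print(len(first), first == second)  # 300 True
```
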

### Inference (using the sample code)

```python
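import torch
from tqdm import tqdm

# `datasets`, `model`, and `tokenizer` are assumed to be set up earlier in the notebook.
results = []
for data in tqdm(datasets):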

  input = data["input"]

  prompt = f"""### 指示
  {input} 簡潔に回答してください。
  ### 回答
  """

  tokenized_input = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
  attention_mask = torch.ones_like(tokenized_input)
  with torch.no_grad():
      outputs = model.generate(
          tokenized_input,
          attention_mask=attention_mask,
          max_new_tokens=1024,
          do_sample=False,
          repetition_penalty=1.2,
          pad_token_id=tokenizer.eos_token_id
      )[0]
  output = tokenizer.decode(outputs[tokenized_input.size(1):], skip_special_tokens=True)

  results.append({"task_id": data["task_id"], "input": input, "output": output})
```
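
Because the triple-quoted f-string above is written inside an indented loop body, the two leading spaces on its continuation lines become part of the prompt text. Whether that matters depends on how the SFT prompts were formatted, which this card does not state. The sketch below only illustrates the difference between the indented literal and an explicitly joined version; `build_prompt` and `indented_prompt` are hypothetical helpers, not part of the sample notebook, and the sample instruction is made up.

```python
def build_prompt(instruction: str) -> str:
    """Build the instruction/answer prompt with no indentation-derived spaces."""
    return "\n".join(["### 指示", f"{instruction} 簡潔に回答してください。", "### 回答", ""])


def indented_prompt(instruction: str) -> str:
    """Same literal layout as the loop above: continuation lines carry two spaces."""
    prompt = f"""### 指示
  {instruction} 簡潔に回答してください。
  ### 回答
  """
    return prompt


a = build_prompt("What is the height of Mt. Fuji?")
b = indented_prompt("What is the height of Mt. Fuji?")
print(repr(a))
print(repr(b))
print(a == b)  # False: the second version has extra leading spaces
```
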

### Writing the results to a file
The inference results are written to a file in JSONL format. The file name and output folder can be chosen freely.
```python
import json
import re

jsonl_id = re.sub(".*/", "", adapter_id)  # base name for the output path; choose freely
with open(f"./{jsonl_id}-outputs.jsonl", 'w', encoding='utf-8') as f:  # output file name; choose freely
    for result in results:
        json.dump(result, f, ensure_ascii=False)
        f.write('\n')

```
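
Before sharing the output, it may be worth confirming that every line of the JSONL file parses and carries the `task_id`, `input`, and `output` keys written above. A minimal sketch using a temporary file and made-up records follows; the helper name and the sample rows are illustrative only.

```python
import json
import os
import tempfile

REQUIRED_KEYS = {"task_id", "input", "output"}


def check_jsonl(path: str) -> int:
    """Parse every non-empty line and verify the expected keys; return the record count."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            missing = REQUIRED_KEYS - set(record)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            count += 1
    return count


# Made-up records, written one JSON object per line as in the block above
rows = [
    {"task_id": 0, "input": "question A", "output": "answer A"},
    {"task_id": 1, "input": "question B", "output": "answer B"},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample-outputs.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            json.dump(row, f, ensure_ascii=False)
            f.write("\n")
    n_records = check_jsonl(path)

print(n_records)  # 2
```
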
### Exporting the results file to Excel
```python
import json
import pandas as pd

# Path to the JSONL file
jsonl_file_path = "/content/llm-jp-3-13b-ayana002_lora_output.jsonl"

# Read the JSONL file
with open(jsonl_file_path, "r", encoding="utf-8") as f:
    results = [json.loads(line) for line in f]

# Convert to a DataFrame
df = pd.DataFrame(results)

# Save as an Excel file
df.to_excel("output.xlsx", index=False, engine="openpyxl")

print("Excel file saved as 'output.xlsx'.")

```