---
library_name: transformers
tags:
- DPO
license: apache-2.0
datasets:
- lightblue/response-dataset-plus-qwen-judged
language:
- ja
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

[日本語モデルカード/Japanese model card](#japanese)

[日本語のブログ/Full Japanese dev blog](https://note.com/lightblue_tech/n/n6967ff462f4a?sub_rt=share_pb)

[Development source code/開発ソースコード](https://github.com/lightblue-tech/karasu_dpo_202501)

# Karasu-DPO-7B

This is a Japanese version of the [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model, trained with Direct Preference Optimization (DPO) on synthetic Japanese conversation data.

This model outperforms the base [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model on the [arena-hard-auto-multilingual](https://github.com/lightblue-tech/arena-hard-auto-multilingual) chat benchmark:

|Qwen2.5-7B-Instruct|Karasu-DPO-7B|
|----|----|
|50.0|66.2|

We recommend this model for use as a general-purpose conversational AI.

# How to use

This model can be used in the same way as any Qwen 2.5 model. We recommend using vLLM for simplicity and speed.

<ul>
  <li><b>vLLM</b>

Install [vLLM](https://github.com/vllm-project/vllm/) using `pip install vllm`.

<details open>
  <summary>Show vLLM code</summary>
  
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="lightblue/Karasu-DPO-7B",
    max_model_len=8_000
)

sampling_params = SamplingParams(
    temperature=0.0, 
    max_tokens=8_000,
)

prompts = [
    """ナイジェリアの首都はどこですか？""",
    """鉄は何度に溶けますか？""",
    """父が好きそうなプレゼントのおすすめを教えて""",
]

conversations = [
    [{"role": "user", "content": x}] for x in prompts
]

outputs = llm.chat(conversations, sampling_params=sampling_params)

for output in outputs:
    print(output.outputs[0].text)
    print("-"*32)

# ナイジェリアの首都はアブジャ（Abuja）です。以前はラゴスが首都でしたが、1991年に新しい首都としてアブジャが建設され、1991年12月12日に首都としての地位を正式に取得しました。アブジャは政治中心地として機能していますが、経済の中心地は依然としてラゴスが占めています。
# --------------------------------
# 鉄は非常に高い温度で溶けます。鉄の融点は約1,538℃（2,800°F）です。これは、一般的な家庭用のオーブン（最大約200-300℃）では絶対に達成できません。鉄を溶かすためには、より高温の設備が必要で、例えば、電気炉やガス炉などがあります。
# --------------------------------
# もちろんです。父さんへのプレゼント選びは楽しみですね。以下に、父が喜ぶ2つのプレゼントを提案します：

# 1. **高級コーヒーメーカー**：
#    - 父さんがコーヒーを愛飲しているなら、高品質なコーヒーメーカーは大変喜ばれるプレゼントです。例えば、手動式のコーヒーメーカーなら、毎日のコーヒー作りがより楽しく、手作り感も楽しめます。また、自動式のコーヒーメーカーなら、忙しい朝でも美味しいコーヒーが楽しめます。

# 2. **趣味に合わせたギフトセット**：
#    - 父さんの趣味や興味に合わせたギフトセットは、とても喜ばれます。例えば、ゴルフ好きなら、最新のゴルフクラブやゴルフバッグ、ゴルフボールセットなどが良いでしょう。また、車好きなら、高品質な車用アクセサリー（カーフィルム、カーボンシートなど）や車載用の充電器などが喜ばれます。

# これらのプレゼントは、父さんの趣味や興味に合わせて選べば、きっと喜んでもらえることでしょう。
# --------------------------------
```

</details>

</li>
</ul>

<br/>

# How this model was made

We made this model through the following procedure:

1. Sample Japanese and English prompts from the following datasets:
   * lmsys/lmsys-chat-1m
   * RyokoAI/ShareGPT52K
   * openchat/openchat_sharegpt_v3
   * OpenAssistant/oasst2
   * Open-Orca/slimorca-deduped-cleaned-corrected
   * HuggingFaceH4/ultrachat_200k
2. Translate English prompts to Japanese using [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
3. Correct translations with [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
4. Get responses to all Japanese prompts (both original and translated) with [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
5. Correct responses using [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/).
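
The data-generation steps above can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the code Lightblue used (that is in the linked source repository): `ask_llm` is a hypothetical stand-in for a gpt-4o-mini call, and the instruction wording passed to it is invented for the example. The card does not specify how these outputs were turned into chosen/rejected pairs for DPO, so the sketch covers steps 2 to 5 only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical: any function that sends one instruction to an LLM
# (gpt-4o-mini in the card) and returns its text reply.
AskLLM = Callable[[str], str]


@dataclass
class Sample:
    prompt: str  # Japanese prompt (original, or translated and corrected)
    response: str  # step 4: first response to the prompt
    corrected_response: str  # step 5: response after the correction pass


def build_samples(prompts: Iterable[Tuple[str, str]], ask_llm: AskLLM) -> List[Sample]:
    """Run steps 2-5 on (language, text) pairs, where language is "ja" or "en"."""
    samples = []
    for lang, text in prompts:
        if lang == "en":
            # Step 2: translate the English prompt into Japanese.
            draft = ask_llm(f"Translate this prompt into Japanese:\n{text}")
            # Step 3: correct the translation.
            text = ask_llm(f"Correct this Japanese translation:\n{draft}")
        # Step 4: get a response to every Japanese prompt.
        response = ask_llm(text)
        # Step 5: correct the response.
        corrected = ask_llm(f"Correct this response:\n{response}")
        samples.append(Sample(text, response, corrected))
    return samples


# Dry run with a stub in place of a real LLM: it echoes the last line it receives.
def echo(instruction: str) -> str:
    return instruction.splitlines()[-1]


for sample in build_samples([("ja", "鉄は何度に溶けますか？"), ("en", "What melts iron?")], echo):
    print(sample)
```

Passing the LLM call in as a function keeps the pipeline logic testable offline; in practice `ask_llm` would wrap an API client.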

We trained [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on this data with DPO, using QLoRA (low-rank adapters on a quantized base model), to create Karasu-DPO-7B.
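
For readers unfamiliar with DPO: the standard (sigmoid) DPO loss compares how far the trained policy's log-probability of the chosen response has moved relative to a frozen reference model, against the same shift for the rejected response. The per-pair sketch below assumes you already have summed response log-probabilities from both models. `beta=0.1` is an assumed value (the default in Hugging Face TRL) because the card does not report β, and the card does not say how the reference model was configured (with QLoRA it is commonly the base model with adapters disabled).

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed; not stated in the card
) -> float:
    """Sigmoid DPO loss for a single preference pair.

    Each argument is the summed log-probability of a whole response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # Positive margin means the policy now favours the chosen response more
    # than the reference does, which lowers the loss.
    margin = beta * (chosen_shift - rejected_shift)
    return neg_log_sigmoid(margin)


print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy == reference
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))   # policy moved toward chosen
```

When the policy still matches the reference, the margin is zero and the loss equals ln 2 ≈ 0.693, which is close to the first logged training loss (0.678 at step 10) in the Training Results table.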

<h1 style="font-size: 48px;" id="japanese">日本語</h1>

こちらのモデルは[Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)の日本語版です。生成した日本語会話データとDPO学習で作成しました。

このモデルは、[arena-hard-auto-multilingual](https://github.com/lightblue-tech/arena-hard-auto-multilingual)チャットベンチマークにおいて、ベースモデルである[Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)を上回る性能を発揮します：

|Qwen2.5-7B-Instruct|Karasu-DPO-7B|
|----|----|
|50.0|66.2|

このモデルは、一般的な会話AIとしての使用を推奨します。

# 使用方法

このモデルは、他のQwen 2.5モデルと同様の方法で使用できます。シンプルで高速な操作のためにはvLLMの使用を推奨します。

<ul>
  <li><b>vLLM</b>

[vLLM](https://github.com/vllm-project/vllm/)を`pip install vllm`でインストールしてください。

<details open>
  <summary>vLLMコードを見る</summary>
  
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="lightblue/Karasu-DPO-7B",
    max_model_len=8_000
)

sampling_params = SamplingParams(
    temperature=0.0, 
    max_tokens=8_000,
)

prompts = [
    """ナイジェリアの首都はどこですか？""",
    """鉄は何度に溶けますか？""",
    """父が好きそうなプレゼントのおすすめを教えて""",
]

conversations = [
    [{"role": "user", "content": x}] for x in prompts
]

outputs = llm.chat(conversations, sampling_params=sampling_params)

for output in outputs:
    print(output.outputs[0].text)
    print("-"*32)

# ナイジェリアの首都はアブジャ（Abuja）です。以前はラゴスが首都でしたが、1991年に新しい首都としてアブジャが建設され、1991年12月12日に首都としての地位を正式に取得しました。アブジャは政治中心地として機能していますが、経済の中心地は依然としてラゴスが占めています。
# --------------------------------
# 鉄は非常に高い温度で溶けます。鉄の融点は約1,538℃（2,800°F）です。これは、一般的な家庭用のオーブン（最大約200-300℃）では絶対に達成できません。鉄を溶かすためには、より高温の設備が必要で、例えば、電気炉やガス炉などがあります。
# --------------------------------
# もちろんです。父さんへのプレゼント選びは楽しみですね。以下に、父が喜ぶ2つのプレゼントを提案します：

# 1. **高級コーヒーメーカー**：
#    - 父さんがコーヒーを愛飲しているなら、高品質なコーヒーメーカーは大変喜ばれるプレゼントです。例えば、手動式のコーヒーメーカーなら、毎日のコーヒー作りがより楽しく、手作り感も楽しめます。また、自動式のコーヒーメーカーなら、忙しい朝でも美味しいコーヒーが楽しめます。

# 2. **趣味に合わせたギフトセット**：
#    - 父さんの趣味や興味に合わせたギフトセットは、とても喜ばれます。例えば、ゴルフ好きなら、最新のゴルフクラブやゴルフバッグ、ゴルフボールセットなどが良いでしょう。また、車好きなら、高品質な車用アクセサリー（カーフィルム、カーボンシートなど）や車載用の充電器などが喜ばれます。

# これらのプレゼントは、父さんの趣味や興味に合わせて選べば、きっと喜んでもらえることでしょう。
# --------------------------------
```

</details>

</li>
</ul>

<br/>

# このモデルの作成方法

このモデルは以下の手順を通して作成されました：

1. 以下のデータセットから日本語および英語のプロンプトをサンプリング：
   * lmsys/lmsys-chat-1m
   * RyokoAI/ShareGPT52K
   * openchat/openchat_sharegpt_v3
   * OpenAssistant/oasst2
   * Open-Orca/slimorca-deduped-cleaned-corrected
   * HuggingFaceH4/ultrachat_200k
2. 英語のプロンプトを[gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)を使って日本語に翻訳。
3. [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)を使って翻訳を修正。
4. 日本語のプロンプト（オリジナルと翻訳の両方）に対する応答を[gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)で取得。
5. [gpt-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/)を使用して応答を修正。

[Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)モデルを基に、QLoRA DPOトレーニングを行い、Karasu-DPO-7Bを作成しました。

### Model Details
- Model size: 7B
- Context length (training): 1024 tokens
- Language: Japanese

#### Training Procedure
- learning_rate: 5e-6
- train_batch_size: 4
- eval_batch_size: 2
- gradient_accumulation_steps: 4
- lr_scheduler_type: cosine
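
Two quantities follow from these settings. The effective batch size per optimizer step is `train_batch_size × gradient_accumulation_steps` (multiplied by the number of GPUs, which the card does not state), and `lr_scheduler_type: cosine` decays the learning rate from 5e-6 toward zero. The sketch below assumes a single device, no warmup, and 100 total optimizer steps (the last step logged in the Training Results table); it follows the same cosine formula as `get_cosine_schedule_with_warmup` in Hugging Face `transformers`, but it is an illustration, not the training code.

```python
import math

# Values from the Training Procedure list above.
LEARNING_RATE = 5e-6
TRAIN_BATCH_SIZE = 4  # per device
GRADIENT_ACCUMULATION_STEPS = 4

# Assumptions, not stated in the card:
NUM_DEVICES = 1
TOTAL_STEPS = 100  # last step logged in the Training Results table


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int) -> int:
    """Preference pairs that contribute to each optimizer update."""
    return per_device * accum_steps * num_devices


def cosine_lr(step: int, peak_lr: float, total_steps: int) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps, with no warmup."""
    progress = min(step, total_steps) / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))  # 16
for step in (0, 25, 50, 75, 100):
    print(step, f"{cosine_lr(step, LEARNING_RATE, TOTAL_STEPS):.2e}")
```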

#### Training Results
|**Step**|**Training Loss**|**Validation Loss**|
|----|----|----|
|10|0.678400|0.665870|
|20|0.608500|0.638361|
|30|0.577300|0.607468|
|40|0.526700|0.559432|
|50|0.489200|0.523419|
|60|0.502800|0.511645|
|70|0.462300|0.506989|
|80|0.419600|0.509142|
|90|0.445200|0.510396|
|100|0.424400|0.511653|
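
Reading the table programmatically makes the trend easier to see: validation loss reaches its minimum at step 70 and then rises slightly, while training loss stays below its step-70 level, a pattern that can indicate the onset of overfitting. The snippet below only copies the logged values from the table; it says nothing about which checkpoint was released.

```python
# (step, training loss, validation loss), copied from the Training Results table.
LOG = [
    (10, 0.678400, 0.665870),
    (20, 0.608500, 0.638361),
    (30, 0.577300, 0.607468),
    (40, 0.526700, 0.559432),
    (50, 0.489200, 0.523419),
    (60, 0.502800, 0.511645),
    (70, 0.462300, 0.506989),
    (80, 0.419600, 0.509142),
    (90, 0.445200, 0.510396),
    (100, 0.424400, 0.511653),
]


def best_validation_step(log):
    """Return (step, validation loss) for the row with the lowest validation loss."""
    step, _, val = min(log, key=lambda row: row[2])
    return step, val


step, val = best_validation_step(LOG)
print(f"lowest validation loss {val:.6f} at step {step}")
# lowest validation loss 0.506989 at step 70

# Later steps whose validation loss is above that minimum.
later = [s for s, _, v in LOG if s > step and v > val]
print(later)  # [80, 90, 100]
```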

# License

We share this model under an Apache 2.0 license.

# Developed by

<a href="https://www.lightblue-tech.com">
<img src="https://www.lightblue-tech.com/wp-content/uploads/2023/08/color_%E6%A8%AA%E5%9E%8B-1536x469.png" alt="Lightblue technology logo" width="400"/>
</a>

This model was trained by Jun Sashihara ([junsashihara](https://huggingface.co/junsashihara)) and supervised by Peter Devine ([ptrdvn](https://huggingface.co/ptrdvn)) for Lightblue.