---
license: cc-by-nc-4.0
---
## License

This model is released under a non-commercial license (CC BY-NC 4.0).

## Chat Vector

```
Tora-7B-v0.1 = NTQAI/chatntq-ja-7b-v1.0 + (openchat/openchat-3.5-0106 - mistralai/Mistral-7B-v0.1)
```
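
In other words, the weight difference between the instruction-tuned openchat-3.5-0106 and its base model Mistral-7B-v0.1 (the chat vector) is added to the weights of the Japanese continued-pretraining model NTQAI/chatntq-ja-7b-v1.0; the embedding, lm_head, and layernorm weights are left untouched, as in the implementation below.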

## Implementation

The model was built with the following code, based on @jovyan's implementation.

```python
import torch
from transformers import AutoModelForCausalLM


def build_chat_vector_model(
    base_model_name,
    inst_model_name,
    target_model_name,
    skip_layers,
):
    # English base model
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        torch_dtype=torch.bfloat16,
        device_map="cpu",
    )
    # Instruction-tuned model built on the same base
    inst_model = AutoModelForCausalLM.from_pretrained(
        inst_model_name,
        torch_dtype=torch.bfloat16,
        device_map="cpu",
    )
    # Japanese continued-pretraining model that receives the chat vector
    target_model = AutoModelForCausalLM.from_pretrained(
        target_model_name,
        torch_dtype=torch.bfloat16,
        device_map="cuda",
    )

    # Inspect tensor names and shapes of the English base model
    for k, v in base_model.state_dict().items():
        print(k, v.shape)

    # Inspect tensor names and shapes of the Japanese model
    for k, v in target_model.state_dict().items():
        print(k, v.shape)

    for k, v in target_model.state_dict().items():
        # Skip vocabulary-dependent layers (embed_tokens, lm_head) and layernorms
        if (k in skip_layers) or ("layernorm" in k):
            continue
        # Chat vector = instruction-tuned weights minus base weights
        chat_vector = inst_model.state_dict()[k] - base_model.state_dict()[k]
        new_v = v + chat_vector.to(v.device)
        v.copy_(new_v)

    target_model.save_pretrained("./chat_model")


if __name__ == '__main__':

    base_model_name = "mistralai/Mistral-7B-v0.1"
    inst_model_name = "openchat/openchat-3.5-0106"
    target_model_name = "NTQAI/chatntq-ja-7b-v1.0"

    skip_layers = ["model.embed_tokens.weight", "lm_head.weight"]

    build_chat_vector_model(
        base_model_name=base_model_name,
        inst_model_name=inst_model_name,
        target_model_name=target_model_name,
        skip_layers=skip_layers,
    )

```
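
The script above saves only the model weights, so loading the merged checkpoint needs a tokenizer from elsewhere. Below is a minimal usage sketch, assuming the tokenizer of NTQAI/chatntq-ja-7b-v1.0 is reused and that it defines a chat template (both are assumptions, not part of the original card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: reuse the Japanese target model's tokenizer,
# since build_chat_vector_model() saves only the model weights.
tokenizer = AutoTokenizer.from_pretrained("NTQAI/chatntq-ja-7b-v1.0")
model = AutoModelForCausalLM.from_pretrained(
    "./chat_model",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
# apply_chat_template uses whatever chat template the tokenizer defines
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```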

## Benchmark (Japanese MT bench)
- Single-turn evaluation only

|model|category|score|ver|
|:---|:---|:---|:---|
|Tora-7B-v0.1|Writing|5.4|single-turn|
|Tora-7B-v0.1|Roleplay|6.6|single-turn|
|Tora-7B-v0.1|Reasoning|7.3|single-turn|
|Tora-7B-v0.1|Math|3.5|single-turn|
|Tora-7B-v0.1|Coding|4.7|single-turn|
|Tora-7B-v0.1|Extraction|6.3|single-turn|
|Tora-7B-v0.1|STEM|7.2|single-turn|
|Tora-7B-v0.1|Humanities|8.5|single-turn|

![image/png](https://cdn-uploads.huggingface.co/production/uploads/651e3f30ca333f3c8df692b8/tuFTNH1t65lqgpnS3TuiA.png)

## Benchmark (Nejumi leaderboard)

- Results transcribed from runs.summary["mtbench_leaderboard_table"]

|model|category|score|
|:---|:---|:---|
|Tora-7B-v0.1|Writing|7.55|
|Tora-7B-v0.1|Roleplay|7.5|
|Tora-7B-v0.1|Reasoning|4.35|
|Tora-7B-v0.1|Math|2.95|
|Tora-7B-v0.1|Coding|3.7|
|Tora-7B-v0.1|Extraction|7.0|
|Tora-7B-v0.1|STEM|7.85|
|Tora-7B-v0.1|Humanities|9.65|
|Tora-7B-v0.1|AVG_mtbench|6.319|

- Results transcribed from runs.summary["jaster_radar_table"] (see the sketch after this table)

|model|category|score|
|:---|:---|:---|
|Tora-7B-v0.1|NLI|0.588|
|Tora-7B-v0.1|QA|0.1708|
|Tora-7B-v0.1|RC|0.798|
|Tora-7B-v0.1|MC|0.25|
|Tora-7B-v0.1|EL|0.0|
|Tora-7B-v0.1|FA|0.1359|
|Tora-7B-v0.1|MR|0.2|
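
For reference, a hedged sketch of how such summary values can be read back with the W&B public API; the run path below is a placeholder, not the actual Nejumi leaderboard run:

```python
import wandb

# Placeholder run path (entity/project/run_id); substitute the real run.
api = wandb.Api()
run = api.run("your-entity/your-project/your-run-id")

# Summary entries such as "mtbench_leaderboard_table" and
# "jaster_radar_table" live in run.summary.
print(run.summary.get("mtbench_leaderboard_table"))
print(run.summary.get("jaster_radar_table"))
```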


## Acknowledgements

I am deeply grateful to @jovyan, who wrote the Chat Vector article.

## References

[Chat Vectorを使って日本語LLMをチャットモデルに改造する (Turning a Japanese LLM into a chat model with Chat Vector)](https://qiita.com/jovyan/items/ee6affa5ee5bdaada6b4)