zR committed · 67ce66f · 1 parent: 44fb424 · "readme"

Files changed:
- README.md: +41 -2
- README_zh.md: +8 -2
README.md
CHANGED

@@ -7,6 +7,8 @@ language:
 - zh
 base_model:
 - THUDM/glm-4-9b-chat-hf
+datasets:
+- THUDM/LongReward-10k
 pipeline_tag: text-generation
 library_name: transformers
 tags:
@@ -24,11 +26,48 @@ inference: false
 
 LongReward-glm4-9b-DPO is the DPO version of [LongReward-glm4-9b-SFT](https://huggingface.co/THUDM/LongReward-glm4-9b-SFT) and supports a maximum context window of up to 64K tokens. It is trained on the `dpo_glm4_9b` split of the [LongReward-10k](https://huggingface.co/datasets/THUDM/LongReward-10k) dataset, which is a long-context preference dataset constructed via LongReward.
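As a quick way to inspect the preference data this paragraph refers to, here is a minimal sketch using the Hugging Face `datasets` library (not something this commit adds; the split name `dpo_glm4_9b` is taken from the card, and the column names are printed rather than assumed):

```python
# Minimal sketch: peek at the DPO preference split referenced above.
# Assumes the data is exposed as the `dpo_glm4_9b` split of THUDM/LongReward-10k.
from datasets import load_dataset

ds = load_dataset("THUDM/LongReward-10k", split="dpo_glm4_9b")
print(ds)               # number of examples and schema
print(ds.column_names)  # the actual field names, instead of guessing them
print(ds[0])            # one long-context preference example
```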
 
-Environment: same environment requirements as [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) (`transformers>=4.46.0`).
-
 A simple demo for deploying the model:
 
+1. Install the requirements (`transformers>=4.46.0` is needed):
+
+```shell
+pip install transformers
+```
+
+2. Run the model:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+MODEL_PATH = 'THUDM/LongReward-glm4-9b-DPO'
+
+tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")
+
+message = [
+    {
+        "role": "user",
+        "content": "W. Russell Todd, 94, United States Army general (b. 1928). February 13. Tim Aymar, 59, heavy metal singer (Pharaoh) (b. 1963). Marshall \"Eddie\" Conway, 76, Black Panther Party leader (b. 1946). Roger Bonk, 78, football player (North Dakota Fighting Sioux, Winnipeg Blue Bombers) (b. 1944). Conrad Dobler, 72, football player (St. Louis Cardinals, New Orleans Saints, Buffalo Bills) (b. 1950). Brian DuBois, 55, baseball player (Detroit Tigers) (b. 1967). Robert Geddes, 99, architect, dean of the Princeton University School of Architecture (1965–1982) (b. 1923). Tom Luddy, 79, film producer (Barfly, The Secret Garden), co-founder of the Telluride Film Festival (b. 1943). David Singmaster, 84, mathematician (b. 1938). \n\n What was Robert Geddes' profession?"
+    }
+]
+
+inputs = tokenizer.apply_chat_template(
+    message,
+    return_tensors='pt',
+    add_generation_prompt=True,
+    return_dict=True,
+).to(model.device)
+
+input_len = inputs['input_ids'].shape[1]
+generate_kwargs = {
+    "input_ids": inputs['input_ids'],
+    "attention_mask": inputs['attention_mask'],
+    "max_new_tokens": 128,
+    "do_sample": False,
+}
+out = model.generate(**generate_kwargs)
+print(tokenizer.decode(out[0][input_len:], skip_special_tokens=True))
+```
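The demo loads the model with its default dtype (typically float32). For a 9B-parameter model it is common to load the weights in half precision to roughly halve memory use; a minimal variant of the loading step, assuming bfloat16-capable hardware and an installed `accelerate` (not part of this commit):

```python
# Lower-memory variant of the loading step in the demo above (a sketch, not the card's recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = 'THUDM/LongReward-glm4-9b-DPO'

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # half-precision weights; assumes bf16 support
    device_map="auto",           # device placement handled via accelerate
)
```

Tokenization and generation then proceed exactly as in the demo.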
 
 ## License
 
README_zh.md
CHANGED

@@ -10,10 +10,16 @@ LongReward-glm4-9b-DPO is [LongReward-glm4-9b-SFT](https://huggingface.co/THUDM
 a maximum context window of up to 64K tokens. It is trained on the `dpo_glm4_9b` split of
 the [LongReward-10k](https://huggingface.co/datasets/THUDM/LongReward-10k) dataset, which is a long-context preference dataset constructed via LongReward.
 
-Environment requirements: the same as for [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) (`transformers>=4.46.0`).
-
 A simple demo for deploying the model:
 
+1. Install the dependencies (`transformers>=4.46.0` is required):
+
+```shell
+pip install transformers
+```
+
+2. Run the model:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 