zR committed
Commit 67ce66f · 1 Parent(s): 44fb424

Files changed (2):
  1. README.md +41 -2
  2. README_zh.md +8 -2
README.md CHANGED
@@ -7,6 +7,8 @@ language:
 - zh
 base_model:
 - THUDM/glm-4-9b-chat-hf
+datasets:
+- THUDM/LongReward-10k
 pipeline_tag: text-generation
 library_name: transformers
 tags:
@@ -24,11 +26,48 @@ inference: false
 
 LongReward-glm4-9b-DPO is the DPO version of [LongReward-glm4-9b-SFT](https://huggingface.co/THUDM/LongReward-glm4-9b-SFT) and supports a maximum context window of up to 64K tokens. It is trained on the `dpo_glm4_9b` split of the [LongReward-10k](https://huggingface.co/datasets/THUDM/LongReward-10k) dataset, a long-context preference dataset constructed via LongReward.
 
-Environment: same environment requirements as [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) (`transformers>=4.46.0`).
-
 A simple demo for deployment of the model:
 
+1. Install the requirements (`transformers>=4.46.0` is needed):
+
+```shell
+pip install "transformers>=4.46.0"
+```
+
+2. Run the model:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+MODEL_PATH = 'THUDM/LongReward-glm4-9b-DPO'
+
+tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")
+
+message = [
+    {
+        "role": "user",
+        "content": "W. Russell Todd, 94, United States Army general (b. 1928). February 13. Tim Aymar, 59, heavy metal singer (Pharaoh) (b. 1963). Marshall \"Eddie\" Conway, 76, Black Panther Party leader (b. 1946). Roger Bonk, 78, football player (North Dakota Fighting Sioux, Winnipeg Blue Bombers) (b. 1944). Conrad Dobler, 72, football player (St. Louis Cardinals, New Orleans Saints, Buffalo Bills) (b. 1950). Brian DuBois, 55, baseball player (Detroit Tigers) (b. 1967). Robert Geddes, 99, architect, dean of the Princeton University School of Architecture (1965–1982) (b. 1923). Tom Luddy, 79, film producer (Barfly, The Secret Garden), co-founder of the Telluride Film Festival (b. 1943). David Singmaster, 84, mathematician (b. 1938). \n\n What was Robert Geddes' profession?"
+    }
+]
+
+inputs = tokenizer.apply_chat_template(
+    message,
+    return_tensors='pt',
+    add_generation_prompt=True,
+    return_dict=True,
+).to(model.device)
+
+input_len = inputs['input_ids'].shape[1]
+generate_kwargs = {
+    "input_ids": inputs['input_ids'],
+    "attention_mask": inputs['attention_mask'],
+    "max_new_tokens": 128,
+    "do_sample": False,
+}
+out = model.generate(**generate_kwargs)
+print(tokenizer.decode(out[0][input_len:], skip_special_tokens=True))
+```
 
 ## License
 
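In the demo above, `out[0]` returned by `model.generate` begins with the echoed prompt tokens, so the decode step slices from `input_len` to keep only the model's continuation. That slicing logic can be sketched with plain Python lists (hypothetical token ids, no model download needed):

```python
# Hypothetical token ids standing in for the tensors in the demo above.
prompt_ids = [101, 2054, 2001, 102]          # plays the role of inputs['input_ids'][0]
generated = prompt_ids + [2003, 2019, 103]   # plays the role of out[0]: prompt + continuation

input_len = len(prompt_ids)                  # inputs['input_ids'].shape[1] in the demo
new_tokens = generated[input_len:]           # drop the echoed prompt
print(new_tokens)  # [2003, 2019, 103]
```

The same slice works on a `torch.Tensor` row, which is why the demo passes `out[0][input_len:]` directly to `tokenizer.decode`.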
README_zh.md CHANGED
@@ -10,10 +10,16 @@ LongReward-glm4-9b-DPO is the DPO version of [LongReward-glm4-9b-SFT](https://huggingface.co/THUDM
 supports a maximum context window of up to 64K tokens. It is trained on the
 `dpo_glm4_9b` split of the [LongReward-10k](https://huggingface.co/datasets/THUDM/LongReward-10k) dataset, a long-context preference dataset constructed via LongReward.
 
-Environment requirements: same as [glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat) (`transformers>=4.46.0`).
-
 A simple demo for deploying the model:
 
+1. Install the dependencies (`transformers>=4.46.0` is required):
+
+```shell
+pip install "transformers>=4.46.0"
+```
+
+2. Run the model:
+
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
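Both READMEs pin a `transformers>=4.46.0` floor. Such a floor must be compared numerically, not lexicographically: as strings, `"4.9"` sorts after `"4.46"`, which is wrong. A minimal, hypothetical helper (not part of this repo; real projects would use `packaging.version` instead) sketches the numeric check:

```python
def meets_floor(installed: str, required: str = "4.46.0") -> bool:
    """Naive numeric comparison of dotted version strings (no pre-release handling)."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

print(meets_floor("4.46.0"))  # True: exactly at the floor
print(meets_floor("4.9.0"))   # False: 9 < 46 numerically, despite "4.9" > "4.46" as strings
```

This is also why `pip install "transformers>=4.46.0"` is safer than a bare `pip install transformers` when an environment already holds an older pinned version.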
25