Stanislas committed
Commit 8ec7498
1 Parent(s): 48cdf1c

Update prompt format

Files changed (2):
  1. README.md +27 -1
  2. README_zh.md +23 -1
README.md CHANGED
@@ -39,11 +39,37 @@ model = AutoModelForCausalLM.from_pretrained(
 ).to(device).eval()
 inputs = tokenizer.apply_chat_template([{"role": "user", "content": "write a quick sort"}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
 with torch.no_grad():
-    outputs = model.generate(**inputs)
+    outputs = model.generate(**inputs, max_length=256)
     outputs = outputs[:, inputs['input_ids'].shape[1]:]
     print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
+If you want to build the **chat** prompt manually, make sure it follows this format:
+```
+f"<|system|>\n{system_prompt}\n<|user|>\n{prompt}\n<|assistant|>\n"
+```
+Default `system_prompt`:
+```
+你是一位智能编程助手,你叫CodeGeeX。你会为用户回答关于编程、代码、计算机方面的任何问题,并提供格式规范、可以执行、准确安全的代码,并在必要时提供详细的解释。
+```
+The English version:
+```
+You are an intelligent programming assistant named CodeGeeX. You will answer any questions users have about programming, coding, and computers, and provide properly formatted, executable, accurate, and safe code, with detailed explanations when necessary.
+```
+
+For **infilling**, use the following format (without a system prompt):
+```
+f"<|user|>\n<|code_suffix|>{suffix}<|code_prefix|>{prefix}<|code_middle|><|assistant|>\n"
+```
+Additional information (such as file path, programming language, or mode) can be added. Example:
+```
+<|user|>
+###PATH:src/example.py
+###LANGUAGE:Python
+###MODE:BLOCK
+<|code_suffix|>{suffix}<|code_prefix|>{prefix}<|code_middle|><|assistant|>
+```
+
 ## Evaluation
 
 | **Model** | **Seq Length** | **HumanEval** | **MBPP** | **NCB** | **LCB** | **HumanEvalFIM** | **CRUXEval-O** |
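The manual chat format added above can be sketched as a small helper. This is a hypothetical `build_chat_prompt` for illustration, not part of the CodeGeeX codebase; the default system prompt is copied verbatim from the diff:

```python
# Default system prompt, copied from the README diff above.
DEFAULT_SYSTEM_PROMPT = (
    "你是一位智能编程助手,你叫CodeGeeX。你会为用户回答关于编程、代码、计算机方面的任何问题,"
    "并提供格式规范、可以执行、准确安全的代码,并在必要时提供详细的解释。"
)

def build_chat_prompt(prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Assemble a single-turn chat prompt in the documented format.

    Hypothetical helper: the README documents only the target string layout;
    this function merely interpolates into that layout.
    """
    return f"<|system|>\n{system_prompt}\n<|user|>\n{prompt}\n<|assistant|>\n"

text = build_chat_prompt("write a quick sort")
```

When calling `model.generate` on a manually built prompt, tokenize this string directly instead of using `tokenizer.apply_chat_template`; the two paths should produce the same token sequence if the template matches.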
README_zh.md CHANGED
@@ -22,11 +22,33 @@ model = AutoModelForCausalLM.from_pretrained(
 ).to(device).eval()
 inputs = tokenizer.apply_chat_template([{"role": "user", "content": "write a quick sort"}], add_generation_prompt=True, tokenize=True, return_tensors="pt", return_dict=True).to(device)
 with torch.no_grad():
-    outputs = model.generate(**inputs)
+    outputs = model.generate(**inputs, max_length=256)
     outputs = outputs[:, inputs['input_ids'].shape[1]:]
     print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ```
 
+If you want to assemble the **chat** prompt manually, use the following format:
+```
+f"<|system|>\n{system_prompt}\n<|user|>\n{prompt}\n<|assistant|>\n"
+```
+Default system prompt:
+```
+你是一位智能编程助手,你叫CodeGeeX。你会为用户回答关于编程、代码、计算机方面的任何问题,并提供格式规范、可以执行、准确安全的代码,并在必要时提供详细的解释。
+```
+
+For **infilling**, use the following format (no system prompt needed):
+```
+f"<|user|>\n<|code_suffix|>{suffix}<|code_prefix|>{prefix}<|code_middle|><|assistant|>\n"
+```
+Additional information (such as file name, programming language, or mode) can be added. Example:
+```
+<|user|>
+###PATH:src/example.py
+###LANGUAGE:Python
+###MODE:BLOCK
+<|code_suffix|>{suffix}<|code_prefix|>{prefix}<|code_middle|><|assistant|>
+```
+
 ## Evaluation
 
 | **Model** | **Seq Length** | **HumanEval** | **MBPP** | **NCB** | **LCB** | **HumanEvalFIM** | **CRUXEval-O** |