---
license: apache-2.0
---

# Reasons to Reject? Aligning Language Models with Judgments
This repository contains the CUT model from our work:

[Reasons to Reject? Aligning Language Models with Judgments](https://arxiv.org/abs/2312.14591)

Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi

The source code can be found at https://github.com/wwxu21/CUT
****

## 1. Model description

The model is tuned over 4 iterations of online alignment. In each iteration, we apply the following three steps:

- Step 1: Collect instructions and obtain responses from the target model.
- Step 2: Annotate judgments for the responses.
- Step 3: Apply CUT to fine-tune the target model on the resulting instruction-response-judgment triplets.

We use [LLaMA2-chat-13b](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as the base LLM. In each iteration, we sample 1000 instructions from [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca). To avoid over-fitting, we ensure that the sampled instructions differ across iterations. We then ask GPT-4 to annotate judgments for the responses.
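
The iteration described above can be sketched as a simple loop. This is only an illustrative outline: `generate_responses`, `annotate_judgments`, and `cut_finetune` are hypothetical placeholders, not functions from the CUT repository (in practice Step 1 queries the target model, Step 2 queries GPT-4, and Step 3 runs CUT fine-tuning).

```python
# Illustrative sketch of the 4-iteration online alignment loop.
# The three helpers below are placeholders standing in for the real steps.

def generate_responses(model, instructions):
    """Step 1 (placeholder): the target model answers each instruction."""
    return [f"{model} answers: {ins}" for ins in instructions]

def annotate_judgments(instructions, responses):
    """Step 2 (placeholder): GPT-4 writes a judgment per response."""
    return ["judgment" for _ in responses]

def cut_finetune(model, triplets):
    """Step 3 (placeholder): CUT fine-tunes the model on the triplets."""
    return model + "+1iter"

def run_online_alignment(model, pool, iterations=4, n=1000):
    used = set()
    for _ in range(iterations):
        # Sample instructions unseen in earlier iterations to avoid over-fitting.
        fresh = [x for x in pool if x not in used][:n]
        used.update(fresh)
        responses = generate_responses(model, fresh)
        judgments = annotate_judgments(fresh, responses)
        triplets = list(zip(fresh, responses, judgments))
        model = cut_finetune(model, triplets)
    return model
```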

## 2. Intended uses & limitations
The CUT model is a chat model and uses the following [Alpaca template](https://github.com/tatsu-lab/stanford_alpaca):
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
```
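
To build a prompt, substitute the user's request for the `{instruction}` slot in the template above. A small helper (`build_prompt` is defined here for illustration, not part of the CUT repository) makes this explicit:

```python
# Fill the Alpaca template above for a single-turn prompt.
TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)

def build_prompt(instruction: str) -> str:
    """Return the full prompt for one instruction."""
    return TEMPLATE.format(instruction=instruction)

print(build_prompt("How did US states get their names?"))
```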

## 3. How to use

#### 1. Hugging Face

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Run everything on the GPU by default (requires PyTorch >= 2.0).
torch.set_default_device("cuda")

# Load the CUT model and tokenizer in half precision.
model = AutoModelForCausalLM.from_pretrained("xww033/cut-13b", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("xww033/cut-13b")

# Wrap the question in the Alpaca template shown above.
inputs = tokenizer('''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
How did US states get their names?

### Response:''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```

#### 2. FastChat

[FastChat](https://github.com/lm-sys/FastChat) provides a simple setup for those interested in trying our aligned model. After downloading the [CUT model](https://huggingface.co/xww033/cut-13b) from Hugging Face, clone the FastChat repository:

```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
```

Install the required packages:

```bash
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

Finally, start the CLI with the Alpaca conversation template:

```bash
python -m fastchat.serve.cli --model-path xww033/cut-13b --conv-template alpaca
```


## 4. BibTeX entry and citation info
```bibtex
@article{xu2023reasons,
  title={Reasons to Reject? Aligning Language Models with Judgments},
  author={Xu, Weiwen and Cai, Deng and Zhang, Zhisong and Lam, Wai and Shi, Shuming},
  journal={arXiv preprint arXiv:2312.14591},
  year={2023}
}
```