GanjinZero committed on
Commit
f34d64c
1 Parent(s): bbf9c1b

Create README.md

Files changed (1)
  1. README.md +65 -0
README.md ADDED
@@ -0,0 +1,65 @@
+ ---
+ datasets:
+ - tatsu-lab/alpaca
+ language:
+ - en
+ ---
+ ## Model details
+
+ **Organization developing the model**
+ Alibaba DAMO Academy, Tsinghua University
+
+ **Model date**
+ Wombat-7B-GPT4 was released on 2023/04/13.
+
+ **Model version**
+ Wombat-7B-GPT4.
+
+ **Training dataset**
+ The training data for Wombat-7B-GPT4 is released in the [GPT-4-LLM](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) repository.
+
+ **Model type**
+ Wombat-7B-GPT4 is a general-purpose instruction-following language model aligned with GPT4 (used as a proxy for human preferences) and fine-tuned from Alpaca models.
+ We use a novel method named RRHF (Rank Responses to Align Human Feedback) to fine-tune Alpaca.
+
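As a rough illustration of the ranking idea behind RRHF, the sketch below follows the objective described in the cited paper: each candidate response is scored by its length-normalized log-probability under the model, a pairwise hinge penalty is added whenever a lower-reward response is scored above a higher-reward one, and a cross-entropy term is added for the best-rewarded response. The function names and toy inputs are illustrative assumptions, not code from the RRHF repository.

```python
from typing import List


def sequence_score(token_logprobs: List[float]) -> float:
    """Length-normalized log-probability of one candidate response."""
    return sum(token_logprobs) / len(token_logprobs)


def rrhf_loss(token_logprobs: List[List[float]], rewards: List[float]) -> float:
    """Ranking loss plus fine-tuning loss for one query with several candidates.

    token_logprobs[i] holds the per-token log-probabilities the model assigns
    to candidate i; rewards[i] is the (proxy) preference score of candidate i.
    """
    scores = [sequence_score(lp) for lp in token_logprobs]

    # Penalize every pair where the lower-reward response gets the higher score.
    rank_loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if rewards[i] < rewards[j]:
                rank_loss += max(0.0, scores[i] - scores[j])

    # Cross-entropy (negative log-likelihood) on the best-rewarded response.
    best = max(range(len(rewards)), key=lambda k: rewards[k])
    sft_loss = -sum(token_logprobs[best])

    return rank_loss + sft_loss
```

When the model already scores the preferred response highest, the ranking term is zero and only the fine-tuning term remains.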
+ **How to use**
+ To recover the Wombat weights from the delta parameters:
+ ```bash
+ python apply_delta.py \
+ --base ./llama-7b \
+ --target ./wombat-7b-gpt4 \
+ --delta GanjinZero/wombat-7b-gpt4-delta
+ ```
+ where **apply_delta.py** is available in the [RRHF repository](https://github.com/GanjinZero/RRHF/blob/main/apply_delta.py).
+
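Conceptually, a delta release stores the difference between the fine-tuned weights and the base weights, so recovery is an element-wise addition of the two. The sketch below shows that idea on plain Python lists. It assumes `apply_delta.py` follows this common convention; the helper name `apply_delta_to_weights` and the toy parameter names are hypothetical, and the real script operates on full model checkpoints and may handle details not shown here.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def apply_delta_to_weights(base: Weights, delta: Weights) -> Weights:
    """Return target weights computed as base + delta, parameter by parameter."""
    if base.keys() != delta.keys():
        raise KeyError("base and delta must contain the same parameter names")
    target: Weights = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Assumed convention: target = base + delta.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


# Toy stand-ins for the base checkpoint and the published delta.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.25, -0.5], "layer.bias": [0.0]}
target = apply_delta_to_weights(base, delta)
```

Under this assumption, the base LLaMA weights are required to reconstruct the fine-tuned model; the delta alone is not usable.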
+ To run inference with Wombat, refer to [single_sentence_inference.py](https://github.com/GanjinZero/RRHF/blob/main/single_sentence_inference.py).
+
+ **Citation details**
+ Please cite our paper on arXiv:
+ ```
+ @misc{yuan2023rrhf,
+   title={RRHF: Rank Responses to Align Language Models with Human Feedback without tears},
+   author={Zheng Yuan and Hongyi Yuan and Chuanqi Tan and Wei Wang and Songfang Huang and Fei Huang},
+   year={2023},
+   eprint={2304.05302},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
+
+ **License**
+ The data are licensed under the CC BY-NC 4.0 license.
+
+ **Where to send questions or comments about the model**
+ Questions, comments, and discussions about Wombat and RRHF can be raised by opening an issue in the project's [GitHub repository](https://github.com/GanjinZero/RRHF),
+ or sent by email to yuanzheng.yuanzhen@alibaba-inc.com, yuanhy20@mails.tsinghua.edu.cn, or chuanqi.tcq@alibaba-inc.com.
+
+ **Primary intended uses**
+ Wombat-7B and Wombat-7B-GPT4 are primarily intended for research on learning from human feedback; they serve as prototypes of the RRHF method.
+
+ **Primary intended users**
+ The primary intended users of Wombat-7B and Wombat-7B-GPT4 are researchers in natural language processing, machine learning, and artificial intelligence.
+
+ **Out-of-scope use cases**
+ Wombat-7B and Wombat-7B-GPT4 are fine-tuned not with real human feedback but with proxy human feedback from OpenAI ChatGPT and GPT4, and are not intended for use in production systems.
+ Any usage must not compete with the OpenAI API.