hendrydong
committed on
Update README.md
README.md
CHANGED
@@ -1,12 +1,14 @@
 
-This model is the RLHF version of `HuggingFaceH4/mistral-7b-sft-beta` without any external responses.
+This model is the RLHF version of `HuggingFaceH4/mistral-7b-sft-beta` without any external responses.
+
+The external signals are (1) a reward model and (2) AI-generated prompts.
 
 
 **We obtain 35.95% win-rate on Alpaca Eval v2.**
 
 ## Model Details
 
-We perform 3 iterations of GSHF algorithm on `HuggingFaceH4/mistral-7b-sft-beta
+We perform 3 iterations of the GSHF algorithm on `HuggingFaceH4/mistral-7b-sft-beta`, with responses labeled by a reward model and prompts generated by ChatGPT using self-instruct-style prompt augmentation.
 
 We use AI-generated 60K prompts in the training process.
 
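The README describes three GSHF iterations in which a reward model, rather than external responses, supplies the training signal. The following is a minimal, hypothetical illustration of such a loop, not the authors' implementation. It assumes that each iteration samples several responses per prompt from the current policy, scores them with the reward model, keeps the highest- and lowest-scoring responses as a (chosen, rejected) pair, and then updates the policy on those pairs (for example with a DPO-style objective). The best-vs-worst selection rule, `n_samples`, and the names `build_pairs`, `run_iterations`, `generate_with`, `reward`, and `update` are all illustrative assumptions. The README also does not say how the 60K prompts are split across iterations, so the sketch simply reuses one prompt list.

```python
# Hypothetical sketch of an iterative, reward-model-labeled preference loop.
# Assumption: best-vs-worst pair selection and a generic `update` step stand in
# for whatever the actual GSHF training code does.
from dataclasses import dataclass
from typing import Callable, List, TypeVar

Policy = TypeVar("Policy")


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_pairs(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    reward: Callable[[str, str], float],
    n_samples: int = 4,
) -> List[PreferencePair]:
    """Turn on-policy samples into preference pairs using only the reward model."""
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        # Score every sampled response once, then sort by reward (ascending).
        scored = sorted(
            ((reward(prompt, r), r) for r in generate(prompt, n_samples)),
            key=lambda item: item[0],
        )
        if not scored:
            continue
        low_score, rejected = scored[0]
        high_score, chosen = scored[-1]
        # Skip prompts where the reward model cannot separate the samples.
        if high_score > low_score:
            pairs.append(PreferencePair(prompt, chosen, rejected))
    return pairs


def run_iterations(
    policy: Policy,
    prompts: List[str],
    generate_with: Callable[[Policy, str, int], List[str]],
    reward: Callable[[str, str], float],
    update: Callable[[Policy, List[PreferencePair]], Policy],
    n_iters: int = 3,
) -> Policy:
    """Alternate between labeling fresh samples and updating the policy."""
    for _ in range(n_iters):
        # Sample from the *current* policy so each iteration's data is on-policy.
        pairs = build_pairs(prompts, lambda p, n: generate_with(policy, p, n), reward)
        policy = update(policy, pairs)
    return policy


if __name__ == "__main__":
    # Toy stand-ins: the "policy" is an int, and longer responses score higher.
    final = run_iterations(
        policy=0,
        prompts=["hi"],
        generate_with=lambda pol, p, n: [p + "!" * (pol + i) for i in range(n)],
        reward=lambda p, r: float(len(r)),
        update=lambda pol, pairs: pol + len(pairs),
    )
    print(final)  # 3
```

The toy callables only exercise the control flow. In a real run, `generate_with` would sample from the language model, `reward` would query the reward model, and `update` would perform the preference-optimization step.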