sfairXC/FsfairX-Zephyr-Chat-v0.1

Text Generation

text-generation-inference

Inference Endpoints

hendrydong committed on Apr 20, 2024

Commit 612dc8f (verified) · 1 parent: 3053553

Update README.md

Files changed (1)

README.md +4 -2

README.md CHANGED

@@ -1,12 +1,14 @@
-This model is the RLHF version of `HuggingFaceH4/mistral-7b-sft-beta` without any external responses.
 **We obtain 35.95% win-rate on Alpaca Eval v2.**
 ## Model Details
-We perform 3 iterations of GSHF algorithm on `HuggingFaceH4/mistral-7b-sft-beta`, where prompts are generated by ChatGPT with self-instruct type prompt augmentation.
 We use AI-generated 60K prompts in the training process.

+This model is the RLHF version of `HuggingFaceH4/mistral-7b-sft-beta` without any external responses.
+The external signal includes (1) Reward model; (2) AI-generated Prompts.
 **We obtain 35.95% win-rate on Alpaca Eval v2.**
 ## Model Details
+We perform 3 iterations of GSHF algorithm on `HuggingFaceH4/mistral-7b-sft-beta` labeled by reward model, where prompts are generated by ChatGPT with self-instruct type prompt augmentation.
 We use AI-generated 60K prompts in the training process.
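The updated Model Details line says the model is trained with 3 iterations of GSHF on its own responses labeled by a reward model, with no external responses. The Python below is a minimal sketch of that kind of loop under stated assumptions, not the authors' implementation. `label_pairs`, `iterative_rlhf`, and the `update` callback are hypothetical names. Picking the highest- and lowest-reward samples as the chosen/rejected pair is one common choice assumed here; the commit does not specify the pair-selection rule, the number of samples per prompt, or the update algorithm.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the model card): a policy samples one
# response for a prompt, and a reward model scores a (prompt, response) pair.
Policy = Callable[[str], str]
RewardModel = Callable[[str, str], float]
Pair = Tuple[str, str, str]  # (prompt, chosen, rejected)
# Hypothetical update step, e.g. one round of DPO on the labeled pairs (assumed).
Update = Callable[[Policy, List[Pair]], Policy]


def label_pairs(
    prompts: List[str],
    policy: Policy,
    reward_model: RewardModel,
    n_samples: int = 4,
) -> List[Pair]:
    """Sample responses from the current policy only (no external responses)
    and keep the highest- and lowest-reward ones as (prompt, chosen, rejected)."""
    pairs: List[Pair] = []
    for prompt in prompts:
        responses = [policy(prompt) for _ in range(n_samples)]
        # Score each response once, then order by reward (stable on ties).
        scored = sorted(
            ((reward_model(prompt, r), r) for r in responses),
            key=lambda item: item[0],
        )
        (low, rejected), (high, chosen) = scored[0], scored[-1]
        # Skip prompts where the reward model sees no difference.
        if high > low:
            pairs.append((prompt, chosen, rejected))
    return pairs


def iterative_rlhf(
    prompts: List[str],
    policy: Policy,
    reward_model: RewardModel,
    update: Update,
    n_iterations: int = 3,
    n_samples: int = 4,
) -> Policy:
    """Repeat: label fresh on-policy samples with the reward model, then update."""
    for _ in range(n_iterations):
        pairs = label_pairs(prompts, policy, reward_model, n_samples)
        policy = update(policy, pairs)
    return policy
```

With toy stand-ins for the policy and reward model, `iterative_rlhf(prompts, policy, reward_model, update)` runs three rounds by default, matching the iteration count stated in the README. In a real setup `update` would be a full preference-optimization step on the language model, and each round relabels samples drawn from the freshly updated policy.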