angelahzyuan committed on
Commit
7380dd4
1 Parent(s): 5e37a0b

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -8,12 +8,18 @@ pipeline_tag: text-generation
8
  ---
9
  Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
10
 
11
- # Mistral7B-PairRM-SPPO-Iter1
12
 
13
  This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 2, based on the [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) architecture as the starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
14
 
15
  **This is the model reported in the paper**, with K=5 (generate 5 responses per iteration). We attached the Arena-Hard eval results on this model page.
16
 
17
 
18
  ### Model Description
19
 
 
8
  ---
9
  Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
10
 
11
+ # Mistral7B-PairRM-SPPO-Iter2
12
 
13
  This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 2, based on the [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) architecture as the starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
14
 
15
  **This is the model reported in the paper**, with K=5 (generate 5 responses per iteration). We attached the Arena-Hard eval results on this model page.
16
 
17
+ ## Links to Other Models
18
+ - [Mistral7B-PairRM-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter1)
19
+ - [Mistral7B-PairRM-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2)
20
+ - [Mistral7B-PairRM-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3)
21
+ - [Mistral7B-PairRM-SPPO](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO)
22
+
23
 
24
  ### Model Description
25