Quantizations of https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3
Inference Clients/UIs
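No specific clients are listed here. Assuming these quantizations are in the common GGUF format (an assumption, not stated in this card), a minimal local-inference sketch with the llama-cpp-python bindings might look like the following; the quant filename is hypothetical:

```python
# Minimal inference sketch, assuming GGUF-format quants (not confirmed above)
# and the llama-cpp-python bindings. The model_path filename is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="Mistral7B-PairRM-SPPO-Iter3.Q4_K_M.gguf",  # hypothetical quant file
    n_ctx=4096,        # context window; the base Mistral-7B-Instruct-v0.2 supports more
    n_gpu_layers=-1,   # offload all layers to GPU if available; use 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize self-play preference optimization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```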
From the original README
This model was developed using Self-Play Preference Optimization (SPPO) at iteration 3, with mistralai/Mistral-7B-Instruct-v0.2 as the starting point. We used the prompt sets from the openbmb/UltraFeedback dataset, split into three parts for the three iterations by snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. All responses used are synthetic.
This is the model reported in the paper, with K=5 (five responses generated per iteration). We have attached the Arena-Hard evaluation results on this model page.
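For reference, here is a minimal sketch of chat inference with the original (unquantized) model via Hugging Face transformers; the sampling settings are illustrative assumptions, not values from the paper:

```python
# Sketch of chat inference with the original model via transformers
# (assumes a GPU; sampling settings are illustrative, not from the paper).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Give a one-line summary of SPPO."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```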
Links to Other Models
- Mistral7B-PairRM-SPPO-Iter1
- Mistral7B-PairRM-SPPO-Iter2
- Mistral7B-PairRM-SPPO-Iter3
- Mistral7B-PairRM-SPPO
Model Description
- Model type: A 7B-parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: Apache-2.0
- Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2