--- license: other language: - en pipeline_tag: text-generation inference: false tags: - transformers - gguf - imatrix - Mistral7B-PairRM-SPPO-Iter3 --- Quantizations of https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3 ### Inference Clients/UIs * [llama.cpp](https://github.com/ggerganov/llama.cpp) * [KoboldCPP](https://github.com/LostRuins/koboldcpp) * [text-generation-webui](https://github.com/oobabooga/text-generation-webui) * [ollama](https://github.com/ollama/ollama) --- # From original readme This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, based on the [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) architecture as starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, splited to 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic. **This is the model reported in the paper** , with K=5 (generate 5 responses per iteration). We attached the Arena-Hard eval results in this model page. ## Links to Other Models - [Mistral7B-PairRM-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter1) - [Mistral7B-PairRM-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2) - [Mistral7B-PairRM-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3) - [Mistral7B-PairRM-SPPO](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO) ### Model Description - Model type: A 7B parameter GPT-like model fine-tuned on synthetic datasets. - Language(s) (NLP): Primarily English - License: Apache-2.0 - Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2