Upload README.md with huggingface_hub
README.md (added):
---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- Mistral7B-PairRM-SPPO-Iter3
---
Quantizations of https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3
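
As a quick start, a quantized file can be fetched programmatically with `huggingface_hub` (a minimal sketch; the `repo_id` and `filename` below are hypothetical placeholders, so substitute the actual quant repo and `.gguf` file listed under this page's files):

```python
# Minimal GGUF download sketch using huggingface_hub.
# NOTE: repo_id and filename are placeholders; pick the real quant file
# (e.g. a Q4_K_M variant) from this repository's file listing.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-username/Mistral7B-PairRM-SPPO-Iter3-GGUF",  # placeholder
    filename="Mistral7B-PairRM-SPPO-Iter3.Q4_K_M.gguf",        # placeholder
)
print(model_path)  # local path to the cached GGUF file
```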

### Inference Clients/UIs
* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [ollama](https://github.com/ollama/ollama)
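
For a quick local test, any of the quants can also be loaded from Python via `llama-cpp-python`, llama.cpp's Python bindings (a minimal sketch, assuming the package is installed and the placeholder path points at a downloaded quant; the prompt uses the `[INST]` template of the Mistral-7B-Instruct base):

```python
# Minimal inference sketch with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

# Placeholder path: point this at the .gguf quant you actually downloaded.
llm = Llama(model_path="Mistral7B-PairRM-SPPO-Iter3.Q4_K_M.gguf", n_ctx=4096)

# Mistral-7B-Instruct-v0.2 expects the [INST] ... [/INST] chat template.
out = llm("[INST] Summarize what GGUF quantization does. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```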

---

# From original readme

This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, with [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) as the starting point. We used the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into three parts for the three iterations as in [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
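
For context, the update that SPPO iterates can be summarized as follows (our paraphrase of the objective in the linked paper, not text from this model card, so treat it as a sketch): at iteration t, the new policy is regressed toward a log-ratio target set by estimated win rates.

```latex
% SPPO iteration-t objective (paraphrase of the linked paper; sketch only).
% \hat{P}(y \succ \pi_t \mid x) is the estimated probability (here via PairRM
% over K = 5 sampled responses) that y beats the current policy's responses.
\pi_{t+1} = \arg\min_{\pi}\;
  \mathbb{E}_{x \sim X,\; y \sim \pi_t(\cdot \mid x)}
  \left( \log \frac{\pi(y \mid x)}{\pi_t(y \mid x)}
       - \eta \left( \hat{P}(y \succ \pi_t \mid x) - \tfrac{1}{2} \right) \right)^{2}
```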

**This is the model reported in the paper**, with K=5 (five responses generated per iteration). Arena-Hard evaluation results are attached on this model page.

## Links to Other Models
- [Mistral7B-PairRM-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter1)
- [Mistral7B-PairRM-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2)
- [Mistral7B-PairRM-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3)
- [Mistral7B-PairRM-SPPO](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO)

### Model Description

- Model type: A 7B-parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: Apache-2.0
- Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2
|