---
license: other
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- transformers
- gguf
- imatrix
- Mistral7B-PairRM-SPPO-Iter3
---
Quantizations of https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3


### Inference Clients/UIs
* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* [KoboldCPP](https://github.com/LostRuins/koboldcpp)
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [ollama](https://github.com/ollama/ollama)
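
Once one of the GGUF files from this repo has been downloaded, it can also be loaded directly from Python via the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) bindings. A minimal sketch, assuming a Q4_K_M file (the filename is an assumption; substitute whichever quantization you actually downloaded):

```python
from llama_cpp import Llama

# Path is an assumption: use whichever quantization level you downloaded.
llm = Llama(
    model_path="Mistral7B-PairRM-SPPO-Iter3-Q4_K_M.gguf",
    n_ctx=4096,  # context window size
)

# The base model follows the Mistral-7B-Instruct [INST] ... [/INST] format.
out = llm(
    "[INST] Summarize self-play preference optimization in two sentences. [/INST]",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```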


---

# From original readme

This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, starting from [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). We used the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for the 3 iterations following [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
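
As background, the SPPO update in the linked paper fits each new policy so that its log-ratio against the previous iterate tracks an estimated win rate. A paraphrased form of the per-iteration objective (see the paper for the exact statement and notation) is:

```latex
\pi_{t+1} = \arg\min_{\theta}\,
\mathbb{E}_{x \sim \mathcal{X},\; y \sim \pi_t(\cdot \mid x)}
\left[
  \left(
    \log \frac{\pi_\theta(y \mid x)}{\pi_t(y \mid x)}
    - \eta \left( \widehat{P}(y \succ \pi_t \mid x) - \frac{1}{2} \right)
  \right)^{2}
\right]
```

where the win rate of a response against other samples from the current policy is estimated here with the PairRM preference model.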

**This is the model reported in the paper**, with K = 5 (5 responses generated per iteration). The Arena-Hard evaluation results are attached to this model page.
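
A schematic of the per-prompt data collection this implies, with `sample_response` and `pairrm_win_prob` as hypothetical stand-ins for the actual generation and PairRM scoring calls (an illustration of the loop described above, not the authors' training code):

```python
K = 5  # responses sampled per prompt, as reported for this model


def sppo_iteration(prompts, sample_response, pairrm_win_prob):
    """Collect (prompt, response, estimated win rate) triples for one
    SPPO iteration; the squared-error objective sketched earlier is
    then fit on these estimates."""
    records = []
    for x in prompts:
        # 1. Sample K candidate responses from the current policy.
        candidates = [sample_response(x) for _ in range(K)]
        # 2. Estimate each candidate's win rate against the other
        #    samples using the preference model.
        for y in candidates:
            wins = [pairrm_win_prob(x, y, other)
                    for other in candidates if other is not y]
            p_hat = sum(wins) / len(wins)
            records.append((x, y, p_hat))
    return records
```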

## Links to Other Models
- [Mistral7B-PairRM-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter1)
- [Mistral7B-PairRM-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2)
- [Mistral7B-PairRM-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3)
- [Mistral7B-PairRM-SPPO](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO)


### Model Description

- Model type: A 7B parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: Apache-2.0
- Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2