---
license: gemma
language:
- en
pipeline_tag: text-generation
tags:
- gemma
- gguf
- SPPO
- imatrix
base_model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
---

# Quant Infos

## Updated for all recent llama.cpp fixes (final logit soft capping, sliding window, tokenizer)

- Quants done with an importance matrix (imatrix) to reduce quantization loss
- GGUFs & imatrix requantized from the Hugging Face bf16 weights
  - the initial version was based on the f32 GGUF provided by Google, which had various issues
  - the requant also picks up all recent llama.cpp fixes (final logit soft capping, sliding window, tokenizer)
- Wide coverage of GGUF quant types, from `Q8_0` down to `IQ1_S`
- Experimental custom quant types
  - `_L` with `--output-tensor-type f16 --token-embedding-type f16` (same scheme as bartowski's); see the sketch after the imatrix command below
- Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [5fac350b9cc49d0446fc291b9c4ad53666c77591](https://github.com/ggerganov/llama.cpp/commit/5fac350b9cc49d0446fc291b9c4ad53666c77591) (master from 2024-07-02)
- Imatrix generated with [this](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) multi-purpose calibration dataset by [bartowski](https://huggingface.co/bartowski):
```
./imatrix -m $model_name-bf16.gguf -f calibration_datav3.txt -o $model_name.imatrix
```
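
The `_L` quants combine a regular quant type with the two extra flags above. A hypothetical `llama-quantize` invocation reproducing that recipe (file names are illustrative, not the exact commands used here):
```
# keep the output & token-embedding tensors at f16 while quantizing the rest
# to Q4_K_M, guided by the imatrix generated above (hypothetical file names)
./llama-quantize --imatrix $model_name.imatrix \
  --output-tensor-type f16 --token-embedding-type f16 \
  $model_name-bf16.gguf $model_name-Q4_K_L.gguf Q4_K_M
```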
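To try one of these quants, a minimal llama.cpp invocation (the file name is an assumption; substitute whichever quant you downloaded):
```
# start an interactive chat using the chat template embedded in the GGUF
./llama-cli -m gemma-2-9b-it-sppo-iter3-Q4_K_M.gguf -cnv
```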

---
# Original Model Card:

[Self-Play Preference Optimization for Language Model Alignment](https://arxiv.org/abs/2405.00675)

# Gemma-2-9B-It-SPPO-Iter3

This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) (SPPO) at iteration 3, with [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) as the starting point. We utilized prompts from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into three parts for the three iterations following [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
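
Schematically, each SPPO iteration regresses the policy's log-ratio against pairwise win probabilities estimated by a preference model; paraphrasing the objective from the linked paper (where \\(\eta\\) corresponds to the `eta` hyperparameter listed under training details):

$$
\theta_{t+1} = \arg\min_{\theta}\ \mathbb{E}_{x \sim X,\ y \sim \pi_{\theta_t}(\cdot \mid x)}
\left[ \left( \log \frac{\pi_{\theta}(y \mid x)}{\pi_{\theta_t}(y \mid x)}
- \eta \left( \mathbb{P}(y \succ \pi_{\theta_t} \mid x) - \tfrac{1}{2} \right) \right)^{2} \right]
$$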

**Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent/verify/huggingface?returnModelRepoId=google/gemma-2-9b-it)

## Links to Other Models
- [Gemma-2-9B-It-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1)
- [Gemma-2-9B-It-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2)
- [Gemma-2-9B-It-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3)

### Model Description

- Model type: A 9B-parameter GPT-like model fine-tuned on synthetic datasets.
- Language(s) (NLP): Primarily English
- License: Gemma (see Terms of Use above)
- Finetuned from model: [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)

## [AlpacaEval Leaderboard Evaluation Results](https://tatsu-lab.github.io/alpaca_eval/)

| Model | LC Win Rate | Win Rate | Avg. Length |
|-------------------------------------------|:------------:|:--------:|:-----------:|
| [Gemma-2-9B-SPPO Iter1](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter1) | 48.70 | 40.76 | 1669 |
| [Gemma-2-9B-SPPO Iter2](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter2) | 50.93 | 44.64 | 1759 |
| [Gemma-2-9B-SPPO Iter3](https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3) | **53.27** | **47.74** | 1803 |

LC Win Rate is the length-controlled win rate.

### Training hyperparameters
The following hyperparameters were used during training (a hypothetical launch command mirroring them is sketched after the list):

- learning_rate: 5e-07
- eta: 1000
- per_device_train_batch_size: 8
- gradient_accumulation_steps: 1
- seed: 42
- distributed_type: deepspeed_zero3
- num_devices: 8
- optimizer: RMSProp
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_train_epochs: 1.0
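
As a rough illustration only, these settings could map onto a DeepSpeed launch like the following; `train_sppo.py`, its flags, and `ds_zero3.json` are hypothetical stand-ins, not the authors' actual script:
```
# hypothetical launch mirroring the listed hyperparameters (ZeRO-3 over 8 GPUs)
deepspeed --num_gpus 8 train_sppo.py \
  --learning_rate 5e-7 --eta 1000 \
  --per_device_train_batch_size 8 --gradient_accumulation_steps 1 \
  --seed 42 --optimizer rmsprop \
  --lr_scheduler_type linear --lr_scheduler_warmup_ratio 0.1 \
  --num_train_epochs 1.0 --deepspeed_config ds_zero3.json
```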
## Citation
```
@misc{wu2024self,
      title={Self-Play Preference Optimization for Language Model Alignment},
      author={Wu, Yue and Sun, Zhiqing and Yuan, Huizhuo and Ji, Kaixuan and Yang, Yiming and Gu, Quanquan},
      year={2024},
      eprint={2405.00675},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```