---
license: mit
base_model: ZhangShenao/SELM-Zephyr-7B-iter-2
tags:
- alignment-handbook
- dpo
- trl
- selm
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: SELM-Zephyr-7B-iter-3
  results: []
---

Paper: [Self-Exploring Language Models: Active Preference Elicitation for Online Alignment](https://arxiv.org/abs/2405.19332).

# SELM-Zephyr-7B-iter-3

This model is a fine-tuned version of [ZhangShenao/SELM-Zephyr-7B-iter-2](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-2), trained on synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset. A brief usage sketch follows the model description below.

## Model description

- Model type: A 7B-parameter Zephyr-based Self-Exploring Language Model (SELM).
- License: MIT
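
A minimal usage sketch, not part of the original card, assuming the tokenizer ships a Zephyr-style chat template and that the `transformers` version listed under "Framework versions" below is installed:

```python
# Usage sketch (illustrative, not from the original card).
# Assumes the tokenizer provides a Zephyr-style chat template.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ZhangShenao/SELM-Zephyr-7B-iter-3",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize self-exploring language models in one sentence."},
]

# Build the prompt from the chat template, then generate.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```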

## Results

| Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|-------|------------------------|--------------------|
| [SELM-Zephyr-7B-iter-3](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-3) | 24.00 | 7.48 |
| [SELM-Zephyr-7B-iter-2](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-2) | 23.40 | 7.72 |
| [SELM-Zephyr-7B-iter-1](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-1) | 20.28 | 7.42 |
| [DPO-Zephyr-7B](https://huggingface.co/ZhangShenao/DPO-Zephyr-7B) | 14.45 | 7.28 |

Our model also ranks highly on [WildBench](https://huggingface.co/spaces/allenai/WildBench)! 🔥

### Training hyperparameters

The following hyperparameters were used during training:
- alpha: 0.001
- beta: 0.01
- train_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256 (see the check after this list)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
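
As an illustrative check (not from the original card), the total train batch size is the per-device batch size times the number of devices times the gradient accumulation steps:

```python
# Illustrative arithmetic check (assumption: train_batch_size above is per device).
per_device_train_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 4

total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
assert total_train_batch_size == 256  # matches the value reported above
```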

### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1