ZhangShenao committed
Commit 862e9b8
1 Parent(s): b9deb84

Update README.md

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 license: mit
-base_model: ZhangShenao/SELM-Zephyr-7B-iter-1
+base_model: ZhangShenao/SELM-Zephyr-7B-iter-2
 tags:
 - alignment-handbook
 - dpo
@@ -9,27 +9,38 @@ tags:
 datasets:
 - HuggingFaceH4/ultrafeedback_binarized
 model-index:
-- name: SELM-Zephyr-7B-iter-2
+- name: SELM-Zephyr-7B-iter-3
 results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# 0.001_2SELM_Zephyr_iter_3
+Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.
+
+# SELM-Zephyr-7B-iter-3
 
 This model is a fine-tuned version of [ZhangShenao/SELM-Zephyr-7B-iter-1](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-1) using synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.
 
 ## Model description
 
 - Model type: A 7B parameter Zephyr-based Self-Exploring Language Model (SELM).
-- Language(s) (NLP): Primarily English
 - License: MIT
 
+## Results
+
+|                       | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
+|-----------------------|------------------------|--------------------|
+| [SELM-Zephyr-7B-iter-3](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-3) | 24.00 | 7.48 |
+| [SELM-Zephyr-7B-iter-2](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-2) | 23.40 | 7.72 |
+| [SELM-Zephyr-7B-iter-1](https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-1) | 20.28 | 7.42 |
+| [DPO-Zephyr-7B](https://huggingface.co/ZhangShenao/DPO-Zephyr-7B) | 14.45 | 7.28 |
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
 - alpha: 0.001
+- beta: 0.01
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
@@ -37,20 +48,9 @@ The following hyperparameters were used during training:
 - num_devices: 8
 - gradient_accumulation_steps: 4
 - total_train_batch_size: 256
-- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 1
 
-## Results
-
-| AlpacaEval 2.0 (LC Win Rate) | MT-Bench (Average) |
-|-----------------------------------|-----------------------|
-SELM-Zephyr-7B-iter-3 (https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-3) 24.00 7.48 |
-SELM-Zephyr-7B-iter-2 (https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-2) 23.40 7.72 |
-SELM-Zephyr-7B-iter-1 (https://huggingface.co/ZhangShenao/SELM-Zephyr-7B-iter-1) 20.28 - |
-
 ### Framework versions
 
 - Transformers 4.40.2
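The total_train_batch_size reported in the card is consistent with the per-device settings; a quick sanity check in plain Python (not part of the card):

```python
# Effective batch size implied by the hyperparameters in the card above.
train_batch_size = 8             # per-device train batch size
num_devices = 8                  # training devices
gradient_accumulation_steps = 4  # micro-batches accumulated per optimizer step

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
assert total_train_batch_size == 256  # matches total_train_batch_size in the card
```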
 
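The card itself stops at the framework versions and does not show how to run the model. Below is a minimal usage sketch with the standard transformers API (the card pins Transformers 4.40.2); the Zephyr-style chat template is an assumption about the repo's tokenizer, not something the diff confirms:

```python
# Hypothetical usage sketch -- not part of the model card.
# Assumes the repo ships a tokenizer with a Zephyr-style chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZhangShenao/SELM-Zephyr-7B-iter-3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain self-exploration in LLM alignment."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```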