ZhangShenao committed on
Commit
005b47c
1 Parent(s): 0b50e35

Update README.md

Files changed (1)
  1. README.md +33 -25
README.md CHANGED
@@ -3,65 +3,73 @@ license: mit
  base_model: microsoft/Phi-3-mini-4k-instruct
  tags:
  - alignment-handbook
- - trl
  - dpo
- - generated_from_trainer
  - trl
- - dpo
- - generated_from_trainer
  datasets:
- - updated
- - original
  model-index:
- - name: 0.001_SELM-Phi-3-mini-4k-instruct_iter_1
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # 0.001_SELM-Phi-3-mini-4k-instruct_iter_1

- This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the updated and the original datasets.

  ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 5e-07
  - train_batch_size: 4
- - eval_batch_size: 4
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 8
  - gradient_accumulation_steps: 4
  - total_train_batch_size: 128
- - total_eval_batch_size: 32
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 1

- ### Training results
-
-
-
  ### Framework versions

  - Transformers 4.40.2
- - Pytorch 2.3.0+cu121
  - Datasets 2.14.6
  - Tokenizers 0.19.1

  base_model: microsoft/Phi-3-mini-4k-instruct
  tags:
  - alignment-handbook
  - dpo
  - trl
+ - selm
  datasets:
+ - HuggingFaceH4/ultrafeedback_binarized
  model-index:
+ - name: SELM-Phi-3-mini-4k-instruct-iter-1
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.

+ # SELM-Phi-3-mini-4k-instruct-iter-1

+ This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) using synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.

  ## Model description

+ - Model type: A 3.8B-parameter, Phi-3-instruct-based Self-Exploring Language Model (SELM).
+ - License: MIT
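
As a quick orientation (an editorial sketch, not part of the original card), the snippet below shows one way to run the model for chat-style inference with the `transformers` library. The repository id is taken from the results table further down; the dtype, device placement, and any `trust_remote_code` requirement are assumptions that may need adjusting for your environment and `transformers` version.

```python
# Minimal inference sketch for the SELM checkpoint (assumptions noted in comments).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-1"  # repo id from the results table

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit your GPU
    device_map="auto",           # requires `accelerate`; use .to("cuda") otherwise
)

# Assumes the checkpoint keeps the Phi-3 chat template.
messages = [{"role": "user", "content": "Explain preference optimization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding is used here only to keep the example deterministic; sampling settings are up to the user.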

+ ## Results

+ |                                    | AlpacaEval 2.0 LC Win Rate (%) | MT-Bench (Average) |
+ |------------------------------------|--------------------------------|--------------------|
+ | [SELM-Phi-3-mini-4k-instruct-iter-3](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-3) | 27.98 | 8.32 |
+ | [SELM-Phi-3-mini-4k-instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2) | 26.79 | 8.44 |
+ | [SELM-Phi-3-mini-4k-instruct-iter-1](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-1) | 27.33 | 8.37 |
+ | [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | 23.05 | 8.12 |

  ### Training hyperparameters

  The following hyperparameters were used during training:
+ - alpha: 0.001
+ - beta: 0.01
  - train_batch_size: 4
  - seed: 42
  - distributed_type: multi-GPU
  - num_devices: 8
  - gradient_accumulation_steps: 4
  - total_train_batch_size: 128
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - num_epochs: 1
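
As a sanity check (an editorial note, not from the original card), the reported total train batch size follows from the per-device batch size, the number of devices, and the gradient accumulation steps:

```python
# Sketch: effective batch size implied by the hyperparameters listed above.
train_batch_size = 4               # per-device batch size
num_devices = 8                    # GPUs used for multi-GPU training
gradient_accumulation_steps = 4    # micro-batches accumulated per optimizer step

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)      # 128, matching total_train_batch_size above
```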

  ### Framework versions

  - Transformers 4.40.2
+ - Pytorch 2.1.2+cu121
  - Datasets 2.14.6
  - Tokenizers 0.19.1