AswanthCManoj committed
Commit 24a5852 · verified · 1 Parent(s): 51a35b1

azma-phi-2-instruct-structured

Files changed (1):
  1. README.md +27 -20

README.md CHANGED
@@ -1,11 +1,9 @@
  ---
- license: apache-2.0
+ license: mit
  library_name: peft
  tags:
- - trl
- - sft
  - generated_from_trainer
- base_model: teknium/OpenHermes-2.5-Mistral-7B
+ base_model: microsoft/phi-2
  model-index:
  - name: results
    results: []
@@ -16,9 +14,9 @@ should probably proofread and complete it, then remove this comment. -->

  # results

- This model is a fine-tuned version of [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) on the None dataset.
+ This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.4192
+ - Loss: 0.8902

  ## Model description

@@ -37,36 +35,45 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 2
- - eval_batch_size: 2
+ - learning_rate: 0.0002
+ - train_batch_size: 4
+ - eval_batch_size: 4
  - seed: 42
- - gradient_accumulation_steps: 4
+ - gradient_accumulation_steps: 2
  - total_train_batch_size: 8
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
+ - lr_scheduler_type: constant
  - lr_scheduler_warmup_ratio: 0.03
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 1
+ - lr_scheduler_warmup_steps: 150
+ - num_epochs: 0.5
  - mixed_precision_training: Native AMP

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 1.3147 | 0.17 | 25 | 1.1471 |
- | 0.6178 | 0.34 | 50 | 0.5957 |
- | 0.4326 | 0.51 | 75 | 0.4810 |
- | 0.3723 | 0.67 | 100 | 0.4367 |
- | 0.348 | 0.84 | 125 | 0.4192 |
+ | 0.7893 | 0.04 | 25 | 0.9209 |
+ | 0.7162 | 0.07 | 50 | 0.9266 |
+ | 0.9178 | 0.11 | 75 | 0.8747 |
+ | 0.7546 | 0.14 | 100 | 0.8973 |
+ | 0.8387 | 0.18 | 125 | 0.8814 |
+ | 0.7346 | 0.21 | 150 | 0.8926 |
+ | 0.8609 | 0.25 | 175 | 0.8971 |
+ | 0.7118 | 0.29 | 200 | 0.8833 |
+ | 0.8248 | 0.32 | 225 | 0.8747 |
+ | 0.6511 | 0.36 | 250 | 0.8852 |
+ | 0.9178 | 0.39 | 275 | 0.8744 |
+ | 0.6139 | 0.43 | 300 | 0.8885 |
+ | 0.8795 | 0.46 | 325 | 0.8802 |
+ | 0.5775 | 0.5 | 350 | 0.8902 |


  ### Framework versions

  - Transformers 4.36.2
  - Pytorch 2.1.0+cu121
- - Datasets 2.16.1
- - Tokenizers 0.15.0
+ - Datasets 2.14.6
+ - Tokenizers 0.15.1
  ## Training procedure

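For anyone trying to reproduce the updated run, the hyperparameter list in the new card maps onto `transformers.TrainingArguments` roughly as sketched below. This is a minimal sketch under assumptions, not the author's training script: the output directory is a guess, the card's "Adam with betas=(0.9,0.999) and epsilon=1e-08" matches the Trainer's default optimizer settings, and the 25-step evaluation interval is inferred from the results table. `total_train_batch_size: 8` is derived rather than passed directly: 4 per device × 2 accumulation steps = 8.

```python
# Minimal sketch mapping the card's hyperparameters onto TrainingArguments
# (Transformers 4.36.x). Values mirror the card; items marked "assumed" do not.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results",            # assumed; matches the model-index name
    learning_rate=2e-4,              # 0.0002
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # total_train_batch_size = 4 * 2 = 8
    lr_scheduler_type="constant",    # note: a plain constant schedule ignores warmup
    warmup_ratio=0.03,               # both warmup values appear on the card;
    warmup_steps=150,                #   warmup_steps > 0 takes precedence
    num_train_epochs=0.5,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
    evaluation_strategy="steps",     # inferred from the results table
    eval_steps=25,
)
```

As a sanity check on that configuration, step 350 lands at epoch 0.5 in the results table, so a full epoch would be roughly 700 optimizer steps, i.e. about 700 × 8 ≈ 5,600 training examples if that ratio holds.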
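Because the card's metadata declares `library_name: peft` with `base_model: microsoft/phi-2`, the repository presumably holds adapter weights rather than a full checkpoint. Below is a hedged loading sketch; the adapter repo id is hypothetical (constructed from the commit message azma-phi-2-instruct-structured) and should be replaced with this repository's actual Hub id.

```python
# Hedged sketch: attach this PEFT adapter to its new base model, microsoft/phi-2.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "microsoft/phi-2"
ADAPTER = "AswanthCManoj/azma-phi-2-instruct-structured"  # hypothetical repo id

# At Transformers 4.36.x, phi-2 still loads through its remote code path.
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)

# PeftModel wraps the frozen base model and applies the adapter weights on top.
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
```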