thewimo committed
Commit 76a2195
1 Parent(s): ef96e12

End of training

Files changed (2):
  1. README.md +34 -20
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -2,10 +2,11 @@
 license: apache-2.0
 library_name: peft
 tags:
+- axolotl
 - generated_from_trainer
 base_model: mistralai/Mistral-7B-v0.3
 model-index:
-- name: outputs/lora-out
+- name: Mistral-7B-v0.3-deide-phi
   results: []
 ---
 
@@ -26,20 +27,21 @@ load_in_4bit: false
 strict: false
 
 datasets:
-  - path: mhenrichsen/alpaca_2k_test
+  - path: thewimo/german-medical-identification-dataset-v0.1
     type: alpaca
 dataset_prepared_path: last_run_prepared
-val_set_size: 0.1
+val_set_size: 0.2
 output_dir: ./outputs/lora-out
+hub_model_id: thewimo/Mistral-7B-v0.3-deide-phi
 
 adapter: lora
 lora_model_dir:
 
-sequence_len: 8192
+sequence_len: 4096
 sample_packing: false
 pad_to_sequence_len: true
 
-lora_r: 32
+lora_r: 8
 lora_alpha: 16
 lora_dropout: 0.05
 lora_target_linear: true
@@ -56,13 +58,13 @@ lora_target_modules:
 wandb_project: axolotl-runs
 wandb_entity: thewind-mom-finetuning
 wandb_watch:
-wandb_name: Mistral-7B-v0.3-alpaca_2k_test
+wandb_name: Mistral-7B-v0.3-deide-phi
 wandb_log_model:
 
 gradient_accumulation_steps: 4
-micro_batch_size: 2
-num_epochs: 1
-optimizer: adamw_bnb_8bit
+micro_batch_size: 4
+num_epochs: 4
+optimizer: paged_adamw_8bit
 lr_scheduler: cosine
 learning_rate: 0.0002
 
@@ -99,11 +101,11 @@ special_tokens:
 
 </details><br>
 
-# outputs/lora-out
+# Mistral-7B-v0.3-deide-phi
 
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.8745
+- Loss: 0.0364
 
 ## Model description
 
@@ -123,27 +125,39 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
-- train_batch_size: 2
-- eval_batch_size: 2
+- train_batch_size: 4
+- eval_batch_size: 4
 - seed: 42
 - distributed_type: multi-GPU
 - num_devices: 3
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 24
-- total_eval_batch_size: 6
+- total_train_batch_size: 48
+- total_eval_batch_size: 12
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs: 1
+- num_epochs: 4
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 1.2007 | 0.0133 | 1 | 1.1165 |
-| 0.9054 | 0.2533 | 19 | 0.8901 |
-| 0.8991 | 0.5067 | 38 | 0.8844 |
-| 0.7794 | 0.76 | 57 | 0.8745 |
+| 1.9682 | 0.0506 | 1 | 2.0579 |
+| 1.2784 | 0.2532 | 5 | 0.8308 |
+| 0.187 | 0.5063 | 10 | 0.1732 |
+| 0.1094 | 0.7595 | 15 | 0.0819 |
+| 0.0542 | 1.0127 | 20 | 0.0593 |
+| 0.0354 | 1.2658 | 25 | 0.0521 |
+| 0.0493 | 1.5190 | 30 | 0.0457 |
+| 0.038 | 1.7722 | 35 | 0.0432 |
+| 0.0143 | 2.0253 | 40 | 0.0425 |
+| 0.0269 | 2.2785 | 45 | 0.0423 |
+| 0.0273 | 2.5316 | 50 | 0.0415 |
+| 0.0277 | 2.7848 | 55 | 0.0366 |
+| 0.0288 | 3.0380 | 60 | 0.0356 |
+| 0.0241 | 3.2911 | 65 | 0.0358 |
+| 0.0125 | 3.5443 | 70 | 0.0362 |
+| 0.0164 | 3.7975 | 75 | 0.0364 |
 
 
 ### Framework versions
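Per the updated config, this run now trains a rank-8 LoRA adapter on thewimo/german-medical-identification-dataset-v0.1 for 4 epochs and pushes it to the Hub as thewimo/Mistral-7B-v0.3-deide-phi; the reported total_train_batch_size of 48 is simply micro_batch_size (4) × gradient_accumulation_steps (4) × num_devices (3). Below is a minimal, hypothetical sketch of loading that adapter for inference with `peft` and `transformers`; the repo ids come from the config above, while the alpaca-style German prompt is only an assumption about the dataset format and is not taken from this commit.

```python
# Hypothetical usage sketch (not part of the commit): attach the published LoRA
# adapter to the base model. Assumes the adapter was pushed to the hub_model_id
# from the config and that transformers, peft, and accelerate are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.3"              # base_model in the config
adapter_id = "thewimo/Mistral-7B-v0.3-deide-phi"   # hub_model_id in the config

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# Load the LoRA weights (r=8, alpha=16 per the config) on top of the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)

# Assumed alpaca-style prompt; the actual template used in training is not shown here.
prompt = "### Instruction:\nAnonymisiere den folgenden Arztbrief.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```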
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:91deab91f9cc247d51b6c430aade09543217a6bae87899c2827c525f6683a04a
+oid sha256:e59e28c76ba9ce43e616119ed0176fa02161210dc902ac5f0cc375cba8e2d60b
 size 84047370
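Because adapter_model.bin is stored with Git LFS, the commit only swaps the pointer's sha256 oid while the byte size stays at 84047370. A hypothetical verification sketch follows, assuming this commit lives in the thewimo/Mistral-7B-v0.3-deide-phi repo referenced by hub_model_id and that huggingface_hub is installed.

```python
# Hypothetical check (not part of the commit): verify a downloaded adapter_model.bin
# against the LFS pointer above (sha256 oid and size), pinned to commit 76a2195.
import hashlib
import os

from huggingface_hub import hf_hub_download

EXPECTED_OID = "e59e28c76ba9ce43e616119ed0176fa02161210dc902ac5f0cc375cba8e2d60b"
EXPECTED_SIZE = 84047370  # bytes, from the pointer file

path = hf_hub_download(
    repo_id="thewimo/Mistral-7B-v0.3-deide-phi",  # assumed repo for this commit
    filename="adapter_model.bin",
    revision="76a2195",
)

sha256 = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        sha256.update(chunk)

assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch"
assert sha256.hexdigest() == EXPECTED_OID, "sha256 mismatch"
print("adapter_model.bin matches the LFS pointer")
```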