strickvl committed
Commit 71437c0 · verified · 1 Parent(s): dec12a7

End of training

Files changed (2)
  1. README.md +175 -0
  2. adapter_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,175 @@
---
license: apache-2.0
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: isafpr-mistral-lora-templatefree
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

data_seed: 42
seed: 42

datasets:
  - path: data/templatefree_isaf_press_releases_ft_train.jsonl
    type: input_output
dataset_prepared_path:
val_set_size: 0.1
output_dir: ./outputs/mistral/lora-out-templatefree
hub_model_id: strickvl/isafpr-mistral-lora-templatefree

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: isaf_pr_ft
wandb_entity: strickvl
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"

```

</details><br>
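
For readers who think in PEFT terms rather than axolotl keys, the LoRA settings in the config above correspond roughly to the `peft.LoraConfig` sketched below. This is for orientation only; axolotl constructs the actual configuration internally, and defaults not listed here are assumptions.

```python
# Rough PEFT equivalent of the LoRA section of the axolotl config above.
# Sketch only -- axolotl builds this internally during training.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,               # lora_r
    lora_alpha=16,      # lora_alpha
    lora_dropout=0.05,  # lora_dropout
    target_modules=[    # lora_target_modules (all linear projections)
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    task_type="CAUSAL_LM",
)
```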

# isafpr-mistral-lora-templatefree

This model is a LoRA adapter fine-tuned from [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the `data/templatefree_isaf_press_releases_ft_train.jsonl` dataset described in the axolotl config above.
It achieves the following results on the evaluation set:
- Loss: 0.0294

## Model description

This repository contains a LoRA adapter (r=32, alpha=16, dropout 0.05, applied to all linear projection layers) for Mistral-7B-v0.1, trained with axolotl 0.4.1. The base model was loaded in 4-bit during training, and only the adapter weights are published here.

## Intended uses & limitations

More information needed
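
Pending a fuller description from the author, the adapter can at least be attached to the base model for inference with PEFT. The snippet below is a minimal, untested sketch assuming the standard `peft`/`transformers` APIs listed under "Framework versions"; adjust the dtype, device placement, and prompt format to match how the training examples were constructed.

```python
# Minimal inference sketch (assumption: standard PEFT + Transformers usage,
# not an official example from the model author).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "strickvl/isafpr-mistral-lora-templatefree"

# Loads mistralai/Mistral-7B-v0.1 and applies the LoRA adapter on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "..."  # should follow the template-free format used in training
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```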

## Training and evaluation data

The training data is `data/templatefree_isaf_press_releases_ft_train.jsonl` (template-free ISAF press releases in axolotl's `input_output` format); 10% of it was held out as the evaluation set (`val_set_size: 0.1`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
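
The effective training batch size of 16 follows from micro-batch size 2 × gradient accumulation 4 × 2 GPUs, and the effective eval batch size of 4 from eval batch size 2 × 2 GPUs. Per the axolotl config, the optimizer is the 8-bit AdamW implementation (`adamw_bnb_8bit`) with the betas and epsilon listed above.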

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.4053 | 0.0276 | 1 | 1.4080 |
| 0.1886 | 0.2483 | 9 | 0.1361 |
| 0.0544 | 0.4966 | 18 | 0.0546 |
| 0.0524 | 0.7448 | 27 | 0.0445 |
| 0.0392 | 0.9931 | 36 | 0.0395 |
| 0.0356 | 1.2138 | 45 | 0.0369 |
| 0.0396 | 1.4621 | 54 | 0.0350 |
| 0.0281 | 1.7103 | 63 | 0.0341 |
| 0.0334 | 1.9586 | 72 | 0.0330 |
| 0.0257 | 2.1793 | 81 | 0.0316 |
| 0.0204 | 2.4276 | 90 | 0.0313 |
| 0.0264 | 2.6759 | 99 | 0.0309 |
| 0.0239 | 2.9241 | 108 | 0.0298 |
| 0.022 | 3.1517 | 117 | 0.0298 |
| 0.0219 | 3.4 | 126 | 0.0296 |
| 0.0221 | 3.6483 | 135 | 0.0295 |
| 0.0205 | 3.8966 | 144 | 0.0294 |
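
Validation loss falls from 1.408 at the first evaluation step to below 0.04 by the end of the first epoch, then improves only gradually to 0.0294 over the remaining three epochs.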

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b5113af9785be2ce52a1d2ed97d4327e878ed58b91cdaa70ed6a2a73e2a3afce
size 335706186
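
This is the Git LFS pointer for the adapter weights: the `adapter_model.bin` stored via LFS is roughly 336 MB (335,706,186 bytes), i.e. only the LoRA adapter parameters, not the full 7B base model.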