strickvl committed

Commit 18ae98a
Parent: 85c3899

End of training

Files changed (2):
  1. README.md +162 -0
  2. adapter_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,162 @@
---
license: gemma
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: google/gemma-2b
model-index:
- name: isafpr-gemma-qlora-templatefree
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
# use google/gemma-7b if you have access
base_model: google/gemma-2b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: data/templatefree_isaf_press_releases_ft_train.jsonl
    type: input_output
val_set_size: 0.1
output_dir: ./outputs/gemma/qlora-out-templatefree
hub_model_id: strickvl/isafpr-gemma-qlora-templatefree

adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_modules_to_save:
  - embed_tokens
  - lm_head

sequence_len: 1024
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: isaf_pr_ft
wandb_entity: strickvl
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 3
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"

```

</details><br>
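
The `datasets` entry above uses axolotl's template-free `input_output` format, which trains on plain text segments with per-segment loss masking instead of a chat or instruction template. As a rough sketch only (the record layout is my reading of the `input_output` format, and the field contents are placeholders rather than actual rows from this dataset), a line of the training JSONL might be built like this:

```python
# Hypothetical example of one record in axolotl's template-free "input_output"
# format (an assumption -- check the axolotl docs for the authoritative schema).
# Segments with label=False are masked out of the loss; segments with
# label=True contain the text the model is trained to produce.
import json

record = {
    "segments": [
        {"label": False, "text": "<press release text and any prompt framing>"},
        {"label": True, "text": "<structured extraction the model should learn to emit>"},
    ]
}

with open("data/templatefree_isaf_press_releases_ft_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```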

# isafpr-gemma-qlora-templatefree

This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the `data/templatefree_isaf_press_releases_ft_train.jsonl` dataset (see the axolotl config above).
It achieves the following results on the evaluation set:
- Loss: 0.0379

## Model description

More information needed

## Intended uses & limitations

More information needed
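
As a starting point, below is a minimal, untested sketch of loading this QLoRA adapter for inference. It assumes the adapter is available on the Hub as `strickvl/isafpr-gemma-qlora-templatefree` (the `hub_model_id` from the config) and loads the base model in 4-bit to match the training setup; the prompt and generation settings are placeholders.

```python
# Sketch: load google/gemma-2b in 4-bit and attach the QLoRA adapter with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "google/gemma-2b"
adapter_id = "strickvl/isafpr-gemma-qlora-templatefree"  # hub_model_id from the config

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the trained LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = "<ISAF press release text here>"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```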

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 3
- total_train_batch_size: 12
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 64
- num_epochs: 4
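
For reference, the total train batch size of 12 is the per-device micro batch size times the number of GPUs times the gradient accumulation steps: 2 × 2 × 3 = 12.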

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.3995 | 0.0054 | 1 | 2.3804 |
| 0.1051 | 0.2527 | 47 | 0.0906 |
| 0.0444 | 0.5054 | 94 | 0.0617 |
| 0.0292 | 0.7581 | 141 | 0.0490 |
| 0.1049 | 1.0108 | 188 | 0.0475 |
| 0.03 | 1.2419 | 235 | 0.0435 |
| 0.0219 | 1.4946 | 282 | 0.0411 |
| 0.0286 | 1.7473 | 329 | 0.0413 |
| 0.0403 | 2.0 | 376 | 0.0383 |
| 0.0274 | 2.2330 | 423 | 0.0386 |
| 0.0178 | 2.4857 | 470 | 0.0384 |
| 0.0272 | 2.7384 | 517 | 0.0378 |
| 0.0409 | 2.9910 | 564 | 0.0371 |
| 0.013 | 3.2240 | 611 | 0.0378 |
| 0.0177 | 3.4767 | 658 | 0.0380 |
| 0.018 | 3.7294 | 705 | 0.0379 |


### Framework versions

- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87f4b7d2d5bee43256a1cff3e53dce3dbb494fb668ba455fc3a7823382e8d74f
size 2175690242