hamel committed
Commit 3d9bdac
1 Parent(s): 4fb9d48

End of training

Files changed (2)
  1. README.md +177 -1
  2. adapter_model.bin +3 -0
README.md CHANGED
@@ -1,3 +1,179 @@
  ---
- license: mit
+ library_name: peft
+ tags:
+ - axolotl
+ - generated_from_trainer
+ base_model: NousResearch/Llama-2-7b-hf
+ model-index:
+ - name: tokenfight
+   results: []
  ---
+ 
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+ 
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+ 
+ axolotl version: `0.3.0`
+ ```yaml
+ base_model: NousResearch/Llama-2-7b-hf
+ model_type: LlamaForCausalLM
+ tokenizer_type: LlamaTokenizer
+ is_llama_derived_model: true
+ 
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+ 
+ datasets:
+   - path: mhenrichsen/alpaca_2k_test
+     type: alpaca
+ dataset_prepared_path:
+ val_set_size: 0.05
+ output_dir: ./qlora-out
+ 
+ adapter: qlora
+ lora_model_dir:
+ 
+ sequence_len: 4096
+ sample_packing: false
+ pad_to_sequence_len: true
+ 
+ lora_r: 32
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+ 
+ wandb_project:
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+ 
+ gradient_accumulation_steps: 4
+ micro_batch_size: 2
+ num_epochs: 4
+ optimizer: paged_adamw_32bit
+ lr_scheduler: cosine
+ learning_rate: 0.0002
+ 
+ train_on_inputs: false
+ group_by_length: false
+ bf16: true
+ fp16: false
+ tf32: false
+ 
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+ 
+ warmup_steps: 10
+ evals_per_epoch: 4
+ eval_table_size:
+ saves_per_epoch: 1
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   bos_token: "<s>"
+   eos_token: "</s>"
+   unk_token: "<unk>"
+ 
+ hub_model_id: "hamel/tokenfight"
+ ```
+ 
+ </details><br>
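+ 
+ As a rough illustration of the adapter settings above (`lora_r`, `lora_alpha`, `lora_dropout`, `lora_target_linear`), here is what an equivalent `peft` `LoraConfig` might look like. This is a sketch, not the object axolotl builds internally; in particular, the explicit `target_modules` list is an assumption standing in for `lora_target_linear: true`.
+ 
+ ```python
+ # Illustrative only: a LoraConfig roughly matching the lora_* keys above.
+ from peft import LoraConfig
+ 
+ lora_config = LoraConfig(
+     r=32,               # lora_r
+     lora_alpha=16,      # lora_alpha
+     lora_dropout=0.05,  # lora_dropout
+     bias="none",
+     task_type="CAUSAL_LM",
+     # Assumption: lora_target_linear: true targets the Llama linear projections;
+     # spelled out explicitly here for illustration.
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+ )
+ ```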
+ 
+ # tokenfight
+ 
+ This model is a fine-tuned version of [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) on the `mhenrichsen/alpaca_2k_test` dataset (see the axolotl config above).
+ It achieves the following results on the evaluation set:
+ - Loss: 1.0035
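+ 
+ The sketch below shows one way to load the adapter on top of the base model for inference. It is illustrative rather than canonical; the prompt is a made-up example in simplified alpaca style, matching the `type: alpaca` dataset format used for training.
+ 
+ ```python
+ # Illustrative loading sketch; not part of the training run.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+ 
+ base_id = "NousResearch/Llama-2-7b-hf"
+ adapter_id = "hamel/tokenfight"
+ 
+ tokenizer = AutoTokenizer.from_pretrained(base_id)
+ base = AutoModelForCausalLM.from_pretrained(
+     base_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+ model = PeftModel.from_pretrained(base, adapter_id)
+ 
+ # Made-up prompt in simplified alpaca style.
+ prompt = "### Instruction:\nName three primary colors.\n\n### Response:\n"
+ inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```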
+ 
+ ## Model description
+ 
+ A QLoRA adapter (PEFT/LoRA weights only) for [NousResearch/Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf), trained with axolotl using the configuration shown above.
+ 
+ ## Intended uses & limitations
+ 
+ More information needed
+ 
+ ## Training and evaluation data
+ 
+ Trained on the `mhenrichsen/alpaca_2k_test` dataset (alpaca instruction format), with 5% of the data held out for evaluation (`val_set_size: 0.05`).
+ 
+ ## Training procedure
+ 
+ ### Training hyperparameters
+ 
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 3
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 24 (derivation sketched after this list)
+ - total_eval_batch_size: 6
+ - optimizer: paged AdamW 32-bit (`paged_adamw_32bit`) with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 4
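+ 
+ The batch-size totals above follow from the per-device batch size, gradient accumulation, and device count; a minimal sketch of the arithmetic:
+ 
+ ```python
+ # Effective batch sizes implied by the hyperparameters above.
+ micro_batch_size = 2   # train/eval batch size per device
+ grad_accum_steps = 4   # gradient_accumulation_steps
+ num_devices = 3
+ 
+ total_train_batch_size = micro_batch_size * grad_accum_steps * num_devices  # 24
+ total_eval_batch_size = micro_batch_size * num_devices  # 6 (no accumulation at eval)
+ ```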
+ 
+ ### Training results
+ 
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 1.1753 | 0.01 | 1 | 1.1604 |
+ | 0.9235 | 0.25 | 20 | 0.9296 |
+ | 1.1097 | 0.5 | 40 | 0.9156 |
+ | 0.9275 | 0.76 | 60 | 0.9006 |
+ | 1.0284 | 1.01 | 80 | 0.8942 |
+ | 0.8905 | 1.26 | 100 | 0.8930 |
+ | 0.8952 | 1.51 | 120 | 0.9071 |
+ | 0.8816 | 1.77 | 140 | 0.9189 |
+ | 0.7187 | 2.02 | 160 | 0.9026 |
+ | 0.5115 | 2.27 | 180 | 0.9251 |
+ | 0.6322 | 2.52 | 200 | 0.9525 |
+ | 0.7149 | 2.78 | 220 | 0.9638 |
+ | 0.5881 | 3.03 | 240 | 0.9699 |
+ | 0.5596 | 3.28 | 260 | 0.9750 |
+ | 0.4989 | 3.53 | 280 | 1.0047 |
+ | 0.3654 | 3.79 | 300 | 1.0035 |
+ 
+ 
+ ### Framework versions
+ 
+ - PEFT 0.6.0
+ - Transformers 4.37.0.dev0
+ - Pytorch 2.1.0
+ - Datasets 2.15.0
+ - Tokenizers 0.15.0
+ 
+ ### Quantization
+ 
+ The following `bitsandbytes` quantization config was used during training (an equivalent `BitsAndBytesConfig` is sketched after the list):
+ 
+ - quant_method: bitsandbytes
+ - load_in_8bit: False
+ - load_in_4bit: True
+ - llm_int8_threshold: 6.0
+ - llm_int8_skip_modules: None
+ - llm_int8_enable_fp32_cpu_offload: False
+ - llm_int8_has_fp16_weight: False
+ - bnb_4bit_quant_type: nf4
+ - bnb_4bit_use_double_quant: True
+ - bnb_4bit_compute_dtype: bfloat16
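+ 
+ For reference, a minimal sketch of the same settings expressed as a `transformers` `BitsAndBytesConfig`; this is a reconstruction from the list above, not the exact object the trainer used:
+ 
+ ```python
+ # Reconstruction of the quantization settings listed above (illustrative).
+ import torch
+ from transformers import BitsAndBytesConfig
+ 
+ bnb_config = BitsAndBytesConfig(
+     load_in_8bit=False,
+     load_in_4bit=True,
+     llm_int8_threshold=6.0,
+     llm_int8_skip_modules=None,
+     llm_int8_enable_fp32_cpu_offload=False,
+     llm_int8_has_fp16_weight=False,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+ ```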
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0926274afd9d965bb8627c2f065efa7db87dac726219da08d0b41da056cb1182
+ size 319977674