trollek commited on
Commit
49572ca
·
verified ·
1 Parent(s): a938d8d

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +137 -3
README.md CHANGED
@@ -1,3 +1,137 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - abacusai/SystemChat-1.1
5
+ language:
6
+ - en
7
+ library_name: transformers
8
+ tags:
9
+ - llama-factory
10
+ - unsloth
11
+ ---
12
+ # h2o-danube2 with ChatML template
13
+
14
+ This is a [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") and [LoRA+](https://arxiv.org/abs/2402.12354 "LoRA+: Efficient Low Rank Adaptation of Large Models") fine-tuned danube2 base model. It uses the ChatML template and was trained on the [SystemChat-1.1](https://huggingface.co/datasets/abacusai/SystemChat-1.1) from [Abacus.AI](https://huggingface.co/abacusai).
15
+
16
+
17
+
18
+ ## BAdam
19
+
20
+ ```yaml
21
+ ### model
22
+ model_name_or_path: danube2-base-chatml
23
+
24
+ ### method
25
+ stage: sft
26
+ do_train: true
27
+ finetuning_type: full
28
+ use_badam: true
29
+ badam_switch_mode: descending
30
+ badam_switch_interval: 50
31
+ badam_start_block: 22
32
+ badam_mask_mode: scatter
33
+ badam_verbose: 1
34
+ seed: 314
35
+
36
+ ### dataset
37
+ dataset: systemchat11
38
+ template: ninja_chatml
39
+ cutoff_len: 8192
40
+ overwrite_cache: false
41
+ preprocessing_num_workers: 12
42
+
43
+ ### output
44
+ output_dir: systemchat11-chatml-badam
45
+ logging_steps: 5
46
+ save_steps: 1
47
+ save_strategy: epoch
48
+ plot_loss: true
49
+ overwrite_output_dir: false
50
+
51
+ ### train
52
+ per_device_train_batch_size: 2
53
+ gradient_accumulation_steps: 8
54
+ learning_rate: 0.00002
55
+ num_train_epochs: 3
56
+ lr_scheduler_type: cosine
57
+ warmup_ratio: 0.01
58
+ bf16: true
59
+ flash_attn: fa2
60
+
61
+ ### eval
62
+ val_size: 0.01
63
+ per_device_eval_batch_size: 1
64
+ eval_strategy: steps
65
+ eval_steps: 1000
66
+
67
+ ```
68
+
69
+ ### BAdam Training results
70
+
71
+ | Training Loss | Epoch | Step | Validation Loss |
72
+ |:-------------:|:------:|:----:|:---------------:|
73
+ | 1.0062 | 0.8324 | 1000 | 0.9837 |
74
+ | 0.8484 | 1.6648 | 2000 | 0.9388 |
75
+ | 0.7834 | 2.4971 | 3000 | 0.9309 |
76
+
77
+
78
+ ## QLoRA+
79
+
80
+ ```yaml
81
+ ### model
82
+ model_name_or_path: systemchat11-chatml-badam
83
+
84
+ ### method
85
+ stage: sft
86
+ do_train: true
87
+ finetuning_type: lora
88
+ lora_target: all
89
+ loraplus_lr_ratio: 16.0
90
+ lora_rank: 8
91
+ lora_alpha: 16
92
+ use_unsloth: true
93
+ quantization_bit: 4
94
+ upcast_layernorm: true
95
+ seed: 31415
96
+
97
+ ### dataset
98
+ dataset: systemchat11
99
+ template: hermes_chatml
100
+ cutoff_len: 8192
101
+ overwrite_cache: false
102
+ preprocessing_num_workers: 12
103
+
104
+ ### output
105
+ output_dir: systemchat11-chatml-badam/loraplus
106
+ logging_steps: 1
107
+ save_steps: 1
108
+ save_strategy: epoch
109
+ plot_loss: true
110
+ overwrite_output_dir: false
111
+
112
+ ### train
113
+ per_device_train_batch_size: 4
114
+ gradient_accumulation_steps: 4
115
+ learning_rate: 0.0001
116
+ num_train_epochs: 2.0
117
+ lr_scheduler_type: cosine
118
+ warmup_ratio: 0.01
119
+ bf16: true
120
+ flash_attn: fa2
121
+
122
+ ### eval
123
+ val_size: 0.02
124
+ per_device_eval_batch_size: 1
125
+ eval_strategy: steps
126
+ eval_steps: 500
127
+ ```
128
+
129
+ ### QLoRA+ Training results
130
+
131
+ | Training Loss | Epoch | Step | Validation Loss |
132
+ |:-------------:|:------:|:----:|:---------------:|
133
+ | 0.8591 | 0.4204 | 500 | 0.8457 |
134
+ | 0.9098 | 0.8409 | 1000 | 0.8251 |
135
+ | 0.735 | 1.2613 | 1500 | 0.8304 |
136
+ | 0.6811 | 1.6817 | 2000 | 0.8252 |
137
+