---
license: apache-2.0
datasets:
- Crystalcareai/openhermes_200k_unfiltered
language:
- en
library_name: transformers
base_model: h2oai/h2o-danube2-1.8b-base
---
# h2o-danube2 with ChatML template

This is the danube2 base model fine-tuned with [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") and [LoRA+](https://arxiv.org/abs/2402.12354 "LoRA+: Efficient Low Rank Adaptation of Large Models"). It uses the ChatML template and was trained on the [openhermes-unfiltered](https://huggingface.co/datasets/Crystalcareai/openhermes_200k_unfiltered) dataset.

## Template

```jinja
<|im_start|>user
{{instruction}}<|im_end|>
<|im_start|>assistant
{{response}}<|im_end|>
```

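As a usage sketch (not part of the original card), the template above can be driven through `tokenizer.apply_chat_template`, assuming the tokenizer ships this ChatML template; the repo id below is a placeholder and should be replaced with the actual model id.

```python
# Minimal inference sketch. Assumptions: the model id is a placeholder, and the
# tokenizer's chat_template renders the ChatML format shown above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trollek/danube2-openhermes-chatml"  # placeholder repo id (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise what BAdam does in two sentences."},
]

# add_generation_prompt=True appends the `<|im_start|>assistant` header so the
# model continues with the assistant turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```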

## BAdam

**System:** You are a helpful assistant.

```yaml
### model
model_name_or_path: danube2-base-chatml

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 1
badam_start_block: 10
seed: 720

### dataset
dataset: openhermes_unfiltered
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: openhermes-chatml-badam
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 0.00001
num_train_epochs: 1
lr_scheduler_type: constant_with_warmup
warmup_ratio: 0.01
bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 2000
```

### BAdam Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.7971 | 0.1748 | 2000 | 0.7418 |
| 0.6815 | 0.3496 | 4000 | 0.7178 |
| 0.6593 | 0.5245 | 6000 | 0.7055 |
| 0.6923 | 0.6993 | 8000 | 0.6960 |
| 0.6942 | 0.8741 | 10000 | 0.6877 |

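A quick sanity check of the schedule implied by the config (the option names appear to follow the LLaMA-Factory config schema, though the card does not say so explicitly): each optimizer step covers `per_device_train_batch_size × gradient_accumulation_steps` sequences, and the step/epoch pairs in the table pin down the number of steps per epoch.

```python
# Schedule arithmetic implied by the BAdam config and the step/epoch pairs above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
effective_batch = per_device_train_batch_size * gradient_accumulation_steps  # 16 sequences per optimizer step

steps_per_epoch = 2000 / 0.1748                       # ≈ 11,442 optimizer steps for one epoch
train_sequences = effective_batch * steps_per_epoch   # ≈ 183k training sequences at cutoff_len 8192

print(effective_batch, round(steps_per_epoch), round(train_sequences))
```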

## QLoRA+

```yaml
### model
model_name_or_path: openhermes-chatml-badam

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0
lora_rank: 8
lora_alpha: 16
use_unsloth: true
quantization_bit: 4
upcast_layernorm: true
seed: 3141

### dataset
dataset: openhermes_unfiltered
template: hermes_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: openhermes-chatml-badam/loraplus
logging_steps: 1
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 0.0001
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
flash_attn: fa2
#neftune_noise_alpha: 5

### eval
val_size: 0.02
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```

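One setting worth unpacking: in LoRA+, `loraplus_lr_ratio` scales the learning rate of the adapter's B matrices relative to the A matrices (see the LoRA+ paper linked at the top of this card), so the values above imply two effective learning rates.

```python
# Learning rates implied by the LoRA+ settings above (sketch; the ratio is
# applied to the LoRA B matrices, per the LoRA+ paper).
learning_rate = 1e-4                       # base LR, used for the LoRA A matrices
loraplus_lr_ratio = 16.0
lr_A = learning_rate                       # 1e-4
lr_B = learning_rate * loraplus_lr_ratio   # 1.6e-3
print(lr_A, lr_B)
```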

### QLoRA+ Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.6523 | 0.0883 | 1000 | 0.7126 |
| 0.6398 | 0.1766 | 2000 | 0.7086 |
| 0.6865 | 0.2649 | 3000 | 0.7001 |
| 0.6714 | 0.3532 | 4000 | 0.6917 |
| 0.7213 | 0.4415 | 5000 | 0.6819 |
| 0.7764 | 0.5298 | 6000 | 0.6721 |
| 0.6931 | 0.6181 | 7000 | 0.6638 |
| 0.6632 | 0.7064 | 8000 | 0.6560 |
| 0.5966 | 0.7947 | 9000 | 0.6514 |
| 0.6339 | 0.8830 | 10000 | 0.6482 |
| 0.4987 | 0.9713 | 11000 | 0.6472 |
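
A common follow-up for a run like this is merging the LoRA+ adapter back into the BAdam checkpoint so it can be served as a single model. A minimal sketch, assuming the `output_dir` values from the configs above hold the checkpoint and a PEFT-format adapter, and using a placeholder path for the merged result:

```python
# Sketch: merge the LoRA+ adapter into the BAdam full-parameter checkpoint.
# Paths come from the output_dir values above; the merged path is a placeholder.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("openhermes-chatml-badam", torch_dtype="auto")
merged = PeftModel.from_pretrained(base, "openhermes-chatml-badam/loraplus").merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained("openhermes-chatml-badam")
merged.save_pretrained("danube2-openhermes-chatml-merged")      # placeholder output dir
tokenizer.save_pretrained("danube2-openhermes-chatml-merged")
```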