Commit df36cfa by elinas (1 parent: dc0f3b4): Create README.md

Files changed (1): README.md (+107, -0)
---
base_model:
- elinas/Llama-3-15B-Instruct-zeroed
library_name: transformers
tags:
- mergekit
- merge
- finetune
datasets:
- Chat-Error/Pure-dove-sharegpt
license: llama3
---
# Llama-3-15B-Instruct-zeroed-ft-v2

This is a QLoRA **finetune** of a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

The model is based on a "zeroed" passthrough merge of [Llama-3-15B-Instruct-zeroed](https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed).

This was primarily an experiment to see how a passthrough merge responds to further finetuning of all of the LoRA modules.

The model was finetuned at a context length of **8192** and is likely reliable with RoPE scaling up to 32k.

Further finetuning of this model, or finetuning the [base model](https://huggingface.co/elinas/Llama-3-15B-Instruct-zeroed) on more samples, is encouraged.

**I will do this myself for the 3rd iteration of this model. Until I receive sufficient feedback comparing this model to the 8B, further finetuning is on hold.**

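For convenience, here is a minimal inference sketch using `transformers`. It assumes the merged weights are published under the repo id matching this card's title and that the tokenizer ships the Llama 3 Instruct chat template; adjust the id, dtype, and generation settings for your setup.

```python
# Minimal inference sketch (assumptions: repo id, chat template availability, bfloat16-capable GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "elinas/Llama-3-15B-Instruct-zeroed-ft-v2"  # assumed repo id, taken from this card's title

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # rope_scaling={"type": "dynamic", "factor": 4.0},  # optional: extend beyond the 8192 training context
)

messages = [{"role": "user", "content": "Summarize what a passthrough merge is in one paragraph."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
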
## Datasets

* [Chat-Error/Pure-dove-sharegpt](https://huggingface.co/datasets/Chat-Error/Pure-dove-sharegpt)

A small, high-quality, curated dataset was used as a proof of concept and as validation that the model stabilizes after the original passthrough merge.

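To inspect the training data yourself, a short sketch using the `datasets` library is shown below; the `train` split name is an assumption, so check the dataset card for the exact schema.

```python
# Sketch: load and inspect the dataset (the split name is an assumption).
from datasets import load_dataset

ds = load_dataset("Chat-Error/Pure-dove-sharegpt", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # first ShareGPT-style conversation
```
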
## Finetuning details

This is a QLoRA model, and all of the LoRA modules were targeted this time to ensure sufficient training before moving on to larger datasets. The first version of this model only targeted **o_proj** and **up_proj**; this version targets the following:

```yaml
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:
  - embed_tokens
  - lm_head
```
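
For readers reproducing this with PEFT directly rather than via an axolotl config, a roughly equivalent adapter definition is sketched below; the rank, alpha, and dropout values are placeholders (they are not stated in this card), and only the targeted/saved modules mirror the config above.

```python
# Sketch of an equivalent PEFT adapter config; r, lora_alpha, and lora_dropout are
# placeholder values -- only target_modules / modules_to_save mirror the axolotl config above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,             # assumption: rank not stated in this card
    lora_alpha=64,    # assumption: alpha not stated in this card
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],
)
```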

The model is coherent even when training the "zeroed" layers in addition to the other layers; this was the recommendation of [Charles Goddard](https://huggingface.co/chargoddard) (the mergekit developer). Thank you for sharing the merge method, and thanks to Toasty Pigeon for bringing it to my attention!

The following hyperparameters were used during training:

```yaml
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- total_train_batch_size: 3
- total_eval_batch_size: 3
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1
```

The `paged_adamw_8bit` optimizer and DeepSpeed ZeRO 3 were used at a LR of `1e-5` with the cosine scheduler for 1 epoch on 3x RTX 3090s, taking 4 hours in total.
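
For orientation, a rough `TrainingArguments` equivalent of the settings above is sketched below; this run actually used axolotl, so the output directory, precision, and DeepSpeed config path here are placeholders rather than the exact configuration.

```python
# Sketch: approximate HF TrainingArguments matching the hyperparameters listed above.
# The output_dir, bf16 flag, and deepspeed config path are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3-15b-zeroed-ft-v2",     # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=1,             # 3 GPUs -> effective train batch size of 3
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_steps=25,
    seed=42,
    optim="paged_adamw_8bit",
    bf16=True,                                 # assumption: precision not stated in this card
    deepspeed="deepspeed_configs/zero3.json",  # placeholder path to a ZeRO 3 config
)
```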

**Unsloth** was used for speed and memory savings.

Sample packing and padding were disabled to significantly reduce VRAM consumption, at the cost of speed.

W&B run summary:
```
wandb: eval/loss 0.90895
wandb: eval/runtime 463.4688
wandb: eval/samples_per_second 0.833
wandb: eval/steps_per_second 0.278
wandb: total_flos 8270790524928.0
wandb: train/epoch 1.0
wandb: train/global_step 1157
wandb: train/grad_norm 7.3847
wandb: train/learning_rate 0.0
wandb: train/loss 0.8702
wandb: train_loss 0.87814
wandb: train_runtime 16425.2713
wandb: train_samples_per_second 0.211
wandb: train_steps_per_second 0.07
```

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1

## Model Evaluation

TBD

If you have any questions or comments on the model, feel free to open a discussion in the community tab.

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)