RichardErkhov committed
Commit 3be9734
1 Parent(s): b7f61f0

uploaded readme

Files changed (1): README.md (+149, -0)

Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


Llama-3-13B-Instruct-ft - GGUF
- Model creator: https://huggingface.co/elinas/
- Original model: https://huggingface.co/elinas/Llama-3-13B-Instruct-ft/
+ | Name | Quant method | Size |
16
+ | ---- | ---- | ---- |
17
+ | [Llama-3-13B-Instruct-ft.Q2_K.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q2_K.gguf) | Q2_K | 4.68GB |
18
+ | [Llama-3-13B-Instruct-ft.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.IQ3_XS.gguf) | IQ3_XS | 5.18GB |
19
+ | [Llama-3-13B-Instruct-ft.IQ3_S.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.IQ3_S.gguf) | IQ3_S | 5.45GB |
20
+ | [Llama-3-13B-Instruct-ft.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q3_K_S.gguf) | Q3_K_S | 5.42GB |
21
+ | [Llama-3-13B-Instruct-ft.IQ3_M.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.IQ3_M.gguf) | IQ3_M | 5.61GB |
22
+ | [Llama-3-13B-Instruct-ft.Q3_K.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q3_K.gguf) | Q3_K | 5.98GB |
23
+ | [Llama-3-13B-Instruct-ft.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q3_K_M.gguf) | Q3_K_M | 5.98GB |
24
+ | [Llama-3-13B-Instruct-ft.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q3_K_L.gguf) | Q3_K_L | 6.47GB |
25
+ | [Llama-3-13B-Instruct-ft.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.IQ4_XS.gguf) | IQ4_XS | 6.69GB |
26
+ | [Llama-3-13B-Instruct-ft.Q4_0.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q4_0.gguf) | Q4_0 | 6.97GB |
27
+ | [Llama-3-13B-Instruct-ft.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.IQ4_NL.gguf) | IQ4_NL | 7.04GB |
28
+ | [Llama-3-13B-Instruct-ft.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q4_K_S.gguf) | Q4_K_S | 7.01GB |
29
+ | [Llama-3-13B-Instruct-ft.Q4_K.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q4_K.gguf) | Q4_K | 7.38GB |
30
+ | [Llama-3-13B-Instruct-ft.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q4_K_M.gguf) | Q4_K_M | 7.38GB |
31
+ | [Llama-3-13B-Instruct-ft.Q4_1.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q4_1.gguf) | Q4_1 | 7.7GB |
32
+ | [Llama-3-13B-Instruct-ft.Q5_0.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q5_0.gguf) | Q5_0 | 8.43GB |
33
+ | [Llama-3-13B-Instruct-ft.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q5_K_S.gguf) | Q5_K_S | 8.43GB |
34
+ | [Llama-3-13B-Instruct-ft.Q5_K.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q5_K.gguf) | Q5_K | 8.64GB |
35
+ | [Llama-3-13B-Instruct-ft.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q5_K_M.gguf) | Q5_K_M | 8.64GB |
36
+ | [Llama-3-13B-Instruct-ft.Q5_1.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q5_1.gguf) | Q5_1 | 9.16GB |
37
+ | [Llama-3-13B-Instruct-ft.Q6_K.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q6_K.gguf) | Q6_K | 9.98GB |
38
+ | [Llama-3-13B-Instruct-ft.Q8_0.gguf](https://huggingface.co/RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf/blob/main/Llama-3-13B-Instruct-ft.Q8_0.gguf) | Q8_0 | 12.92GB |
39
+
40
+
41
+
42
+
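
As a quick usage sketch (not part of the original upload), one of these GGUF files can be pulled from the Hub and loaded with the `llama-cpp-python` bindings; the chosen quant file, context size, and prompt below are only illustrative assumptions.

```python
# Hypothetical usage sketch: download one quant and run it with llama-cpp-python.
# Adjust the file name and context size to your hardware; nothing here is prescribed by the upload.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="RichardErkhov/elinas_-_Llama-3-13B-Instruct-ft-gguf",
    filename="Llama-3-13B-Instruct-ft.Q4_K_M.gguf",  # any row from the table above
)

llm = Llama(model_path=model_path, n_ctx=8192)  # the model was finetuned at 8192 context

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a two-sentence story about a lighthouse."}]
)
print(out["choices"][0]["message"]["content"])
```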

Original model description:
---
base_model:
- elinas/Llama-3-13B-Instruct
library_name: transformers
tags:
- mergekit
- merge
datasets:
- Chat-Error/Pure-dove-sharegpt
license: llama3
---
# Llama-3-13B-Instruct-ft
56
+
57
+ This is a QLoRA **finetune** of a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
58
+
59
+ The model is based on my passthrough merge of [Llama-3-13B-Instruct](https://huggingface.co/elinas/Llama-3-13B-Instruct)
60
+
61
+ This was primarily an experiment to see how a passthrough merge will respond to further finetuning, though this was done on a small dataset.
62
+
63
+ The goal was to make a "mid" sized model like Meta has released in the past and the merge method was inspired by [mlabonne's Llama-3-120B](https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct).
64
+
65
+ The model was finetuned on **8192 context length** and is likely reliable using RoPE up to 32k.
66
+
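
To go past the native 8192-token window, a RoPE-scaled load is one option; the 4x linear factor below (`rope_freq_scale=0.25` in `llama-cpp-python`) is my assumption rather than a setting published with the model, so quality at long context should be verified first.

```python
# Hypothetical 32k-context load via linear RoPE scaling in llama-cpp-python.
# rope_freq_scale=0.25 stretches the trained 8192 window roughly 4x; this exact
# value is an assumption, not a recommendation from the model author.
from llama_cpp import Llama

llm_long = Llama(
    model_path="Llama-3-13B-Instruct-ft.Q4_K_M.gguf",  # any quant from the table
    n_ctx=32768,
    rope_freq_scale=0.25,
)
```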

It still cannot do math reliably; neither can Llama-3-8B, and in my tests only Llama-3-70B passes basic arithmetic. It is, however, a better storywriter/RP model than Llama-3-8B in some side-by-side testing I conducted.

Further finetuning this model, or finetuning the [base model](https://huggingface.co/elinas/Llama-3-13B-Instruct) on more samples, is encouraged.

## Datasets

* [Chat-Error/Pure-dove-sharegpt](https://huggingface.co/datasets/Chat-Error/Pure-dove-sharegpt)

A small dataset was used to see how it affects performance. I originally planned to use a larger dataset (196k samples), but wanted to start with a smaller one first to see how much the model improves with some additional finetuning.

The next step would be finetuning on a larger dataset, if further testing shows performance improvements.
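
For anyone who wants to continue that finetuning, the dataset is loadable with the standard `datasets` API; the `train` split name below is the usual default and should be checked against the dataset card.

```python
# Sketch: load the finetuning dataset from the Hub for further experiments.
# "train" is assumed to be the only split; confirm on the dataset card.
from datasets import load_dataset

pure_dove = load_dataset("Chat-Error/Pure-dove-sharegpt", split="train")
print(len(pure_dove), pure_dove[0].keys())
```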

## Finetuning details
This is a QLoRA model and all modules were targeted.
```yaml
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj
lora_modules_to_save:
- embed_tokens
- lm_head
```
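
For reference, a roughly equivalent adapter definition in plain PEFT might look like the sketch below; the rank, alpha, and dropout values are placeholders I chose for illustration, since they are not listed in the card.

```python
# Rough PEFT equivalent of the axolotl adapter settings above.
# r / lora_alpha / lora_dropout are illustrative placeholders, not the values actually used.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["gate_proj", "down_proj", "up_proj",
                    "q_proj", "v_proj", "k_proj", "o_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
```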

The following hyperparameters were used during training:

```yaml
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- total_train_batch_size: 3
- total_eval_batch_size: 3
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1
```

The optimizer `paged_adamw_8bit` and DeepSpeed ZeRO 3 were used at a LR of `1e-5` with the cosine scheduler for 1 epoch on 3x3090s, taking 4h 12m 13s in total.

Sample packing and padding were disabled to reduce VRAM consumption significantly, at the cost of speed.
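
Put together, a hedged `transformers`-style reconstruction of that setup could look like the sketch below; the DeepSpeed config path and bf16 flag are assumptions on my part, and the actual run was driven through axolotl rather than a hand-written Trainer.

```python
# Hedged reconstruction of the training setup with transformers TrainingArguments.
# The deepspeed JSON path and bf16 choice are assumptions; the original run used axolotl.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3-13b-instruct-ft",
    per_device_train_batch_size=1,      # 3 GPUs -> total train batch size 3
    per_device_eval_batch_size=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=25,
    num_train_epochs=1,
    seed=42,
    optim="paged_adamw_8bit",
    deepspeed="deepspeed/zero3.json",   # placeholder path to a ZeRO-3 config
    bf16=True,                          # assumed precision
)
```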

W&B Run Summary
```
wandb: Run summary:
wandb: eval/loss 1.00774
wandb: eval/runtime 535.3847
wandb: eval/samples_per_second 0.721
wandb: eval/steps_per_second 0.241
wandb: total_flos 4167452590080.0
wandb: train/epoch 1.0
wandb: train/global_step 1157
wandb: train/grad_norm 4.50846
wandb: train/learning_rate 0.0
wandb: train/loss 1.4115
wandb: train_loss 1.00352
wandb: train_runtime 14921.1227
wandb: train_samples_per_second 0.233
wandb: train_steps_per_second 0.078
```

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.0.dev0
- Pytorch 2.3.0+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0

## Model Evaluation

TBD - submitted

If you have any questions or comments on the model, feel free to open a discussion in the community tab.

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)