abhinand committed f91c612 (1 parent: 8e89c74)

Create README.md

Files changed (1): README.md added (+147, -0)

---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
- abhinand/ultrachat_200k_sharegpt
language:
- en
---

# TinyLlama OpenHermes 2.5 [Work in Progress]

This is a fine-tune of the TinyLlama base model, trained on [OpenHermes 2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) and [UltraChat 200k](https://huggingface.co/datasets/abhinand/ultrachat_200k_sharegpt) for a single epoch.

Training was generously supported by [Jarvislabs.ai](https://jarvislabs.ai/).

If you appreciate this work and would like to support its continued development, consider [buying me a coffee](https://www.buymeacoffee.com/abhinand.b). Your support is invaluable and greatly appreciated.

[!["Buy Me A Coffee"](https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png)](https://www.buymeacoffee.com/abhinand.b)
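
The training data is formatted as ChatML conversations (`chat_template: chatml` with `<|im_start|>` / `<|im_end|>` delimiters in the config below), so the model is meant to be prompted with the ChatML template. Below is a minimal inference sketch, assuming the merged weights live under the `hub_model_id` from the training config and that the ChatML template is saved with the tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# hub_model_id from the axolotl config below; adjust if the final weights
# end up published under a different repo.
model_id = "abhinand/TinyLlama-1.1B-OpenHermes-2.5-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain LoRA fine-tuning in one sentence."},
]

# Relies on the ChatML template being stored with the tokenizer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```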
+ <details><summary>See axolotl config</summary>
21
+
22
+ axolotl version: `0.4.0`
23
+ ```yaml
24
+ base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
25
+ model_type: AutoModelForCausalLM
26
+ tokenizer_type: AutoTokenizer
27
+ trust_remote_code: true
28
+ is_llama_derived_model: true
29
+
30
+ # huggingface repo
31
+ datasets:
32
+ - path: teknium/OpenHermes-2.5
33
+ type: sharegpt
34
+ conversation: chatml
35
+ train_on_split: train
36
+
37
+ - path: abhinand/ultrachat_200k_sharegpt
38
+ type: sharegpt
39
+ conversation: chatml
40
+ train_on_split: train
41
+
42
+ load_in_4bit: false
43
+ load_in_8bit: false
44
+ bf16: true # require >=ampere
45
+ chat_template: chatml
46
+
47
+ dataset_prepared_path: last_run_prepared_path
48
+ hub_model_id: abhinand/TinyLlama-1.1B-OpenHermes-2.5-Chat-v1.0
49
+ group_by_length: false
50
+
51
+ val_set_size: 0.0
52
+ sequence_len: 2048
53
+ sample_packing: true
54
+ pad_to_sequence_len: true
55
+
56
+ adapter: lora
57
+ lora_model_dir:
58
+ lora_r: 32
59
+ lora_alpha: 16
60
+ lora_target_modules:
61
+ - q_proj
62
+ - v_proj
63
+ - k_proj
64
+ - o_proj
65
+ - gate_proj
66
+ - down_proj
67
+ - up_proj
68
+ lora_modules_to_save:
69
+ - embed_tokens
70
+ - lm_head
71
+ lora_dropout: 0.05
72
+ lora_target_linear: true
73
+ lora_fan_in_fan_out:
74
+
75
+ output_dir: /home/tiny-llama/trained_models
76
+
77
+ gradient_accumulation_steps: 2
78
+ micro_batch_size: 32
79
+ eval_batch_size: 32
80
+ num_epochs: 1
81
+ logging_steps: 1
82
+ save_steps: 50
83
+ save_total_limit: 3
84
+
85
+ save_safetensors: true
86
+ gradient_checkpointing: true
87
+
88
+ lr_scheduler: cosine
89
+ optimizer: "adamw_bnb_8bit"
90
+ adam_beta2: 0.95
91
+ adam_epsilon: 0.00001
92
+ weight_decay: 0.1
93
+ learning_rate: 0.0005
94
+ max_grad_norm: 1.0
95
+ warmup_ratio: 0.05
96
+ # warmup_steps: 100
97
+
98
+ flash_attention: true
99
+
100
+ # Resume from a specific checkpoint dir
101
+ resume_from_checkpoint:
102
+ # If resume_from_checkpoint isn't set and you simply want it to start where it left off.
103
+ # Be careful with this being turned on between different models.
104
+ # auto_resume_from_checkpoints: true
105
+
106
+ # wandb configuration if you're using it
107
+ # Make sure your `WANDB_API_KEY` environment variable is set (recommended) or you login to wandb with `wandb login`.
108
+ wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
109
+ wandb_project: "tiny-llama-sft"
110
+ wandb_name:
111
+ wandb_run_id:
112
+
113
+ special_tokens:
114
+ bos_token: "<s>"
115
+ eos_token: "</s>"
116
+ unk_token: "<unk>"
117
+ tokens: # these are delimiters
118
+ - "<|im_start|>"
119
+ - "<|im_end|>"
120
+
121
+ ```
122
+
123
+ </details>

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 476
- num_epochs: 1
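
These numbers are internally consistent with the axolotl config: the effective batch size is the per-device batch times the gradient-accumulation steps, and the reported warmup steps correspond to the 5% warmup ratio. A quick back-of-the-envelope check:

```python
# Consistency check of the reported hyperparameters (values taken from the list above).
train_batch_size = 32             # per-device micro batch size
gradient_accumulation_steps = 2
warmup_steps = 476
warmup_ratio = 0.05               # from the axolotl config

effective_batch = train_batch_size * gradient_accumulation_steps
print(effective_batch)                     # 64 -> matches total_train_batch_size

# warmup_steps = warmup_ratio * total_steps, so the single epoch ran for
# roughly warmup_steps / warmup_ratio optimizer steps.
print(round(warmup_steps / warmup_ratio))  # ~9520
```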

### Framework versions

- PEFT 0.8.2
- Transformers 4.38.0.dev0
- Pytorch 2.0.1
- Datasets 2.16.1
- Tokenizers 0.15.0