---
license: apache-2.0
datasets:
- databricks/databricks-dolly-15k
language:
- en
pipeline_tag: text-generation
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T
---

TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T fine-tuned on the databricks/databricks-dolly-15k dataset.

Training took 1 hour on an `ml.g5.xlarge` instance.

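A minimal inference sketch using the `transformers` pipeline; the repository id below is a placeholder assumption (replace it with this model's actual Hub id), and the instruction-style prompt format is likewise an assumption:

```python
from transformers import pipeline

# Placeholder repository id (assumption) - replace with this model's actual Hub id.
generator = pipeline(
    "text-generation",
    model="habanoz/TinyLlama-1.1B-dolly",  # hypothetical id
)

# Instruction-style prompt (assumed format; adjust to the template used in training).
prompt = "### Instruction:\nExplain what the Dolly dataset is.\n\n### Response:\n"
print(generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)[0]["generated_text"])
```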
The following hyperparameters were used for training:

```python
hyperparameters = {
    'num_train_epochs': 3,                 # number of training epochs
    'per_device_train_batch_size': 6,      # batch size for training
    'gradient_accumulation_steps': 2,      # number of update steps to accumulate gradients over
    'gradient_checkpointing': True,        # save memory but slower backward pass
    'bf16': True,                          # use bfloat16 precision
    'tf32': True,                          # use tf32 precision
    'learning_rate': 2e-4,                 # learning rate
    'max_grad_norm': 0.3,                  # maximum norm (for gradient clipping)
    'warmup_ratio': 0.03,                  # warmup ratio
    'lr_scheduler_type': "constant",       # learning rate scheduler
    'save_strategy': "epoch",              # save strategy for checkpoints
    'logging_steps': 10,                   # log every x steps
    'merge_adapters': True,                # whether to merge LoRA adapters into the model (needs more memory)
    'use_flash_attn': True,                # whether to use Flash Attention
}
```
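For context, a hedged sketch of how a dict like this might be handed to a SageMaker training job via the `sagemaker` Hugging Face estimator; the entry point script, source directory, and framework versions below are assumptions, not taken from this repository:

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # SageMaker execution role of your account

# Assumption: a LoRA/QLoRA training script named run_clm.py in ./scripts (hypothetical).
huggingface_estimator = HuggingFace(
    entry_point='run_clm.py',           # hypothetical training script
    source_dir='./scripts',             # hypothetical source directory
    instance_type='ml.g5.xlarge',       # instance type reported above
    instance_count=1,
    role=role,
    transformers_version='4.28',        # assumed container versions
    pytorch_version='2.0',
    py_version='py310',
    hyperparameters=hyperparameters,    # the dict shown above
)

huggingface_estimator.fit()  # optionally pass a {'training': <s3 uri>} channel for the dataset
```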