Training details
- Prompt tokenisation: LlamaTokenizer
- Maximum context length: 1,204 tokens
- Per-device train batch size: 1
- Gradient accumulation: 128 steps (effective batch size of 128)
- Quantisation: 8-bit
- Optimiser: AdamW
- Learning rate: 3 × 10⁻⁴
- Warmup steps: 100
- Epochs: 5
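As a rough sketch, the hyperparameters above map onto Hugging Face `transformers` `TrainingArguments` as follows; the output directory is a placeholder and the mixed-precision flag is an assumption, not something stated above.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above (output_dir is a placeholder).
training_args = TrainingArguments(
    output_dir="llama2-13b-chat-lora",  # placeholder output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=128,    # effective batch size of 128
    optim="adamw_torch",                # AdamW optimiser
    learning_rate=3e-4,
    warmup_steps=100,
    num_train_epochs=5,
    fp16=True,                          # assumption: mixed-precision training
)
```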
Low-Rank Adaptation (LoRA)
- rank: 16
- alpha: 16
- dropout: 0.05
- target modules: q_proj, k_proj, v_proj, and o_proj
This setup reduces the number of trainable parameters to 26,214,400, or roughly 0.2% of the base Llama 2 13B Chat model.
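A minimal sketch of this setup using the `peft` and `transformers` libraries is shown below. The base model identifier and 8-bit loading reflect the details above; the dataset, tokenisation, and trainer wiring are omitted.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the base model in 8-bit, per the quantisation setting above.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA configuration matching the values listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports 26,214,400 trainable parameters (~0.2%)
```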
Training hardware
This model was trained on commodity hardware equipped with:
- 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
- 64 GB installed RAM
- NVIDIA GeForce RTX 4090 GPU with 24 GB onboard RAM.
Training consumed 100 GPU hours.