---
library_name: peft
license: apache-2.0
datasets:
- adriantheuma/raven-data
language:
- en
---

### Training details

* Prompt tokenisation: [LlamaTokenizer](https://huggingface.co/docs/transformers/model_doc/llama2#transformers.LlamaTokenizer)
* Maximum context length: 1,204 tokens
* Per-device train batch size: 1
* Gradient accumulation: 128 steps (an effective batch size of 128)
* Quantisation: 8-bit
* Optimiser: AdamW
* Learning rate: 3 × 10⁻⁴
* Warmup steps: 100
* Epochs: 5
* Low-Rank Adaptation (LoRA)
  * rank: 16
  * alpha: 16
  * dropout: 0.05
  * target modules: q_proj, k_proj, v_proj, and o_proj

This setup reduces the number of trainable parameters to 26,214,400, or roughly 0.2% of the base [Llama 2 13B Chat](https://huggingface.co/docs/transformers/model_doc/llama2) model (see the configuration sketch at the end of this card).

### Training hardware

This model was trained on commodity hardware equipped with:

* a 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
* 64 GB of installed RAM
* an NVIDIA GeForce RTX 4090 GPU with 24 GB of onboard RAM

Training consumed 100 GPU hours.
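
### Configuration sketch

The snippet below is a minimal sketch of how the hyperparameters listed above map onto the PEFT and Transformers APIs. It is not the authors' training script: the base model identifier, output directory, and use of `prepare_model_for_kbit_training` are assumptions, and data loading plus the actual training loop are omitted.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 8-bit quantisation of the base model (see "Quantisation" above)
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",   # assumed identifier for Llama 2 13B Chat
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA hyperparameters from the list above
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Llama 2 13B has 40 layers with a hidden size of 5,120, so the adapters add
# 40 layers x 4 projections x r x (5,120 + 5,120)
#   = 160 x 16 x 10,240 = 26,214,400 trainable parameters (~0.2% of ~13B).

# Training hyperparameters from the list above; the exact argument mapping is an assumption.
training_args = TrainingArguments(
    output_dir="raven-lora",            # hypothetical output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=128,    # effective batch size of 128
    learning_rate=3e-4,
    warmup_steps=100,
    num_train_epochs=5,
    optim="adamw_torch",
    fp16=True,
)
```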