---
library_name: peft
license: apache-2.0
datasets:
- adriantheuma/raven-data
language:
- en
---

### Training details

* Prompt tokenisation: [LlamaTokenizer](https://huggingface.co/docs/transformers/model_doc/llama2#transformers.LlamaTokenizer).

* Maximum context length: 1,204 tokens

* Per-device train batch size: 1

* Gradient accumulation: 128 steps (for an effective batch size of 128)

* Quantisation: 8-bit

* Optimiser: AdamW

* Learning rate: 3 × 10⁻⁴

* Warmup steps: 100

* Epochs: 5

* Low Rank Adaptation (LoRA)

  * rank: 16

  * alpha: 16

  * dropout: 0.05

  * target modules: q_proj, k_proj, v_proj, and o_proj

This setup reduces the number of trainable parameters to 26,214,400, or roughly 0.2% of the base [Llama 2 13B Chat](https://huggingface.co/docs/transformers/model_doc/llama2) model.
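
A minimal sketch of how this configuration could be expressed with the `transformers` and `peft` libraries is shown below. The base checkpoint name, the output directory, and the `bitsandbytes`-based 8-bit loading are illustrative assumptions, not the exact training script used for this adapter.

```python
from transformers import (
    BitsAndBytesConfig,
    LlamaForCausalLM,
    LlamaTokenizer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-13b-chat-hf"  # assumed base checkpoint

# Prompt tokenisation with LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Base model loaded in 8-bit (requires bitsandbytes)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# LoRA configuration matching the values listed above
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~26.2M trainable params, ~0.2% of the base model

# Training hyperparameters from the list above
training_args = TrainingArguments(
    output_dir="raven-lora",          # assumed output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=128,  # effective batch size of 128
    learning_rate=3e-4,
    warmup_steps=100,
    num_train_epochs=5,
    optim="adamw_torch",
)
```

These objects would then be passed, together with the tokenised dataset, to a `Trainer` (or a similar supervised fine-tuning loop).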

### Training hardware

This model was trained on commodity hardware equipped with:

* 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz

* 64 GB installed RAM

* NVIDIA GeForce RTX 4090 GPU with 24 GB onboard RAM

Training consumed 100 GPU hours.