adriantheuma
/

raven-lora

Model card Files Files and versions Community

Training details

Prompt tokenisation: LlamaTokenizer.
Maximum context length: 1,204 tokens
Per device train batch: 1
Gradient accumulation: 128 steps (achieving the equivalent batch_size of 128)
Quantisation: 8-bit
Optimiser: adamw
Learning_rate: 3 × 10−4
warmup_steps: 100
epochs: 5
Low Rank Adaptation (LoRA)
- rank: 16
- alpha: 16
- dropout: 0.05
- target modules: q_proj, k_proj, v_proj, and o_proj

This setup reduces the trainable parameters to 26,214,400 or 0.2% of the base Llama 2 13B Chat model.

Training hardware

This model is trained on commodity hardware equipped with a:

13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
64 GB installed RAM
NVIDIA GeForce RTX 4090 GPU with 24 GB onboard RAM.

The trained model consumed 100 GPU hours during training.

Downloads last month: 4

Inference API

Unable to determine this model’s pipeline type. Check the docs .

Dataset used to train adriantheuma/raven-lora