PEFT
English

Training details

  • Prompt tokenisation: LlamaTokenizer.

  • Maximum context length: 1,204 tokens

  • Per device train batch: 1

  • Gradient accumulation: 128 steps (an effective batch size of 128)

  • Quantisation: 8-bit

  • Optimiser: AdamW

  • Learning rate: 3 × 10⁻⁴

  • Warm-up steps: 100

  • Epochs: 5

  • Low-Rank Adaptation (LoRA)

    • rank: 16
    • alpha: 16
    • dropout: 0.05
    • target modules: q_proj, k_proj, v_proj, and o_proj
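The hyperparameters above can be collected in one place as a small sketch. The field names deliberately mirror the keyword arguments of transformers' `TrainingArguments` (`per_device_train_batch_size`, `gradient_accumulation_steps`, `learning_rate`, `warmup_steps`, `num_train_epochs`) and PEFT's `LoraConfig` (`r`, `lora_alpha`, `lora_dropout`, `target_modules`). However, the `RavenTrainingConfig` class itself is hypothetical, and this card does not publish its training script. The sketch derives two quantities implied by the list: the effective batch size, assuming the single GPU listed under training hardware, and the LoRA scaling factor alpha / r that PEFT applies to the low-rank update by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RavenTrainingConfig:
    """Hypothetical container for the hyperparameters listed in this card."""

    # Named after transformers.TrainingArguments keyword arguments.
    per_device_train_batch_size: int = 1
    gradient_accumulation_steps: int = 128
    learning_rate: float = 3e-4
    warmup_steps: int = 100
    num_train_epochs: int = 5
    # Named after peft.LoraConfig keyword arguments.
    r: int = 16
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    # Assumption: one device, the single RTX 4090 listed under training hardware.
    num_gpus: int = 1

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimiser step.
        return (
            self.per_device_train_batch_size
            * self.gradient_accumulation_steps
            * self.num_gpus
        )

    @property
    def lora_scaling(self) -> float:
        # PEFT's default LoRA scales the update B @ A by lora_alpha / r.
        return self.lora_alpha / self.r


cfg = RavenTrainingConfig()
print(cfg.effective_batch_size)  # 128
print(cfg.lora_scaling)          # 1.0
```

With alpha equal to the rank, the adapter's update is added at unit scale; raising alpha relative to r would amplify it without changing the number of trainable parameters.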

With these LoRA settings, the number of trainable parameters falls to 26,214,400, about 0.2% of the parameters in the base Llama 2 13B Chat model.
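The 26,214,400 figure can be reproduced from the model's shape. The sketch below assumes the published Llama 2 13B configuration: hidden size 5120, 40 decoder layers, MLP intermediate size 13824, a 32,000-token vocabulary, no attention biases and untied input and output embeddings. These values describe the base model and are not stated in this card. Every targeted projection in the 13B model is a 5120 × 5120 matrix, so each gains a rank-16 pair A (16 × 5120) and B (5120 × 16).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaShape:
    """Architecture constants assumed from the published Llama 2 13B config."""

    hidden_size: int = 5120
    num_layers: int = 40
    intermediate_size: int = 13824
    vocab_size: int = 32000


def lora_params(shape: LlamaShape, rank: int, num_target_modules: int) -> int:
    """Trainable LoRA parameters when every targeted projection is hidden x hidden."""
    # A d_out x d_in weight gains A (rank x d_in) and B (d_out x rank).
    per_module = rank * shape.hidden_size + shape.hidden_size * rank
    return per_module * num_target_modules * shape.num_layers


def base_params(shape: LlamaShape) -> int:
    """Total parameters of the base model (no biases, untied lm_head)."""
    h = shape.hidden_size
    attention = 4 * h * h                    # q_proj, k_proj, v_proj, o_proj
    mlp = 3 * h * shape.intermediate_size    # gate_proj, up_proj, down_proj
    norms = 2 * h                            # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * shape.vocab_size * h    # input embeddings + output head
    return shape.num_layers * per_layer + embeddings + h  # + final RMSNorm


shape = LlamaShape()
trainable = lora_params(shape, rank=16, num_target_modules=4)
total = base_params(shape)
print(f"{trainable:,} trainable of {total:,} ({100 * trainable / total:.2f}%)")
# 26,214,400 trainable of 13,015,864,320 (0.20%)
```

Quantising the frozen base weights to 8 bits changes how much memory they occupy, not how many there are, so the ratio is the same for the quantised model.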

Training hardware

This model was trained on commodity hardware equipped with:

  • 13th Gen Intel(R) Core(TM) i7-13700KF CPU at 3.40 GHz
  • 64 GB of installed RAM
  • NVIDIA GeForce RTX 4090 GPU with 24 GB of onboard memory
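A rough, weight-only estimate shows why 8-bit quantisation suits the 24 GB GPU listed above. This sketch assumes the 13,015,864,320-parameter count of Llama 2 13B, fp32 adapter weights and gradients, and AdamW holding two fp32 moment tensors per trainable weight. It reads the card's 24 GB as 24 GiB and ignores activations, the CUDA context and quantisation metadata, all of which only add to the total.

```python
# Assumptions (not stated in the card): Llama 2 13B parameter count, fp32 LoRA
# weights and gradients, and two fp32 AdamW moment tensors per trainable weight.
BASE_PARAMS = 13_015_864_320
LORA_PARAMS = 26_214_400   # trainable parameters reported in this card
GIB = 2**30
GPU_BYTES = 24 * GIB       # the RTX 4090's 24 GB, read as GiB

base_8bit_bytes = BASE_PARAMS * 1   # one byte per frozen weight at 8-bit
base_fp16_bytes = BASE_PARAMS * 2   # two bytes per frozen weight at fp16
# fp32 weight + fp32 gradient + two fp32 AdamW moments = 16 bytes per LoRA parameter
lora_state_bytes = LORA_PARAMS * 16

print(f"8-bit base weights:  {base_8bit_bytes / GIB:.1f} GiB, fits: {base_8bit_bytes < GPU_BYTES}")
print(f"fp16 base weights:   {base_fp16_bytes / GIB:.1f} GiB, fits: {base_fp16_bytes < GPU_BYTES}")
print(f"LoRA training state: {lora_state_bytes / GIB:.2f} GiB")
```

Under these assumptions the fp16 weights alone (about 24.2 GiB) already exceed the card's memory, whereas the 8-bit weights (about 12.1 GiB) leave roughly half of it for activations and the adapter's training state (about 0.39 GiB).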

Training consumed 100 GPU hours.


Dataset used to train adriantheuma/raven-lora