---
library_name: peft
base_model: NousResearch/CodeLlama-13b-hf-flash
tags:
- axolotl
- generated_from_trainer
model-index:
- name: ef1d817c-d307-4432-be40-8d62c1ccceed
  results: []
---

Built with Axolotl

ef1d817c-d307-4432-be40-8d62c1ccceed

This model is a fine-tuned version of NousResearch/CodeLlama-13b-hf-flash (the training dataset is not recorded in this card). It achieves the following results on the evaluation set:

  • Loss: 1.2801
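
As a quick reference, the sketch below shows one way to load a PEFT adapter such as this one on top of the base model. The adapter repository id is a placeholder assumption and should be replaced with the actual Hub path where these weights are published.

```python
# Minimal usage sketch, assuming the adapter is hosted on the Hugging Face Hub.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "NousResearch/CodeLlama-13b-hf-flash"
adapter_id = "<user>/ef1d817c-d307-4432-be40-8d62c1ccceed"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```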

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.000212
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 120
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: ADAMW_BNB (adamw_bnb_8bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
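
These values come from an Axolotl run. As a rough, non-authoritative guide, the sketch below shows an approximately equivalent Hugging Face TrainingArguments configuration; the output_dir value is a placeholder and this is not the exact config that was used.

```python
# Approximate TrainingArguments equivalent of the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",            # placeholder path
    learning_rate=0.000212,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=120,
    gradient_accumulation_steps=2,   # 4 x 2 = total train batch size of 8
    optim="adamw_bnb_8bit",          # OptimizerNames.ADAMW_BNB
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```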

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| No log        | 0.0000 | 1    | 2.2917          |
| 3.2582        | 0.0024 | 50   | 1.5773          |
| 3.0859        | 0.0047 | 100  | 1.4720          |
| 3.1501        | 0.0071 | 150  | 1.5418          |
| 3.0678        | 0.0094 | 200  | 1.5444          |
| 2.8509        | 0.0118 | 250  | 1.4362          |
| 2.949         | 0.0141 | 300  | 1.3197          |
| 2.7131        | 0.0165 | 350  | 1.2930          |
| 2.8697        | 0.0188 | 400  | 1.2848          |
| 2.7921        | 0.0212 | 450  | 1.2807          |
| 2.788         | 0.0235 | 500  | 1.2801          |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
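
To confirm a local environment matches these versions before loading the adapter, a small check like the following can be used (the expected versions in the comments are the ones listed above).

```python
# Print installed library versions to compare against the pinned versions above.
import peft, transformers, torch, datasets, tokenizers

print("PEFT:", peft.__version__)                 # expected 0.13.2
print("Transformers:", transformers.__version__) # expected 4.46.0
print("PyTorch:", torch.__version__)             # expected 2.5.0+cu124
print("Datasets:", datasets.__version__)         # expected 3.0.1
print("Tokenizers:", tokenizers.__version__)     # expected 0.20.1
```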