
qdora

This model is a fine-tuned version of IlyaGusev/saiga_llama3_8b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5287

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
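
The hyperparameters above map directly onto Hugging Face `TrainingArguments`. The sketch below is a hypothetical reconstruction for reference, not the actual training script; the output path is an assumption, and the optimizer settings correspond to the library defaults.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above;
# the actual training script was not published with this card.
training_args = TrainingArguments(
    output_dir="qdora",              # assumed output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```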

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.794         | 0.1389 | 25   | 1.6986          |
| 1.6478        | 0.2778 | 50   | 1.6411          |
| 1.5549        | 0.4167 | 75   | 1.5937          |
| 1.4962        | 0.5556 | 100  | 1.5652          |
| 1.4841        | 0.6944 | 125  | 1.5430          |
| 1.563         | 0.8333 | 150  | 1.5320          |
| 1.51          | 0.9722 | 175  | 1.5287          |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
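
Because this repository is a PEFT adapter rather than a full checkpoint, it must be loaded on top of the base model. A minimal usage sketch, assuming the standard PEFT adapter layout; the dtype, device placement, and the summarization prompt are assumptions, not confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "IlyaGusev/saiga_llama3_8b"

# Load the base model, then attach this adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "jester20/saiga_llama3_sum_8bit_qdora_v02")
tokenizer = AutoTokenizer.from_pretrained(base_id)

prompt = "Кратко перескажи текст: ..."  # hypothetical summarization prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```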