
llama3-70B-lora-pretrain_v2

This model is a LoRA adapter fine-tuned from meta-llama/Meta-Llama-3-70B-Instruct on the sm_artile dataset. It achieves the following result on the evaluation set:

  • Loss: 1.9382
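
Below is a minimal sketch of how an adapter like this is typically loaded with PEFT on top of the base model. The adapter_id is a placeholder for this repository's Hub id or a local directory containing the adapter weights; dtype and device placement are assumptions, not settings reported in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-70B-Instruct"
adapter_id = "llama3-70B-lora-pretrain_v2"  # placeholder: Hub repo id or local path

tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load the frozen base model, then attach the LoRA adapter weights on top.
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption; pick what fits your hardware
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```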

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 2
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 3.0
  • mixed_precision_training: Native AMP
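
A hypothetical reconstruction of this setup using transformers TrainingArguments and a peft LoraConfig is sketched below. Only the values listed above come from this card; the LoRA rank, alpha, dropout, and target modules are not reported, so those entries are placeholders.

```python
from transformers import TrainingArguments
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                   # assumption: rank not stated in the card
    lora_alpha=16,                         # assumption
    lora_dropout=0.05,                     # assumption
    target_modules=["q_proj", "v_proj"],   # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama3-70B-lora-pretrain_v2",
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # 2 devices x 2 per device x 2 accumulation = total batch size 8
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    seed=42,
    fp16=True,                       # Native AMP mixed precision
    optim="adamw_torch",             # Adam-style optimizer, betas=(0.9, 0.999), eps=1e-8
)
```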

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.6995        | 0.0939 | 100  | 2.6305          |
| 2.4199        | 0.1877 | 200  | 2.3979          |
| 2.2722        | 0.2816 | 300  | 2.2180          |
| 2.0762        | 0.3754 | 400  | 2.1251          |
| 1.9652        | 0.4693 | 500  | 2.0858          |
| 2.1893        | 0.5631 | 600  | 2.0629          |
| 2.0153        | 0.6570 | 700  | 2.0473          |
| 1.9911        | 0.7508 | 800  | 2.0318          |
| 2.1041        | 0.8447 | 900  | 2.0198          |
| 2.0488        | 0.9385 | 1000 | 2.0117          |
| 1.897         | 1.0324 | 1100 | 2.0018          |
| 2.0298        | 1.1262 | 1200 | 1.9952          |
| 2.0989        | 1.2201 | 1300 | 1.9890          |
| 1.8695        | 1.3139 | 1400 | 1.9838          |
| 2.1573        | 1.4078 | 1500 | 1.9764          |
| 2.0183        | 1.5016 | 1600 | 1.9713          |
| 1.9229        | 1.5955 | 1700 | 1.9672          |
| 1.9732        | 1.6893 | 1800 | 1.9617          |
| 1.6835        | 1.7832 | 1900 | 1.9574          |
| 1.9874        | 1.8771 | 2000 | 1.9539          |
| 1.7607        | 1.9709 | 2100 | 1.9512          |
| 1.9459        | 2.0648 | 2200 | 1.9480          |
| 1.7611        | 2.1586 | 2300 | 1.9463          |
| 1.8491        | 2.2525 | 2400 | 1.9441          |
| 1.9121        | 2.3463 | 2500 | 1.9427          |
| 1.8849        | 2.4402 | 2600 | 1.9413          |
| 2.0679        | 2.5340 | 2700 | 1.9400          |
| 1.9908        | 2.6279 | 2800 | 1.9394          |
| 1.9557        | 2.7217 | 2900 | 1.9388          |
| 1.9627        | 2.8156 | 3000 | 1.9384          |
| 1.8339        | 2.9094 | 3100 | 1.9383          |
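
If the reported loss is the usual mean token-level cross-entropy in nats (the Trainer default), the final validation loss corresponds to a perplexity of roughly exp(1.9382) ≈ 6.95:

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss.
val_loss = 1.9382
print(f"validation perplexity ≈ {math.exp(val_loss):.2f}")  # ≈ 6.95
```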

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.0
  • Pytorch 2.2.1
  • Datasets 2.18.0
  • Tokenizers 0.19.1