# llama3-70B-lora-pretrain_v2
This model is a LoRA fine-tuned version of meta-llama/Meta-Llama-3-70B-Instruct on the sm_artile dataset.
It achieves the evaluation results reported in the Training results table below.
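Since this checkpoint is a LoRA adapter (trained with PEFT, see Framework versions below), it has to be loaded on top of the base model. The following is a minimal sketch using `peft` and `transformers`; the adapter repo id is a placeholder, and the dtype/device settings are assumptions rather than part of the original training setup.

```python
# Minimal inference sketch: load the base model, then attach the LoRA adapter.
# "your-username/llama3-70B-lora-pretrain_v2" is a hypothetical adapter repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-70B-Instruct"
adapter_id = "your-username/llama3-70B-lora-pretrain_v2"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; a 70B model typically also needs multi-GPU sharding or offloading
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Write a short paragraph about large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```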
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 3.0
- mixed_precision_training: Native AMP
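
For orientation, here is a sketch of how these values map onto `transformers.TrainingArguments`. This is not the original training script: `output_dir`, the fp16 flag for "Native AMP", and the 100-step evaluation/logging cadence (matching the results table below) are assumptions.

```python
# Sketch of a TrainingArguments configuration matching the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-70B-lora-pretrain_v2",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=2,   # per GPU; 2 GPUs x 2 grad-accum steps -> total train batch size 8
    per_device_eval_batch_size=1,    # total eval batch size 2 across 2 GPUs
    gradient_accumulation_steps=2,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    fp16=True,                       # Native AMP mixed precision (fp16 assumed)
    evaluation_strategy="steps",
    eval_steps=100,
    logging_steps=100,
)
```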
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.6995        | 0.0939 | 100  | 2.6305          |
| 2.4199        | 0.1877 | 200  | 2.3979          |
| 2.2722        | 0.2816 | 300  | 2.2180          |
| 2.0762        | 0.3754 | 400  | 2.1251          |
| 1.9652        | 0.4693 | 500  | 2.0858          |
| 2.1893        | 0.5631 | 600  | 2.0629          |
| 2.0153        | 0.6570 | 700  | 2.0473          |
| 1.9911        | 0.7508 | 800  | 2.0318          |
| 2.1041        | 0.8447 | 900  | 2.0198          |
| 2.0488        | 0.9385 | 1000 | 2.0117          |
| 1.897         | 1.0324 | 1100 | 2.0018          |
| 2.0298        | 1.1262 | 1200 | 1.9952          |
| 2.0989        | 1.2201 | 1300 | 1.9890          |
| 1.8695        | 1.3139 | 1400 | 1.9838          |
| 2.1573        | 1.4078 | 1500 | 1.9764          |
| 2.0183        | 1.5016 | 1600 | 1.9713          |
| 1.9229        | 1.5955 | 1700 | 1.9672          |
| 1.9732        | 1.6893 | 1800 | 1.9617          |
| 1.6835        | 1.7832 | 1900 | 1.9574          |
| 1.9874        | 1.8771 | 2000 | 1.9539          |
| 1.7607        | 1.9709 | 2100 | 1.9512          |
| 1.9459        | 2.0648 | 2200 | 1.9480          |
| 1.7611        | 2.1586 | 2300 | 1.9463          |
| 1.8491        | 2.2525 | 2400 | 1.9441          |
| 1.9121        | 2.3463 | 2500 | 1.9427          |
| 1.8849        | 2.4402 | 2600 | 1.9413          |
| 2.0679        | 2.5340 | 2700 | 1.9400          |
| 1.9908        | 2.6279 | 2800 | 1.9394          |
| 1.9557        | 2.7217 | 2900 | 1.9388          |
| 1.9627        | 2.8156 | 3000 | 1.9384          |
| 1.8339        | 2.9094 | 3100 | 1.9383          |
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.0
- Pytorch 2.2.1
- Datasets 2.18.0
- Tokenizers 0.19.1
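
If it helps to verify a local environment against the versions above, a small check like the following can be used (a convenience sketch, not part of the original repository):

```python
# Compare installed package versions against those listed in this model card.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.10.0",
    "transformers": "4.40.0",
    "torch": "2.2.1",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name].startswith(want) else "MISMATCH"
    print(f"{name}: expected {want}, installed {installed[name]} ({status})")
```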