
llama3.2-3b-hard

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct, trained as a PEFT adapter on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0052
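
Since the framework versions below include PEFT, this repository presumably hosts a LoRA-style adapter on top of the base model. A minimal loading sketch under that assumption (the model IDs come from this card; the dtype, device mapping, prompt, and generation settings are illustrative choices):

```python
# Minimal loading sketch. Assumption: this repo hosts a PEFT (LoRA-style)
# adapter for meta-llama/Llama-3.2-3B-Instruct; dtype, device_map, and the
# example prompt are illustrative, not taken from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "Jsoo/llama3.2-3b-hard"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```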

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • num_epochs: 20
  • mixed_precision_training: Native AMP
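
A rough reconstruction of this configuration with the Hugging Face Trainer API, as a sketch only: the actual training script, dataset, and any LoRA configuration are not documented on this card, and the evaluation/logging cadence is inferred from the 100-step rows in the results table below.

```python
# Sketch of the hyperparameters listed above using TrainingArguments.
# Assumptions: fp16 is used for "Native AMP"; eval/logging every 100 steps
# is inferred from the results table; everything else mirrors the list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3.2-3b-hard",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # 4 x 8 = total train batch size of 32
    num_train_epochs=20,
    lr_scheduler_type="cosine",
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
    eval_strategy="steps",
    eval_steps=100,
    logging_steps=100,
)
# The Adam defaults (betas=(0.9, 0.999), epsilon=1e-08) match the values listed above.
```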

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 2.6035        | 0.5202  | 100  | 2.2469          |
| 2.412         | 1.0403  | 200  | 2.1836          |
| 2.3523        | 1.5605  | 300  | 2.1436          |
| 2.3063        | 2.0806  | 400  | 2.1116          |
| 2.24          | 2.6008  | 500  | 2.0822          |
| 2.2205        | 3.1209  | 600  | 2.0610          |
| 2.169         | 3.6411  | 700  | 2.0429          |
| 2.1232        | 4.1612  | 800  | 2.0338          |
| 2.1088        | 4.6814  | 900  | 2.0237          |
| 2.0885        | 5.2016  | 1000 | 2.0192          |
| 2.0604        | 5.7217  | 1100 | 2.0126          |
| 2.0353        | 6.2419  | 1200 | 2.0069          |
| 1.9994        | 6.7620  | 1300 | 2.0035          |
| 1.9972        | 7.2822  | 1400 | 2.0057          |
| 1.9674        | 7.8023  | 1500 | 1.9955          |
| 1.9455        | 8.3225  | 1600 | 2.0008          |
| 1.9392        | 8.8427  | 1700 | 2.0010          |
| 1.9339        | 9.3628  | 1800 | 2.0055          |
| 1.9034        | 9.8830  | 1900 | 1.9982          |
| 1.8877        | 10.4031 | 2000 | 2.0052          |
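
For context, assuming the reported loss is mean token-level cross-entropy in nats (the usual Trainer convention), the final validation loss of 2.0052 corresponds to a perplexity of roughly 7.4. Note that the lowest validation loss in the table (1.9955 at step 1500, around epoch 7.8) is slightly better than the final checkpoint.

```python
# Convert the final validation loss to perplexity, assuming it is mean
# cross-entropy in nats.
import math
print(math.exp(2.0052))  # ~7.43
```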

Framework versions

  • PEFT 0.12.0
  • Transformers 4.45.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.1