train_qqp_42_1779354536

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the qqp dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0971
  • Num Input Tokens Seen: 27589664

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
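The scheduler settings above (learning_rate 2e-06, lr_scheduler_type cosine, lr_scheduler_warmup_ratio 0.1) describe a linear warmup followed by a cosine decay to zero. The function below is a minimal sketch of that shape. It assumes the schedule matches the half-cosine used by transformers' get_cosine_schedule_with_warmup (default num_cycles=0.5). It also assumes the warmup length is rounded up as ceil(warmup_ratio × total steps). The function name cosine_lr_with_warmup is hypothetical, and the card does not state the exact number of optimizer steps, so total_steps is left as a parameter.

```python
import math


def cosine_lr_with_warmup(step: int, total_steps: int, peak_lr: float = 2e-6,
                          warmup_ratio: float = 0.1) -> float:
    """Hypothetical helper: learning rate at an optimizer step (warmup + cosine decay)."""
    # Assumption: warmup length is ceil(warmup_ratio * total_steps), mirroring
    # how transformers derives warmup steps from warmup_ratio.
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: peak_lr at progress 0, falling to 0 at progress 1.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For example, with an illustrative total_steps=1000, the learning rate climbs to 2e-06 over the first 100 steps. It is about 1e-06 halfway through the decay (step 550) and reaches 0 at step 1000.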

Training results

Training Loss   Epoch    Step    Validation Loss   Input Tokens Seen
0.2949          0.0500   2047    0.1455            1392320
0.1591          0.1000   4094    0.1452            2766976
0.1940          0.1500   6141    0.1621            4153792
0.0573          0.2000   8188    0.1290            5528512
0.2024          0.2500   10235   0.1197            6911360
0.0685          0.3001   12282   0.1198            8281152
0.1171          0.3501   14329   0.1189            9655616
0.0261          0.4001   16376   0.1129            11025600
0.0561          0.4501   18423   0.1114            12395840
0.1968          0.5001   20470   0.1074            13782144
0.0355          0.5501   22517   0.1015            15155072
0.0953          0.6001   24564   0.0971            16541888
0.0205          0.6501   26611   0.1038            17928960
0.0953          0.7001   28658   0.1138            19303296
0.1044          0.7501   30705   0.1024            20683008
0.0655          0.8001   32752   0.0997            22064384
0.0883          0.8501   34799   0.1071            23443136
0.0633          0.9002   36846   0.1085            24829568
0.0450          0.9502   38893   0.1052            26221120
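The headline evaluation loss of 0.0971 equals the lowest validation loss in the table (step 24564, epoch 0.6001), not the last logged value (0.1052 at step 38893). The card does not say whether the published weights come from that checkpoint. The snippet below is a small check over the table rows, copied as-is. It finds the minimum and confirms that evaluation ran every 2047 steps.

```python
# Rows copied from the training-results table:
# (training loss, epoch, step, validation loss, input tokens seen)
rows = [
    (0.2949, 0.0500, 2047, 0.1455, 1392320),
    (0.1591, 0.1000, 4094, 0.1452, 2766976),
    (0.1940, 0.1500, 6141, 0.1621, 4153792),
    (0.0573, 0.2000, 8188, 0.1290, 5528512),
    (0.2024, 0.2500, 10235, 0.1197, 6911360),
    (0.0685, 0.3001, 12282, 0.1198, 8281152),
    (0.1171, 0.3501, 14329, 0.1189, 9655616),
    (0.0261, 0.4001, 16376, 0.1129, 11025600),
    (0.0561, 0.4501, 18423, 0.1114, 12395840),
    (0.1968, 0.5001, 20470, 0.1074, 13782144),
    (0.0355, 0.5501, 22517, 0.1015, 15155072),
    (0.0953, 0.6001, 24564, 0.0971, 16541888),
    (0.0205, 0.6501, 26611, 0.1038, 17928960),
    (0.0953, 0.7001, 28658, 0.1138, 19303296),
    (0.1044, 0.7501, 30705, 0.1024, 20683008),
    (0.0655, 0.8001, 32752, 0.0997, 22064384),
    (0.0883, 0.8501, 34799, 0.1071, 23443136),
    (0.0633, 0.9002, 36846, 0.1085, 24829568),
    (0.0450, 0.9502, 38893, 0.1052, 26221120),
]

# Row with the smallest validation loss (column index 3).
best = min(rows, key=lambda r: r[3])
train_loss, epoch, step, val_loss, tokens = best
print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")

# Every logged step is a multiple of the 2047-step evaluation interval.
eval_interval = rows[0][2]
print(all(r[2] % eval_interval == 0 for r in rows))
```

Running it prints `lowest validation loss 0.0971 at step 24564 (epoch 0.6001)` followed by `True`.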

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model files

  • Format: Safetensors
  • Model size: 1B params
  • Tensor type: F32

Model tree for rbelanec/train_qqp_42_1779354536

  • Fine-tuned from: meta-llama/Llama-3.2-1B-Instruct