train_qqp_42_1779207273

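This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the qqp dataset. It achieves the following results on the evaluation set (the loss equals the lowest validation loss in the training results, logged at step 20468, epoch 0.5):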

  • Loss: 0.1128
  • Num Input Tokens Seen: 137941664

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
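
The scheduler settings above (cosine decay with a 0.1 warmup ratio and a peak learning rate of 2e-06) can be illustrated with a small standalone sketch. It reproduces the usual shape of a linear-warmup, cosine-decay schedule in plain Python and is not the training code itself. The total step count is an assumption inferred from the results table (roughly 40,933 steps per epoch over 5 epochs); the card does not state it, and the function name `cosine_lr` is hypothetical.

```python
import math

PEAK_LR = 2e-06        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
# ASSUMPTION: total optimizer steps, estimated as 5 epochs x ~40,933 steps/epoch.
# The card does not report this number directly.
TOTAL_STEPS = 5 * 40_933


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at a few of the logged evaluation steps.
    for step in (0, 10234, 20468, 102340, 194446):
        print(step, cosine_lr(step))
```

Under these assumptions the warmup ends near step 20,467, so the checkpoint logged at step 20468 falls almost exactly where the learning rate reaches its peak.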

Training results

Training Loss  Epoch   Step    Validation Loss  Input Tokens Seen
0.1388         0.2500  10234   0.1504           6910656
0.1021         0.5000  20468   0.1128           13780928
0.0459         0.7501  30702   0.1215           20680640
0.1214         1.0001  40936   0.1224           27591776
0.1395         1.2501  51170   0.1528           34492320
0.0425         1.5001  61404   0.1524           41393504
0.1097         1.7501  71638   0.1231           48287456
0.0019         2.0001  81872   0.1237           55178600
0.0001         2.2502  92106   0.2041           62093992
0.0236         2.5002  102340  0.1835           68988456
0.0008         2.7502  112574  0.2039           75874280
0.0003         3.0002  122808  0.1936           82772304
0.0            3.2502  133042  0.2610           89675984
0.0332         3.5003  143276  0.2494           96560720
0.0            3.7503  153510  0.2414           103465808
0.0            4.0003  163744  0.2473           110357352
0.0            4.2503  173978  0.3375           117230952
0.0            4.5003  184212  0.3128           124100264
0.0727         4.7503  194446  0.3178           131030440
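
The results above can be scanned programmatically to find the evaluation point with the lowest validation loss. The following is a minimal sketch over values copied from the table; the helper name `best_checkpoint` is hypothetical. The card does not state which checkpoint the published weights correspond to, so this only locates the row that matches the reported loss.

```python
# (epoch, step, validation_loss) rows copied from the training results table.
ROWS = [
    (0.2500, 10234, 0.1504),
    (0.5000, 20468, 0.1128),
    (0.7501, 30702, 0.1215),
    (1.0001, 40936, 0.1224),
    (1.2501, 51170, 0.1528),
    (1.5001, 61404, 0.1524),
    (1.7501, 71638, 0.1231),
    (2.0001, 81872, 0.1237),
    (2.2502, 92106, 0.2041),
    (2.5002, 102340, 0.1835),
    (2.7502, 112574, 0.2039),
    (3.0002, 122808, 0.1936),
    (3.2502, 133042, 0.2610),
    (3.5003, 143276, 0.2494),
    (3.7503, 153510, 0.2414),
    (4.0003, 163744, 0.2473),
    (4.2503, 173978, 0.3375),
    (4.5003, 184212, 0.3128),
    (4.7503, 194446, 0.3178),
]


def best_checkpoint(rows):
    """Return the (epoch, step, validation_loss) row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


best = best_checkpoint(ROWS)
final = ROWS[-1]

if __name__ == "__main__":
    print("best:", best)
    print("final:", final)
```

The minimum is the row at step 20468 (validation loss 0.1128). From about epoch 2 onward the training loss approaches zero while the validation loss rises, which is the usual signature of overfitting.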

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model size: 1B params (Safetensors, tensor type F32)