train_qqp_42_1779354535

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the qqp dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0963
  • Num Input Tokens Seen: 27589664

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
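
The learning-rate settings above (peak 2e-06, cosine schedule, warmup ratio 0.1) can be illustrated with a minimal sketch of linear warmup followed by cosine decay. It is an approximation for intuition, not the training code itself: the function name cosine_lr, the rounding of the warmup length, and the total of 40940 steps (inferred from step 2047 being logged at epoch 0.05) are assumptions.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-06,
              warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    # Assumption: warmup length is the rounded fraction of total steps.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumed total step count, inferred from the log (2047 steps ~ 0.05 epoch).
TOTAL_STEPS = 40940

if __name__ == "__main__":
    for s in (0, 2047, 4094, 20470, 40940):
        print(s, f"{cosine_lr(s, TOTAL_STEPS):.3e}")
```

Under these assumptions the rate peaks at 2e-06 around step 4094 and decays towards zero by the end of the single epoch.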

Training results

Training Loss  Epoch   Step   Validation Loss  Input Tokens Seen
0.289          0.0500  2047   0.1431           1392320
0.1537         0.1000  4094   0.1419           2766976
0.1852         0.1500  6141   0.1663           4153792
0.062          0.2000  8188   0.1276           5528512
0.2056         0.2500  10235  0.1221           6911360
0.0732         0.3001  12282  0.1236           8281152
0.0736         0.3501  14329  0.1139           9655616
0.0462         0.4001  16376  0.1089           11025600
0.0615         0.4501  18423  0.1117           12395840
0.2073         0.5001  20470  0.1076           13782144
0.0438         0.5501  22517  0.1019           15155072
0.0881         0.6001  24564  0.0963           16541888
0.0308         0.6501  26611  0.1020           17928960
0.1032         0.7001  28658  0.1099           19303296
0.0953         0.7501  30705  0.1009           20683008
0.0594         0.8001  32752  0.0975           22064384
0.0754         0.8501  34799  0.1061           23443136
0.0723         0.9002  36846  0.1065           24829568
0.0365         0.9502  38893  0.1033           26221120
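
The reported evaluation loss of 0.0963 coincides with the lowest validation loss in the table, logged at step 24564. The card does not state how the reported figure was selected, so the following is only a sketch of that lookup, assuming the (step, validation loss) pairs are copied into a Python list; the names EVAL_LOG and best_checkpoint are illustrative.

```python
# (step, validation loss) pairs copied from the training results table.
EVAL_LOG = [
    (2047, 0.1431), (4094, 0.1419), (6141, 0.1663), (8188, 0.1276),
    (10235, 0.1221), (12282, 0.1236), (14329, 0.1139), (16376, 0.1089),
    (18423, 0.1117), (20470, 0.1076), (22517, 0.1019), (24564, 0.0963),
    (26611, 0.1020), (28658, 0.1099), (30705, 0.1009), (32752, 0.0975),
    (34799, 0.1061), (36846, 0.1065), (38893, 0.1033),
]


def best_checkpoint(log):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(log, key=lambda row: row[1])


if __name__ == "__main__":
    step, loss = best_checkpoint(EVAL_LOG)
    print(f"lowest validation loss {loss} at step {step}")
```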

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4