train_cola_42_1776331560

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1763 (the lowest validation loss in the training results below, reached at step 964, epoch 1.0021)
  • Num Input Tokens Seen: 1932608

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
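
The scheduler settings above (cosine decay, warmup ratio 0.1, peak learning rate 5e-06) can be illustrated with a small standalone sketch. It follows the usual linear-warmup-then-cosine-decay shape and is not taken from the training code itself. `TOTAL_STEPS` is an assumption: the results log shows roughly 962 optimizer steps per epoch, so 5 epochs are estimated at 4810 steps. The function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 5e-06      # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 4810   # ASSUMPTION: ~962 steps/epoch x 5 epochs, estimated from the log


def cosine_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the estimated learning rate at the evaluation steps of the first epoch.
    for step in (241, 482, 723, 964):
        print(step, cosine_lr(step))
```

Under these assumptions the learning rate peaks once warmup ends and then falls steadily for the rest of training, so later epochs take progressively smaller updates.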

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.2021        | 0.2505 | 241  | 0.2780          | 97664             |
| 0.2402        | 0.5010 | 482  | 0.2002          | 194560            |
| 0.1906        | 0.7516 | 723  | 0.2094          | 291712            |
| 0.2397        | 1.0021 | 964  | 0.1763          | 387464            |
| 0.0622        | 1.2526 | 1205 | 0.2676          | 485192            |
| 0.0911        | 1.5031 | 1446 | 0.3146          | 581704            |
| 0.1042        | 1.7536 | 1687 | 0.2114          | 677576            |
| 0.096         | 2.0042 | 1928 | 0.3562          | 775312            |
| 0.0094        | 2.2547 | 2169 | 0.3035          | 873104            |
| 0.0894        | 2.5052 | 2410 | 0.3649          | 969360            |
| 0.0705        | 2.7557 | 2651 | 0.3061          | 1065232           |
| 0.0016        | 3.0062 | 2892 | 0.2698          | 1162016           |
| 0.0469        | 3.2568 | 3133 | 0.3603          | 1259168           |
| 0.0682        | 3.5073 | 3374 | 0.4128          | 1355552           |
| 0.0128        | 3.7578 | 3615 | 0.3697          | 1453088           |
| 0.0238        | 4.0083 | 3856 | 0.3716          | 1549360           |
| 0.0           | 4.2588 | 4097 | 0.4492          | 1645808           |
| 0.0202        | 4.5094 | 4338 | 0.4368          | 1742960           |
| 0.0001        | 4.7599 | 4579 | 0.4381          | 1839344           |
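
The results above can be inspected programmatically to see where validation loss bottoms out and how far training and validation loss drift apart afterwards. The following is a minimal sketch with the logged values copied in by hand; the names `LOG` and `best_checkpoint` are hypothetical and not part of any library.

```python
# Rows copied from the training results:
# (step, epoch, training_loss, validation_loss)
LOG = [
    (241, 0.2505, 0.2021, 0.2780),
    (482, 0.5010, 0.2402, 0.2002),
    (723, 0.7516, 0.1906, 0.2094),
    (964, 1.0021, 0.2397, 0.1763),
    (1205, 1.2526, 0.0622, 0.2676),
    (1446, 1.5031, 0.0911, 0.3146),
    (1687, 1.7536, 0.1042, 0.2114),
    (1928, 2.0042, 0.096, 0.3562),
    (2169, 2.2547, 0.0094, 0.3035),
    (2410, 2.5052, 0.0894, 0.3649),
    (2651, 2.7557, 0.0705, 0.3061),
    (2892, 3.0062, 0.0016, 0.2698),
    (3133, 3.2568, 0.0469, 0.3603),
    (3374, 3.5073, 0.0682, 0.4128),
    (3615, 3.7578, 0.0128, 0.3697),
    (3856, 4.0083, 0.0238, 0.3716),
    (4097, 4.2588, 0.0, 0.4492),
    (4338, 4.5094, 0.0202, 0.4368),
    (4579, 4.7599, 0.0001, 0.4381),
]


def best_checkpoint(log):
    """Return the logged row with the lowest validation loss."""
    return min(log, key=lambda row: row[3])


if __name__ == "__main__":
    step, epoch, train_loss, val_loss = best_checkpoint(LOG)
    print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")

    # Gap between validation and training loss at the last logged evaluation.
    last_step, _, last_train, last_val = LOG[-1]
    print(f"step {last_step}: validation - training = {last_val - last_train:.4f}")
```

In this log the minimum falls at step 964, which matches the headline evaluation loss of 0.1763. After that point training loss keeps shrinking while validation loss trends upward, a pattern commonly associated with overfitting.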

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4