train_mnli_42_1779286678

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the mnli dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1017
  • Num Input Tokens Seen: 38288928

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
0.177 0.0500 2209 0.1569 1902848
0.1322 0.1000 4418 0.1295 3830208
0.1141 0.1500 6627 0.1696 5746816
0.051 0.2000 8836 0.1156 7653120
0.1554 0.2500 11045 0.1194 9565376
0.0944 0.3000 13254 0.1249 11476480
0.0612 0.3500 15463 0.1199 13403072
0.0695 0.4000 17672 0.1282 15322176
0.1525 0.4500 19881 0.1226 17253696
0.0825 0.5000 22090 0.1149 19168640
0.1001 0.5500 24299 0.1184 21061760
0.0926 0.6000 26508 0.1101 22967296
0.1476 0.6500 28717 0.1062 24888448
0.0485 0.7000 30926 0.1028 26807552
0.1511 0.7500 33135 0.1031 28714304
0.1151 0.8000 35344 0.1086 30627136
0.0676 0.8500 37553 0.1017 32537600
0.0265 0.9000 39762 0.1034 34457664
0.0689 0.9500 41971 0.1031 36377344

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
Downloads last month
259
Safetensors
Model size
1B params
Tensor type
F32
·
Inference Providers NEW
Input a message to start chatting with rbelanec/train_mnli_42_1779286678.

Model tree for rbelanec/train_mnli_42_1779286678

Finetuned
(1747)
this model