train_mnli_42_1779286677

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the mnli dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1007
  • Num Input Tokens Seen: 38288928

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
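
To make the cosine schedule with a 0.1 warmup ratio concrete, the following is a minimal sketch of how the learning rate would evolve under these settings. It assumes linear warmup followed by a half-cosine decay to zero, which is the usual behaviour of this scheduler type in Transformers. The total step count of 44180 is not stated in this card; it is inferred from the results table (step 2209 at epoch 0.05), and the function name `cosine_lr_with_warmup` is hypothetical.

```python
import math

PEAK_LR = 2e-6        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 44180   # assumption: inferred from 2209 steps per 0.05 epoch


def cosine_lr_with_warmup(step, total_steps=TOTAL_STEPS,
                          peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Return the learning rate at `step` for linear warmup + cosine decay."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from the peak learning rate down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 2209, 4418, 24299, 44180):
        print(step, cosine_lr_with_warmup(step))
```

Under these assumptions the warmup would end around step 4418, which corresponds to the second evaluation point in the results table.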

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1805 | 0.0500 | 2209 | 0.1564 | 1902848 |
| 0.1328 | 0.1000 | 4418 | 0.1287 | 3830208 |
| 0.1111 | 0.1500 | 6627 | 0.1695 | 5746816 |
| 0.0485 | 0.2000 | 8836 | 0.1159 | 7653120 |
| 0.1612 | 0.2500 | 11045 | 0.1187 | 9565376 |
| 0.0932 | 0.3000 | 13254 | 0.1261 | 11476480 |
| 0.0607 | 0.3500 | 15463 | 0.1186 | 13403072 |
| 0.065 | 0.4000 | 17672 | 0.1276 | 15322176 |
| 0.1503 | 0.4500 | 19881 | 0.1237 | 17253696 |
| 0.0817 | 0.5000 | 22090 | 0.1152 | 19168640 |
| 0.1078 | 0.5500 | 24299 | 0.1187 | 21061760 |
| 0.094 | 0.6000 | 26508 | 0.1096 | 22967296 |
| 0.1425 | 0.6500 | 28717 | 0.1052 | 24888448 |
| 0.0465 | 0.7000 | 30926 | 0.1023 | 26807552 |
| 0.1499 | 0.7500 | 33135 | 0.1029 | 28714304 |
| 0.1164 | 0.8000 | 35344 | 0.1083 | 30627136 |
| 0.0674 | 0.8500 | 37553 | 0.1007 | 32537600 |
| 0.0278 | 0.9000 | 39762 | 0.1026 | 34457664 |
| 0.0646 | 0.9500 | 41971 | 0.1025 | 36377344 |
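
The validation losses above can be scanned programmatically to locate the best evaluation point. The sketch below copies the (step, validation loss) pairs from the table and picks the minimum; the names `EVAL_LOG` and `best_checkpoint` are hypothetical helpers, not part of the training code.

```python
# (step, validation_loss) pairs copied from the training results table.
EVAL_LOG = [
    (2209, 0.1564), (4418, 0.1287), (6627, 0.1695), (8836, 0.1159),
    (11045, 0.1187), (13254, 0.1261), (15463, 0.1186), (17672, 0.1276),
    (19881, 0.1237), (22090, 0.1152), (24299, 0.1187), (26508, 0.1096),
    (28717, 0.1052), (30926, 0.1023), (33135, 0.1029), (35344, 0.1083),
    (37553, 0.1007), (39762, 0.1026), (41971, 0.1025),
]


def best_checkpoint(log):
    """Return the (step, validation_loss) pair with the lowest loss."""
    return min(log, key=lambda row: row[1])


if __name__ == "__main__":
    step, loss = best_checkpoint(EVAL_LOG)
    print(f"lowest validation loss {loss} at step {step}")
```

The minimum (0.1007 at step 37553) matches the evaluation loss reported at the top of this card, which suggests, but does not confirm, that the reported figure comes from the best evaluation point rather than the final step.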

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Safetensors model size: 1B params (tensor types: F32, BF16)