childes_clm_context_13

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0640
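
Assuming this is the mean token-level cross-entropy in nats (the standard causal-LM objective), the implied validation perplexity is exp(3.0640) ≈ 21.4:

```python
import math

# Perplexity implied by the reported evaluation loss,
# assuming the loss is mean cross-entropy in nats.
perplexity = math.exp(3.0640)
print(round(perplexity, 1))  # 21.4
```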

Model description

More information needed

Intended uses & limitations

More information needed
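
Pending fuller documentation, here is a minimal, hypothetical loading-and-generation sketch. The repository id is assumed from the card title and may need a user/organization prefix; the prompt is only an illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with the actual Hub path,
# e.g. "<user>/childes_clm_context_13".
model_id = "childes_clm_context_13"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("the cat sat on the", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```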

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100000
  • training_steps: 400000
  • mixed_precision_training: Native AMP
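
As a rough sketch, these settings map onto Hugging Face `TrainingArguments` as follows. Only the numeric values come from this card; the output directory and everything else are assumptions:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="childes_clm_context_13",  # assumed
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100_000,
    max_steps=400_000,
    fp16=True,  # "Native AMP" mixed precision
)
```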

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 5.5156        | 1.2694  | 4000  | 4.1587          |
| 3.9334        | 2.5389  | 8000  | 3.6747          |
| 3.5766        | 3.8083  | 12000 | 3.4190          |
| 3.3645        | 5.0778  | 16000 | 3.2713          |
| 3.2159        | 6.3472  | 20000 | 3.1943          |
| 3.1162        | 7.6166  | 24000 | 3.1435          |
| 3.0495        | 8.8861  | 28000 | 3.1004          |
| 2.9877        | 10.1555 | 32000 | 3.0966          |
| 2.9435        | 11.4249 | 36000 | 3.0851          |
| 2.9121        | 12.6944 | 40000 | 3.0755          |
| 2.8868        | 13.9638 | 44000 | 3.0624          |
| 2.8462        | 15.2333 | 48000 | 3.0738          |
| 2.8317        | 16.5027 | 52000 | 3.0683          |
| 2.8171        | 17.7721 | 56000 | 3.0640          |
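
The reported evaluation loss of 3.0640 corresponds to the final logged checkpoint (step 56000, epoch 17.77); the log ends well short of the configured 400,000 training steps, so training appears to have stopped early.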

Framework versions

  • Transformers 4.45.2
  • PyTorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model size

  • 8.55M parameters (F32, Safetensors)