roberta-wiki-en

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2966
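
If this is the usual cross-entropy language-modelling loss (in nats), it corresponds to a perplexity of roughly exp(1.2966) ≈ 3.66 on the evaluation set.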

Model description

More information needed

Intended uses & limitations

More information needed
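
No usage guidance is documented in this card. Given the RoBERTa-style architecture (125M parameters), the checkpoint can presumably be loaded as a masked language model; the snippet below is a minimal sketch under that assumption. The repo id "roberta-wiki-en" is a placeholder for the actual Hub identifier, and the fill-mask objective is inferred from the model name rather than confirmed by the card.

```python
from transformers import pipeline

# Assumption: the checkpoint is a RoBERTa-style masked language model published
# under the repo id "roberta-wiki-en" (placeholder; substitute the real
# namespace/model-id if it differs).
fill_mask = pipeline("fill-mask", model="roberta-wiki-en")

# RoBERTa tokenizers use "<mask>" as the mask token.
print(fill_mask("The capital of France is <mask>."))
```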

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments equivalent is sketched after the list):

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-06
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 12500
  • num_epochs: 3
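
As a rough reconstruction (the original training script is not included in this card), the values above map onto transformers.TrainingArguments approximately as follows. The output_dir is a placeholder, and the multi-GPU / 2-device setting comes from the launcher (e.g. torchrun) rather than from these arguments.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; not the author's actual script.
training_args = TrainingArguments(
    output_dir="roberta-wiki-en",      # placeholder output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,    # 2 GPUs x 16 x 4 grad-accum = 128 effective
    per_device_eval_batch_size=16,     # 2 GPUs x 16 = 32 effective
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_steps=12500,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    seed=42,
)
```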

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 1.5933        | 0.0928 | 12500  | 1.4776          |
| 1.6391        | 0.1856 | 25000  | 1.5202          |
| 1.6551        | 0.2783 | 37500  | 1.5291          |
| 1.6398        | 0.3711 | 50000  | 1.5364          |
| 1.6429        | 0.4639 | 62500  | 1.5345          |
| 1.6354        | 0.5567 | 75000  | 1.5338          |
| 1.629         | 0.6495 | 87500  | 1.5325          |
| 1.6457        | 0.7423 | 100000 | 1.5285          |
| 1.6514        | 0.8350 | 112500 | 1.5377          |
| 1.5955        | 0.9278 | 125000 | 1.5234          |
| 1.616         | 1.0206 | 137500 | 1.5196          |
| 1.5456        | 2.2268 | 150000 | 1.4437          |
| 1.5265        | 2.4124 | 162500 | 1.4288          |
| 1.514         | 2.5979 | 175000 | 1.4139          |
| 1.5114        | 2.7835 | 187500 | 1.4059          |
| 1.4989        | 2.9691 | 200000 | 1.4008          |
| 1.4962        | 3.1546 | 212500 | 1.3926          |
| 1.481         | 3.3402 | 225000 | 1.3850          |
| 1.469         | 3.5258 | 237500 | 1.3777          |
| 1.4654        | 3.7113 | 250000 | 1.3689          |
| 1.463         | 3.8969 | 262500 | 1.3652          |
| 1.4546        | 4.0825 | 275000 | 1.3575          |
| 1.4436        | 4.2680 | 287500 | 1.3489          |
| 1.4312        | 4.4536 | 300000 | 1.3441          |
| 1.4312        | 4.6392 | 312500 | 1.3359          |
| 1.4204        | 4.8247 | 325000 | 1.3272          |
| 1.4138        | 5.0103 | 337500 | 1.3228          |
| 1.4096        | 5.1959 | 350000 | 1.3168          |
| 1.4162        | 5.3814 | 362500 | 1.3108          |
| 1.4005        | 5.5670 | 375000 | 1.3048          |
| 1.3965        | 5.7526 | 387500 | 1.2997          |
| 1.3802        | 5.9381 | 400000 | 1.2966          |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 125M params (F32, Safetensors)