---
license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - wikitext
metrics:
  - accuracy
model-index:
  - name: mobilebert_sa_pre-training-complete
    results:
      - task:
          name: Masked Language Modeling
          type: fill-mask
        dataset:
          name: wikitext wikitext-103-raw-v1
          type: wikitext
          config: wikitext-103-raw-v1
          split: validation
          args: wikitext-103-raw-v1
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.7161816392520737
---

# mobilebert_sa_pre-training-complete

This model is a fine-tuned version of [google/mobilebert-uncased](https://huggingface.co/google/mobilebert-uncased) on the wikitext (wikitext-103-raw-v1) dataset. It achieves the following results on the evaluation set:

- Loss: 1.3239
- Accuracy: 0.7162
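
Since the card's metadata lists the task as fill-mask (masked language modeling), the checkpoint can be queried directly with the `transformers` fill-mask pipeline. A minimal sketch; the repo id `gokuls/mobilebert_sa_pre-training-complete` is an assumption inferred from the model name and may need adjusting:

```python
from transformers import pipeline

# Hypothetical repo id inferred from the model name above; adjust if hosted elsewhere.
fill_mask = pipeline("fill-mask", model="gokuls/mobilebert_sa_pre-training-complete")

# MobileBERT uses the BERT-style [MASK] token.
for prediction in fill_mask("Paris is the [MASK] of France."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```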

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
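
While the preprocessing used for this run is not documented, the metadata above names the wikitext-103-raw-v1 config of the wikitext dataset, evaluated on its validation split. A minimal sketch of loading it with `datasets`:

```python
from datasets import load_dataset

# Config and split named in the model-index metadata above.
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")

print(wikitext)                    # train / validation / test splits
print(wikitext["validation"][0])   # each record is a raw "text" string
```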

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 10
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- training_steps: 300000
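
A minimal sketch of expressing these values as `transformers` `TrainingArguments`; `output_dir` is a placeholder, and the multi-GPU distribution is handled by the launcher (e.g. `torchrun`), not by these arguments:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; batch sizes are per device.
training_args = TrainingArguments(
    output_dir="mobilebert_sa_pre-training-complete",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=10,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=300_000,
)
```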

### Training results

| Training Loss | Epoch | Step   | Validation Loss | Accuracy |
|:-------------|:------|:-------|:----------------|:---------|
| 1.6028       | 1.0   | 7145   | 1.4525          | 0.6935   |
| 1.5524       | 2.0   | 14290  | 1.4375          | 0.6993   |
| 1.5323       | 3.0   | 21435  | 1.4194          | 0.6993   |
| 1.5191       | 4.0   | 28580  | 1.4110          | 0.7027   |
| 1.5025       | 5.0   | 35725  | 1.4168          | 0.7014   |
| 1.4902       | 6.0   | 42870  | 1.3931          | 0.7012   |
| 1.4813       | 7.0   | 50015  | 1.3738          | 0.7057   |
| 1.4751       | 8.0   | 57160  | 1.4237          | 0.6996   |
| 1.4689       | 9.0   | 64305  | 1.3969          | 0.7047   |
| 1.4626       | 10.0  | 71450  | 1.3916          | 0.7068   |
| 1.4566       | 11.0  | 78595  | 1.3686          | 0.7072   |
| 1.451        | 12.0  | 85740  | 1.3811          | 0.7060   |
| 1.4478       | 13.0  | 92885  | 1.3598          | 0.7092   |
| 1.4441       | 14.0  | 100030 | 1.3790          | 0.7054   |
| 1.4379       | 15.0  | 107175 | 1.3794          | 0.7066   |
| 1.4353       | 16.0  | 114320 | 1.3609          | 0.7102   |
| 1.43         | 17.0  | 121465 | 1.3685          | 0.7083   |
| 1.4278       | 18.0  | 128610 | 1.3953          | 0.7036   |
| 1.4219       | 19.0  | 135755 | 1.3756          | 0.7085   |
| 1.4197       | 20.0  | 142900 | 1.3597          | 0.7090   |
| 1.4169       | 21.0  | 150045 | 1.3673          | 0.7061   |
| 1.4146       | 22.0  | 157190 | 1.3753          | 0.7073   |
| 1.4109       | 23.0  | 164335 | 1.3696          | 0.7082   |
| 1.4073       | 24.0  | 171480 | 1.3563          | 0.7092   |
| 1.4054       | 25.0  | 178625 | 1.3712          | 0.7103   |
| 1.402        | 26.0  | 185770 | 1.3528          | 0.7113   |
| 1.4001       | 27.0  | 192915 | 1.3367          | 0.7123   |
| 1.397        | 28.0  | 200060 | 1.3508          | 0.7118   |
| 1.3955       | 29.0  | 207205 | 1.3572          | 0.7117   |
| 1.3937       | 30.0  | 214350 | 1.3566          | 0.7095   |
| 1.3901       | 31.0  | 221495 | 1.3515          | 0.7117   |
| 1.3874       | 32.0  | 228640 | 1.3445          | 0.7118   |
| 1.386        | 33.0  | 235785 | 1.3611          | 0.7097   |
| 1.3833       | 34.0  | 242930 | 1.3502          | 0.7087   |
| 1.3822       | 35.0  | 250075 | 1.3657          | 0.7108   |
| 1.3797       | 36.0  | 257220 | 1.3576          | 0.7108   |
| 1.3793       | 37.0  | 264365 | 1.3472          | 0.7106   |
| 1.3763       | 38.0  | 271510 | 1.3323          | 0.7156   |
| 1.3762       | 39.0  | 278655 | 1.3325          | 0.7145   |
| 1.3748       | 40.0  | 285800 | 1.3243          | 0.7138   |
| 1.3733       | 41.0  | 292945 | 1.3218          | 0.7170   |
| 1.3722       | 41.99 | 300000 | 1.3074          | 0.7186   |

### Framework versions

- Transformers 4.26.0
- Pytorch 1.14.0a0+410ce96
- Datasets 2.9.0
- Tokenizers 0.13.2