
MiniLMv2-L6-H384_R-fineweb-100k

This is a MiniLMv2 model continually pre-trained with a masked language modeling (MLM) objective, with the goal of improving downstream fine-tuning performance:

  • activation function updated to SiLU prior to further training
  • MLM with a 40% mask ratio (see the sketch after this list)
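
A minimal sketch of how these two tweaks could be reproduced with the standard transformers API; the actual training script is not part of this card, so the exact calls are assumptions.

```python
from transformers import (
    AutoConfig,
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

base = "nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large"

# Swap the feed-forward activation to SiLU before continued pre-training.
config = AutoConfig.from_pretrained(base)
config.hidden_act = "silu"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base, config=config)

# Mask 40% of input tokens for the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.4
)
```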

Model description

This model is a fine-tuned version of nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large on the BEE-spoke-data/fineweb-100k_en-med dataset.

It achieves the following results on the evaluation set:

  • Loss: 4.0206
  • Accuracy: 0.3783
  • Num Input Tokens Seen: 162790400
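
For downstream use, the checkpoint loads like any other transformers masked-LM. A brief fill-mask example; the prompt text is illustrative, and `<mask>` is the RoBERTa-style mask token used by this tokenizer:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="pszemraj/MiniLMv2-L6-H384_R-fineweb-100k")
print(fill("The capital of France is <mask>."))
```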

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 1792
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 2.0
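
These settings map roughly onto a transformers TrainingArguments object. The sketch below is a reconstruction from the list above; anything not listed there (e.g. the output path) is a placeholder assumption:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="minilmv2-fineweb-100k-mlm",  # hypothetical output path
    learning_rate=8e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=1792,
    gradient_accumulation_steps=16,  # 8 per device * 16 steps = 128 total
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-7,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    num_train_epochs=2.0,
)
```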

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
| 4.6583        | 0.1208 | 150  | 4.5052          | 0.3406   | 9830400           |
| 4.5365        | 0.2415 | 300  | 4.3712          | 0.3525   | 19660800          |
| 4.4621        | 0.3623 | 450  | 4.2810          | 0.3575   | 29491200          |
| 4.4116        | 0.4831 | 600  | 4.2466          | 0.3615   | 39321600          |
| 4.3487        | 0.6038 | 750  | 4.1795          | 0.3661   | 49152000          |
| 4.338         | 0.7246 | 900  | 4.1874          | 0.3663   | 58982400          |
| 4.342         | 0.8454 | 1050 | 4.1475          | 0.3695   | 68812800          |
| 4.268         | 0.9661 | 1200 | 4.1215          | 0.3714   | 78643200          |
| 4.2185        | 1.0869 | 1350 | 4.1032          | 0.3725   | 88472576          |
| 4.2645        | 1.2077 | 1500 | 4.0859          | 0.3757   | 98302976          |
| 4.2542        | 1.3284 | 1650 | 4.0730          | 0.3750   | 108133376         |
| 4.2614        | 1.4492 | 1800 | 4.0682          | 0.3749   | 117963776         |
| 4.1928        | 1.5700 | 1950 | 4.0596          | 0.3758   | 127794176         |
| 4.1971        | 1.6907 | 2100 | 4.0505          | 0.3777   | 137624576         |
| 4.1966        | 1.8115 | 2250 | 4.0163          | 0.3787   | 147454976         |
| 4.16          | 1.9323 | 2400 | 4.0352          | 0.3774   | 157285376         |
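
For intuition, the final evaluation loss of 4.0206 corresponds to an MLM perplexity of roughly 55.7 (perplexity = exp(loss)); this conversion is a standard identity rather than a figure reported above:

```python
import math

eval_loss = 4.0206          # final evaluation loss from the table above
print(math.exp(eval_loss))  # ~55.7 MLM perplexity
```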

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu118
  • Datasets 2.19.0
  • Tokenizers 0.19.1