---
library_name: transformers
tags:
  - generated_from_trainer
datasets:
  - kanishka/babylm2-rewritten-clean-spacy_no-num-adj
metrics:
  - accuracy
model-index:
  - name: opt-babylm2-rewritten-clean-spacy_no-num-adj-earlystop-bpe_seed-42_1e-3
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: kanishka/babylm2-rewritten-clean-spacy_no-num-adj
          type: kanishka/babylm2-rewritten-clean-spacy_no-num-adj
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.4781093360218181
---

# opt-babylm2-rewritten-clean-spacy_no-num-adj-earlystop-bpe_seed-42_1e-3

This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy_no-num-adj dataset. It achieves the following results on the evaluation set (a usage sketch follows the results):

- Loss: 2.6900
- Accuracy: 0.4781
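
Since this is a causal language model trained with `transformers`, the checkpoint can be loaded with the standard auto classes. A minimal usage sketch, assuming the checkpoint is published on the Hub under `kanishka/opt-babylm2-rewritten-clean-spacy_no-num-adj-earlystop-bpe_seed-42_1e-3` (the repo id is inferred from the model name on this card and is an assumption):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id, inferred from the model name on this card.
model_id = "kanishka/opt-babylm2-rewritten-clean-spacy_no-num-adj-earlystop-bpe_seed-42_1e-3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy decoding of a short continuation.
inputs = tokenizer("The child picked up the", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For reference, the reported evaluation loss of 2.6900 corresponds to a perplexity of exp(2.6900) ≈ 14.7.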

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
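
The training dataset named in the metadata can be inspected directly with the `datasets` library. A minimal sketch, assuming the dataset is publicly available on the Hub with its default configuration (the split names are whatever the dataset defines):

```python
from datasets import load_dataset

# Dataset id taken from the metadata block above.
ds = load_dataset("kanishka/babylm2-rewritten-clean-spacy_no-num-adj")
print(ds)  # prints the available splits and their column names
```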

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):

- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
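
These settings map onto `transformers.TrainingArguments` roughly as sketched below. The per-epoch validation in the results table supports `eval_strategy="epoch"`; the output directory, save strategy, and early-stopping patience are assumptions (the `earlystop` suffix in the model name suggests `EarlyStoppingCallback` was used, but its patience is not reported):

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Sketch of the reported hyperparameters; output_dir, the save/eval
# strategies, and the early-stopping patience are assumptions.
training_args = TrainingArguments(
    output_dir="opt-babylm2-rewritten-clean-spacy_no-num-adj-earlystop-bpe_seed-42_1e-3",
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=8,   # effective batch size: 32 * 8 = 256 on one device
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=32000,
    num_train_epochs=20.0,
    fp16=True,                       # "Native AMP" mixed precision
    eval_strategy="epoch",           # matches the per-epoch rows in the results table
    save_strategy="epoch",
    load_best_model_at_end=True,     # required for early stopping
    metric_for_best_model="eval_loss",
)

# Patience value here is an assumption, not taken from the card.
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```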

### Training results

| Training Loss | Epoch   | Step  | Validation Loss | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:--------:|
| 4.0896        | 1.0     | 2225  | 3.8307          | 0.3593   |
| 3.4325        | 2.0     | 4450  | 3.3145          | 0.4081   |
| 3.1208        | 3.0     | 6675  | 3.1044          | 0.4295   |
| 2.957         | 4.0     | 8900  | 2.9973          | 0.4396   |
| 2.8381        | 5.0     | 11125 | 2.9338          | 0.4464   |
| 2.7819        | 6.0     | 13350 | 2.8904          | 0.4508   |
| 2.7385        | 7.0     | 15575 | 2.8666          | 0.4531   |
| 2.7061        | 8.0     | 17800 | 2.8456          | 0.4559   |
| 2.6855        | 9.0     | 20025 | 2.8332          | 0.4575   |
| 2.6669        | 10.0    | 22250 | 2.8198          | 0.4586   |
| 2.6499        | 11.0    | 24475 | 2.8118          | 0.4597   |
| 2.6351        | 12.0    | 26700 | 2.8072          | 0.4601   |
| 2.6204        | 13.0    | 28925 | 2.8026          | 0.4612   |
| 2.6277        | 14.0    | 31150 | 2.8013          | 0.4613   |
| 2.6136        | 15.0    | 33375 | 2.7791          | 0.4638   |
| 2.5687        | 16.0    | 35600 | 2.7514          | 0.4676   |
| 2.5184        | 17.0    | 37825 | 2.7283          | 0.4708   |
| 2.4613        | 18.0    | 40050 | 2.7060          | 0.4740   |
| 2.3966        | 19.0    | 42275 | 2.6947          | 0.4766   |
| 2.3227        | 19.9913 | 44480 | 2.6900          | 0.4781   |

### Framework versions

- Transformers 4.48.0
- PyTorch 2.5.1
- Datasets 3.2.0
- Tokenizers 0.21.0
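
To check that a local environment matches the versions above, a quick sketch:

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card: 4.48.0 / 2.5.1 / 3.2.0 / 0.21.0
print("Transformers:", transformers.__version__)
print("PyTorch:", torch.__version__)
print("Datasets:", datasets.__version__)
print("Tokenizers:", tokenizers.__version__)
```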