smolm-autoreg-bpe-counterfactual-babylm-only_random_removal-seed_211-1e-3
This model was trained from scratch on the kanishka/counterfactual-babylm-only_random_removal dataset. After the final epoch (epoch 20), it achieves the following results on the evaluation set:
- Loss: 3.3755
- Accuracy: 0.4148
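For readers who prefer perplexity, the reported loss can be converted with a short calculation. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling but is not stated in this card.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 3.3755

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss  = {eval_loss}")
print(f"perplexity = {perplexity:.2f}")
```

If the loss were instead measured in bits, the conversion would be `2 ** eval_loss`, so the assumption matters for the result.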
Model description
More information needed
Intended uses & limitations
More information needed
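The card does not document usage. As a hedged starting point, the sketch below shows how a checkpoint like this is typically loaded for text generation. It assumes the repository loads through the `transformers` Auto classes with a causal language modeling head, which the "autoreg" in the model name suggests but this card does not confirm. The `generate` helper and its parameters are hypothetical names introduced for illustration.

```python
# Repository ID as it appears on this model page.
MODEL_ID = "kanishka/smolm-autoreg-bpe-counterfactual-babylm-only_random_removal-seed_211-1e-3"


def generate(prompt, max_new_tokens=20):
    """Greedy continuation of `prompt` (hypothetical helper, not from the card)."""
    # Imported lazily so this file can be read or imported without
    # transformers installed; calling the function downloads the checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repo ships a tokenizer and a causal-LM checkpoint
    # that the Auto classes can resolve.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (requires network access and the transformers + torch packages):
# print(generate("The children were"))
```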
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 64
- seed: 211
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 32000
- num_epochs: 20.0
- mixed_precision_training: Native AMP
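To make the interaction between `learning_rate`, `lr_scheduler_type`, and `lr_scheduler_warmup_steps` concrete, the sketch below computes the learning rate at a given optimizer step. It assumes the linear schedule warms up from zero to the peak rate and then decays linearly to zero at the last step, which mirrors the usual linear schedule with warmup. The total of 371,760 steps is taken from the final row of the training results table; the function name is hypothetical.

```python
def linear_schedule_lr(step, peak_lr=1e-3, warmup_steps=32_000, total_steps=371_760):
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp linearly from peak_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Print the rate at a few points: start, mid-warmup, peak, epoch 10, final step.
for step in (0, 16_000, 32_000, 185_880, 371_760):
    print(f"step {step:>7}: lr = {linear_schedule_lr(step):.6f}")
```

Under these assumptions, warmup covers the first 32,000 of 371,760 steps (roughly 8.6% of training), so the peak rate of 0.001 is reached partway through the second epoch.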
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
3.6032 | 1.0 | 18588 | 3.7897 | 0.3587 |
3.3863 | 2.0 | 37176 | 3.5718 | 0.3804 |
3.2604 | 3.0 | 55764 | 3.4372 | 0.3939 |
3.1796 | 4.0 | 74352 | 3.4083 | 0.3998 |
3.1262 | 5.0 | 92940 | 3.3492 | 0.4057 |
3.0821 | 6.0 | 111528 | 3.3454 | 0.4090 |
3.0487 | 7.0 | 130116 | 3.3345 | 0.4094 |
3.0128 | 8.0 | 148704 | 3.3277 | 0.4115 |
2.9873 | 9.0 | 167292 | 3.3305 | 0.4121 |
2.9539 | 10.0 | 185880 | 3.3189 | 0.4134 |
2.9369 | 11.0 | 204468 | 3.3453 | 0.4131 |
2.9143 | 12.0 | 223056 | 3.3310 | 0.4134 |
2.8970 | 13.0 | 241644 | 3.3251 | 0.4152 |
2.8756 | 14.0 | 260232 | 3.3490 | 0.4136 |
2.8543 | 15.0 | 278820 | 3.3548 | 0.4146 |
2.8345 | 16.0 | 297408 | 3.3477 | 0.4150 |
2.8109 | 17.0 | 315996 | 3.3563 | 0.4150 |
2.7959 | 18.0 | 334584 | 3.3613 | 0.4152 |
2.7806 | 19.0 | 353172 | 3.3693 | 0.4150 |
2.7578 | 20.0 | 371760 | 3.3755 | 0.4148 |
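The table shows training loss falling at every epoch while validation loss stops improving partway through. The sketch below, using values copied from the table, locates the epoch with the lowest validation loss and the epochs with the highest accuracy. The variable names are illustrative only.

```python
# (epoch, validation_loss, accuracy) copied from the training results table.
results = [
    (1, 3.7897, 0.3587), (2, 3.5718, 0.3804), (3, 3.4372, 0.3939),
    (4, 3.4083, 0.3998), (5, 3.3492, 0.4057), (6, 3.3454, 0.4090),
    (7, 3.3345, 0.4094), (8, 3.3277, 0.4115), (9, 3.3305, 0.4121),
    (10, 3.3189, 0.4134), (11, 3.3453, 0.4131), (12, 3.3310, 0.4134),
    (13, 3.3251, 0.4152), (14, 3.3490, 0.4136), (15, 3.3548, 0.4146),
    (16, 3.3477, 0.4150), (17, 3.3563, 0.4150), (18, 3.3613, 0.4152),
    (19, 3.3693, 0.4150), (20, 3.3755, 0.4148),
]

# Epoch with the lowest validation loss.
best_loss_epoch, best_loss, _ = min(results, key=lambda row: row[1])

# All epochs that tie for the highest accuracy.
best_acc = max(row[2] for row in results)
best_acc_epochs = [row[0] for row in results if row[2] == best_acc]

print(f"lowest validation loss: {best_loss} at epoch {best_loss_epoch}")
print(f"highest accuracy: {best_acc} at epochs {best_acc_epochs}")
```

The headline loss and accuracy at the top of this card correspond to the final epoch rather than to either of these best-scoring epochs.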
Framework versions
- Transformers 4.37.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
Dataset used to train kanishka/smolm-autoreg-bpe-counterfactual-babylm-only_random_removal-seed_211-1e-3
- kanishka/counterfactual-babylm-only_random_removal
Evaluation results
- Accuracy on kanishka/counterfactual-babylm-only_random_removal (self-reported): 0.415