---
base_model: roneneldan/TinyStories-1Layer-21M
tags:
  - generated_from_trainer
datasets:
  - roneneldan/TinyStories
metrics:
  - accuracy
model-index:
  - name: tinystories_1layer_attn_mlp_C25k_k16_mse_weighted
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: roneneldan/TinyStories
          type: roneneldan/TinyStories
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.5193506309245984
---

tinystories_1layer_attn_mlp_C25k_k16_mse_weighted

This model is a fine-tuned version of roneneldan/TinyStories-1Layer-21M on the roneneldan/TinyStories dataset. It achieves the following results on the evaluation set (a loading sketch follows these metrics):

  • Loss: 2.0353
  • Accuracy: 0.5194
  • Multicode K: 1
  • Dead Code Fraction/layer0: 0.1640
  • Mse/layer0: 501.8128
  • Input Norm/layer0: 31.9989
  • Output Norm/layer0: 22.8009
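
The following is a minimal loading sketch. The hub repo id is an assumption inferred from the model name above, and it presumes the checkpoint is saved as a standard transformers causal LM; if the codebook layers (C25k, k16) use a custom wrapper, the training project's own loading code may be required instead.

```python
# Sketch only: load the checkpoint and sample a short continuation.
# The repo id below is an ASSUMED hub id inferred from the model name;
# substitute the actual repository path.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "empty-michael/tinystories_1layer_attn_mlp_C25k_k16_mse_weighted"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```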

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows this list):

  • learning_rate: 0.005
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 96
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • training_steps: 10000
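
As a reference point, here is a minimal sketch of how the settings above map onto transformers.TrainingArguments. The output_dir is a placeholder, and the Adam betas and epsilon shown are the optimizer defaults stated above; any model/dataset wiring is omitted.

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tinystories_1layer_attn_mlp_C25k_k16_mse_weighted",  # placeholder
    learning_rate=5e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=6,   # 16 x 6 = 96 total train batch size
    seed=42,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # 5% of 10,000 steps = 500 warmup steps
    max_steps=10_000,
    adam_beta1=0.9,                  # Adam defaults, as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```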

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Multicode K | Dead Code Fraction/layer0 | Mse/layer0 | Input Norm/layer0 | Output Norm/layer0 |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:-----------:|:-------------------------:|:----------:|:-----------------:|:------------------:|
| 2.8364        | 0.05  | 500   | 2.7649          | 0.4227   | 1           | 0.3619                    | 634.8932   | 31.9979           | 18.0819            |
| 2.3611        | 0.1   | 1000  | 2.3705          | 0.4712   | 1           | 0.3607                    | 568.7264   | 31.9979           | 20.6630            |
| 2.2395        | 0.15  | 1500  | 2.2531          | 0.4866   | 1           | 0.3266                    | 550.3311   | 31.9979           | 21.3297            |
| 2.1999        | 0.2   | 2000  | 2.1908          | 0.4955   | 1           | 0.3048                    | 539.0150   | 31.9980           | 21.7663            |
| 2.1688        | 0.25  | 2500  | 2.1551          | 0.5006   | 1           | 0.2949                    | 530.4651   | 31.9980           | 22.0228            |
| 2.1108        | 0.3   | 3000  | 2.1269          | 0.5051   | 1           | 0.2809                    | 524.9530   | 31.9981           | 22.2071            |
| 2.1045        | 0.35  | 3500  | 2.1130          | 0.5079   | 1           | 0.2735                    | 523.0844   | 31.9982           | 22.3519            |
| 2.0944        | 0.4   | 4000  | 2.0996          | 0.5089   | 1           | 0.2655                    | 519.8852   | 31.9983           | 22.3930            |
| 2.1314        | 0.45  | 4500  | 2.0860          | 0.5115   | 1           | 0.2567                    | 517.0385   | 31.9983           | 22.4720            |
| 2.0685        | 1.02  | 5000  | 2.0770          | 0.5131   | 1           | 0.2497                    | 514.3712   | 31.9984           | 22.4943            |
| 2.0496        | 1.07  | 5500  | 2.0730          | 0.5137   | 1           | 0.2381                    | 513.7823   | 31.9985           | 22.5625            |
| 2.1002        | 1.12  | 6000  | 2.0667          | 0.5144   | 1           | 0.2305                    | 510.7876   | 31.9986           | 22.5882            |
| 2.0723        | 1.17  | 6500  | 2.0632          | 0.5148   | 1           | 0.2206                    | 510.5624   | 31.9986           | 22.6133            |
| 2.023         | 1.22  | 7000  | 2.0574          | 0.5157   | 1           | 0.2110                    | 509.9878   | 31.9987           | 22.6544            |
| 2.0791        | 1.27  | 7500  | 2.0513          | 0.5168   | 1           | 0.2033                    | 507.1514   | 31.9987           | 22.7018            |
| 2.0252        | 1.32  | 8000  | 2.0463          | 0.5173   | 1           | 0.1953                    | 505.2723   | 31.9988           | 22.7108            |
| 2.0432        | 1.37  | 8500  | 2.0423          | 0.5183   | 1           | 0.1875                    | 502.9395   | 31.9988           | 22.7562            |
| 2.0549        | 1.42  | 9000  | 2.0394          | 0.5188   | 1           | 0.1797                    | 502.9016   | 31.9988           | 22.7722            |
| 2.0087        | 1.47  | 9500  | 2.0365          | 0.5193   | 1           | 0.1704                    | 504.0088   | 31.9989           | 22.7990            |
| 2.0569        | 2.04  | 10000 | 2.0353          | 0.5194   | 1           | 0.1640                    | 501.8128   | 31.9989           | 22.8009            |
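
As a rough sanity check on the final row: if the reported validation loss is plain token-level cross-entropy in nats (the Mse/layer0 column suggests the training objective may also have carried a reconstruction term, in which case this overstates it), the final loss of 2.0353 corresponds to a perplexity of roughly 7.65.

```python
# Converting the final reported validation loss to perplexity, ASSUMING the
# loss is mean token-level cross-entropy in nats.
import math

final_eval_loss = 2.0353
print(f"perplexity ~ {math.exp(final_eval_loss):.2f}")  # ~ 7.65
```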

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.1