---
base_model: roneneldan/TinyStories-1Layer-21M
tags:
  - generated_from_trainer
datasets:
  - roneneldan/TinyStories
metrics:
  - accuracy
model-index:
  - name: tinystories_1layer_attn_mlp_C10k_k16
    results:
      - task:
          name: Causal Language Modeling
          type: text-generation
        dataset:
          name: roneneldan/TinyStories
          type: roneneldan/TinyStories
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.5091345939349958
---

# tinystories_1layer_attn_mlp_C10k_k16

This model is a fine-tuned version of [roneneldan/TinyStories-1Layer-21M](https://huggingface.co/roneneldan/TinyStories-1Layer-21M) on the [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset. It achieves the following results on the evaluation set:

- Loss: 2.1329
- Accuracy: 0.5091
- Multicode K: 1
- Dead Code Fraction/layer0: 0.1880
- Mse/layer0: 604.5097
- Input Norm/layer0: 31.9987
- Output Norm/layer0: 19.3897
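
Assuming the loss is the usual causal language modeling cross-entropy in nats, it converts directly to perplexity via `exp(loss)`; a quick check on the final evaluation loss:

```python
import math

# Cross-entropy loss (nats/token) -> perplexity; value from the evaluation results above.
eval_loss = 2.1329
print(f"perplexity = {math.exp(eval_loss):.2f}")  # perplexity = 8.44
```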

## Model description

More information needed

## Intended uses & limitations

More information needed
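
No usage guidance is documented yet. As a placeholder, here is a minimal inference sketch; it assumes the checkpoint loads through the standard `transformers` causal LM interface and that the Hub repo id is `empty-michael/tinystories_1layer_attn_mlp_C10k_k16` (inferred, not confirmed). If the codebook layers implied by the model name require custom code, loading will differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; adjust if the checkpoint lives elsewhere on the Hub.
model_id = "empty-michael/tinystories_1layer_attn_mlp_C10k_k16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short continuation in the TinyStories register.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```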

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged code reconstruction follows the list):

- learning_rate: 0.0005
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 6
- total_train_batch_size: 96
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 10000
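
A hedged reconstruction of this configuration as `transformers.TrainingArguments`: the values mirror the list above, while `output_dir` (and anything not listed) is an assumption. Note that the `train_batch_size`/`eval_batch_size` entries map to the per-device arguments.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tinystories_1layer_attn_mlp_C10k_k16",  # assumed
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=6,  # 16 * 6 = 96 total train batch size
    lr_scheduler_type="constant",
    warmup_ratio=0.05,  # as logged; a purely constant schedule applies no warmup
    max_steps=10_000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```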

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Multicode K | Dead Code Fraction/layer0 | Mse/layer0 | Input Norm/layer0 | Output Norm/layer0 |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:-----------:|:-------------------------:|:----------:|:-----------------:|:------------------:|
| 3.0494        | 0.05  | 500   | 2.9927          | 0.4177   | 1           | 0.0                       | 805.1676   | 31.9986           | 10.3600            |
| 2.6986        | 0.1   | 1000  | 2.7080          | 0.4472   | 1           | 0.0084                    | 739.3244   | 31.9985           | 12.7165            |
| 2.5145        | 0.15  | 1500  | 2.5252          | 0.4637   | 1           | 0.0546                    | 697.1179   | 31.9984           | 14.4889            |
| 2.4197        | 0.2   | 2000  | 2.4093          | 0.4758   | 1           | 0.0988                    | 670.0254   | 31.9983           | 15.7288            |
| 2.3541        | 0.25  | 2500  | 2.3404          | 0.4837   | 1           | 0.1337                    | 651.1297   | 31.9983           | 16.6602            |
| 2.2742        | 0.3   | 3000  | 2.2907          | 0.4903   | 1           | 0.1499                    | 642.6360   | 31.9983           | 17.3243            |
| 2.2488        | 0.35  | 3500  | 2.2565          | 0.4945   | 1           | 0.1575                    | 640.3158   | 31.9983           | 17.7566            |
| 2.2287        | 0.4   | 4000  | 2.2333          | 0.4967   | 1           | 0.1613                    | 638.8423   | 31.9983           | 18.0223            |
| 2.2576        | 0.45  | 4500  | 2.2155          | 0.4992   | 1           | 0.1676                    | 639.7464   | 31.9983           | 18.1919            |
| 2.1901        | 1.02  | 5000  | 2.2026          | 0.5014   | 1           | 0.1696                    | 638.1766   | 31.9984           | 18.3119            |
| 2.1686        | 1.07  | 5500  | 2.1935          | 0.5026   | 1           | 0.1716                    | 638.6084   | 31.9984           | 18.4013            |
| 2.2158        | 1.12  | 6000  | 2.1833          | 0.5037   | 1           | 0.1779                    | 632.9326   | 31.9985           | 18.5149            |
| 2.1843        | 1.17  | 6500  | 2.1760          | 0.5039   | 1           | 0.1797                    | 631.2925   | 31.9985           | 18.5986            |
| 2.1339        | 1.22  | 7000  | 2.1696          | 0.5048   | 1           | 0.1819                    | 627.9791   | 31.9985           | 18.7053            |
| 2.187         | 1.27  | 7500  | 2.1584          | 0.5063   | 1           | 0.1867                    | 622.1227   | 31.9986           | 18.8338            |
| 2.1302        | 1.32  | 8000  | 2.1508          | 0.5071   | 1           | 0.1875                    | 617.7162   | 31.9986           | 18.9493            |
| 2.1471        | 1.37  | 8500  | 2.1444          | 0.5082   | 1           | 0.1885                    | 613.7248   | 31.9986           | 19.0666            |
| 2.1556        | 1.42  | 9000  | 2.1392          | 0.5087   | 1           | 0.1880                    | 610.3757   | 31.9987           | 19.1817            |
| 2.1067        | 1.47  | 9500  | 2.1351          | 0.5091   | 1           | 0.1875                    | 608.6866   | 31.9987           | 19.2836            |
| 2.1536        | 2.04  | 10000 | 2.1329          | 0.5091   | 1           | 0.1880                    | 604.5097   | 31.9987           | 19.3897            |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.1