tinystories_1layer_attn_mlp_C10k_k100

This model is a fine-tuned version of roneneldan/TinyStories-1Layer-21M on the roneneldan/TinyStories dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8957
  • Accuracy: 0.5429
  • Multicode K: 1
  • Dead Code Fraction/layer0: 0.0
  • Mse/layer0: 611.1572
  • Input Norm/layer0: 31.9975
  • Output Norm/layer0: 15.0872
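The evaluation loss can be turned into a perplexity, which is often easier to compare across models. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (the card does not state this explicitly); the variable names are illustrative only.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 1.8957

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the perplexity comes out a little under 7.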

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 48
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • training_steps: 10000
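The batch-size and scheduler settings above are related by simple arithmetic. The following is a minimal sketch, assuming a single device and the usual linear warmup followed by linear decay to zero; `linear_lr` is a hypothetical helper written for illustration, not the training code used for this model.

```python
# Values copied from the hyperparameter list above.
learning_rate = 0.0005
train_batch_size = 8
gradient_accumulation_steps = 6
warmup_ratio = 0.05
training_steps = 10000

# Assumption: one device, so the effective batch size is
# per-device batch size times accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Number of warmup steps implied by the warmup ratio.
warmup_steps = round(warmup_ratio * training_steps)


def linear_lr(step: int) -> float:
    """Hypothetical helper: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = training_steps - step
    return learning_rate * max(0.0, remaining / (training_steps - warmup_steps))


print(total_train_batch_size, warmup_steps)
for step in (0, 250, 500, 5000, 10000):
    print(step, linear_lr(step))
```

With these settings the effective batch size matches the listed total_train_batch_size of 48, and the learning rate peaks at step 500 before decaying.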

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Multicode K | Dead Code Fraction/layer0 | Mse/layer0 | Input Norm/layer0 | Output Norm/layer0 |
|---|---|---|---|---|---|---|---|---|---|
| 2.5072 | 0.05 | 500 | 2.4764 | 0.4579 | 1 | 0.0 | 841.1602 | 31.9977 | 4.9114 |
| 2.2285 | 0.1 | 1000 | 2.2265 | 0.4926 | 1 | 0.0 | 792.3023 | 31.9980 | 7.5524 |
| 2.1472 | 0.16 | 1500 | 2.1584 | 0.5025 | 1 | 0.0 | 761.8683 | 31.9980 | 8.9239 |
| 2.1144 | 0.21 | 2000 | 2.1128 | 0.5090 | 1 | 0.0 | 737.1843 | 31.9979 | 9.8992 |
| 2.0847 | 0.26 | 2500 | 2.0791 | 0.5142 | 1 | 0.0 | 716.9390 | 31.9979 | 10.6577 |
| 2.0439 | 0.31 | 3000 | 2.0482 | 0.5185 | 1 | 0.0 | 698.7266 | 31.9979 | 11.3599 |
| 2.0263 | 0.37 | 3500 | 2.0253 | 0.5224 | 1 | 0.0 | 682.2680 | 31.9979 | 12.0105 |
| 1.9906 | 0.42 | 4000 | 2.0066 | 0.5253 | 1 | 0.0 | 669.1965 | 31.9979 | 12.5568 |
| 1.9852 | 0.47 | 4500 | 1.9898 | 0.5279 | 1 | 0.0 | 657.5872 | 31.9979 | 13.0526 |
| 1.9687 | 0.52 | 5000 | 1.9757 | 0.5300 | 1 | 0.0 | 648.2462 | 31.9979 | 13.4496 |
| 1.9672 | 0.57 | 5500 | 1.9620 | 0.5321 | 1 | 0.0 | 640.0822 | 31.9978 | 13.8078 |
| 1.9441 | 0.63 | 6000 | 1.9513 | 0.5339 | 1 | 0.0 | 633.8831 | 31.9978 | 14.1018 |
| 1.9408 | 0.68 | 6500 | 1.9397 | 0.5358 | 1 | 0.0 | 628.0929 | 31.9977 | 14.3550 |
| 1.9256 | 0.73 | 7000 | 1.9302 | 0.5374 | 1 | 0.0 | 623.2726 | 31.9977 | 14.5534 |
| 1.9204 | 0.78 | 7500 | 1.9225 | 0.5381 | 1 | 0.0 | 619.4573 | 31.9977 | 14.7258 |
| 1.907 | 0.84 | 8000 | 1.9150 | 0.5393 | 1 | 0.0 | 616.4379 | 31.9976 | 14.8625 |
| 1.8931 | 0.89 | 8500 | 1.9076 | 0.5408 | 1 | 0.0 | 613.7874 | 31.9976 | 14.9685 |
| 1.9021 | 0.94 | 9000 | 1.9021 | 0.5417 | 1 | 0.0 | 612.0126 | 31.9975 | 15.0379 |
| 1.8967 | 0.99 | 9500 | 1.8970 | 0.5426 | 1 | 0.0 | 610.6121 | 31.9975 | 15.0932 |
| 1.8942 | 1.04 | 10000 | 1.8957 | 0.5429 | 1 | 0.0 | 611.1572 | 31.9975 | 15.0872 |
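The Epoch and Step columns, together with the total train batch size of 48 listed under the hyperparameters, give a rough estimate of how many training examples make up one epoch. This is only a back-of-the-envelope sketch: the epoch value is rounded to two decimals, and an "example" here is whatever unit the data loader batches (possibly packed token blocks rather than whole stories), which the card does not specify.

```python
# Values taken from the last row of the results table and the hyperparameters.
final_step = 10000
final_epoch = 1.04           # rounded in the table, so the estimate is approximate
total_train_batch_size = 48

# Examples consumed over the whole run.
examples_seen = final_step * total_train_batch_size

# Approximate number of examples in one pass over the training data.
examples_per_epoch = examples_seen / final_epoch

print(f"examples seen: {examples_seen}")
print(f"approx. examples per epoch: {examples_per_epoch:.0f}")
```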

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1

Model size: 86.6M params (Safetensors, tensor type F32)