output_main

This model is a fine-tuned version of roneneldan/TinyStories-1Layer-21M on the roneneldan/TinyStories dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6604
  • Accuracy: 0.5791
  • Multicode K: 1
  • Dead Code Fraction/layer0: 0.1982
  • Mse/layer0: 6073.8637
  • Input Norm/layer0: 0.7182
  • Output Norm/layer0: 76.7891
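For a quick sense of scale, the reported evaluation loss can be converted to perplexity. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Evaluation loss reported above; assumed to be mean per-token
# cross-entropy in nats (an assumption, not confirmed by the card).
eval_loss = 1.6604

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the perplexity comes out a little above 5, meaning the model is on average about as uncertain as a uniform choice among roughly five tokens.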

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 96
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • training_steps: 100000
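The scheduler settings above imply a learning rate that ramps up linearly for the first 5% of training and then decays linearly to zero. The sketch below reproduces that shape in plain Python; the function name `linear_lr` is hypothetical, and the exact rounding of the warmup step count is an assumption rather than something taken from the training code.

```python
import math

def linear_lr(step, base_lr=0.0005, total_steps=100_000, warmup_ratio=0.05):
    """Learning rate at `step` for a linear warmup + linear decay schedule.

    Hypothetical helper: the values mirror the hyperparameters listed
    above, and the warmup length is assumed to be ceil(total * ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for s in (0, 2_500, 5_000, 52_500, 100_000):
    print(s, linear_lr(s))
```

With these settings the peak rate of 0.0005 is reached around step 5000, and the rate is back to half of the peak around step 52500.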

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Multicode K | Dead Code Fraction/layer0 | Mse/layer0 | Input Norm/layer0 | Output Norm/layer0 |
|---|---|---|---|---|---|---|---|---|---|
| 2.2319 | 0.1 | 1000 | 1.9134 | 0.5317 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.8521 | 0.21 | 2000 | 1.7990 | 0.5495 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7879 | 0.31 | 3000 | 1.7739 | 0.5557 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7728 | 0.42 | 4000 | 1.7666 | 0.5564 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7686 | 0.52 | 5000 | 1.7609 | 0.5595 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7635 | 0.63 | 6000 | 1.7555 | 0.5598 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7523 | 0.73 | 7000 | 1.7383 | 0.5632 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7471 | 0.83 | 8000 | 1.7368 | 0.5643 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7404 | 0.94 | 9000 | 1.7277 | 0.5659 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.728 | 1.04 | 10000 | 1.7290 | 0.5647 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7195 | 1.15 | 11000 | 1.7244 | 0.5667 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7198 | 1.25 | 12000 | 1.7230 | 0.5671 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7171 | 1.36 | 13000 | 1.7177 | 0.5689 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7185 | 1.46 | 14000 | 1.7150 | 0.5688 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7149 | 1.56 | 15000 | 1.7125 | 0.5695 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7105 | 1.67 | 16000 | 1.7097 | 0.5695 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7107 | 1.77 | 17000 | 1.7073 | 0.5689 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7113 | 1.88 | 18000 | 1.7025 | 0.5712 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.7078 | 1.98 | 19000 | 1.7048 | 0.5702 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.693 | 2.09 | 20000 | 1.7045 | 0.5696 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6935 | 2.19 | 21000 | 1.7068 | 0.5695 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6962 | 2.29 | 22000 | 1.7046 | 0.5687 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6954 | 2.4 | 23000 | 1.7019 | 0.5706 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6933 | 2.5 | 24000 | 1.7002 | 0.5725 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6942 | 2.61 | 25000 | 1.6983 | 0.5717 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6935 | 2.71 | 26000 | 1.6938 | 0.5730 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6928 | 2.82 | 27000 | 1.6978 | 0.5719 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6927 | 2.92 | 28000 | 1.6935 | 0.5715 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6855 | 3.02 | 29000 | 1.6978 | 0.5726 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6773 | 3.13 | 30000 | 1.6951 | 0.5732 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6788 | 3.23 | 31000 | 1.6926 | 0.5728 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6813 | 3.34 | 32000 | 1.6920 | 0.5726 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6782 | 3.44 | 33000 | 1.6926 | 0.5733 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6801 | 3.55 | 34000 | 1.6894 | 0.5719 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6796 | 3.65 | 35000 | 1.6890 | 0.5728 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6768 | 3.75 | 36000 | 1.6882 | 0.5722 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6802 | 3.86 | 37000 | 1.6872 | 0.5732 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6809 | 3.96 | 38000 | 1.6855 | 0.5750 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6701 | 4.07 | 39000 | 1.6886 | 0.5742 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6646 | 4.17 | 40000 | 1.6890 | 0.5734 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.669 | 4.28 | 41000 | 1.6859 | 0.5747 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6713 | 4.38 | 42000 | 1.6867 | 0.5740 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6693 | 4.48 | 43000 | 1.6821 | 0.5750 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6693 | 4.59 | 44000 | 1.6822 | 0.5747 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6692 | 4.69 | 45000 | 1.6801 | 0.5745 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6703 | 4.8 | 46000 | 1.6834 | 0.5761 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6677 | 4.9 | 47000 | 1.6819 | 0.5756 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6682 | 5.01 | 48000 | 1.6778 | 0.5752 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6547 | 5.11 | 49000 | 1.6825 | 0.5751 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6566 | 5.21 | 50000 | 1.6825 | 0.5758 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6605 | 5.32 | 51000 | 1.6814 | 0.5746 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6603 | 5.42 | 52000 | 1.6768 | 0.5755 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6595 | 5.53 | 53000 | 1.6757 | 0.5753 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6603 | 5.63 | 54000 | 1.6769 | 0.5738 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.662 | 5.74 | 55000 | 1.6758 | 0.5759 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6602 | 5.84 | 56000 | 1.6771 | 0.5757 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6624 | 5.94 | 57000 | 1.6749 | 0.5770 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6527 | 6.05 | 58000 | 1.6791 | 0.5758 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6474 | 6.15 | 59000 | 1.6763 | 0.5773 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6494 | 6.26 | 60000 | 1.6765 | 0.5761 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6539 | 6.36 | 61000 | 1.6741 | 0.5764 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6539 | 6.47 | 62000 | 1.6752 | 0.5768 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6529 | 6.57 | 63000 | 1.6737 | 0.5775 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6533 | 6.67 | 64000 | 1.6725 | 0.5758 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.653 | 6.78 | 65000 | 1.6722 | 0.5774 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6522 | 6.88 | 66000 | 1.6726 | 0.5762 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6528 | 6.99 | 67000 | 1.6726 | 0.5768 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6439 | 7.09 | 68000 | 1.6728 | 0.5771 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6403 | 7.19 | 69000 | 1.6703 | 0.5758 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6447 | 7.3 | 70000 | 1.6697 | 0.5772 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6458 | 7.4 | 71000 | 1.6694 | 0.5777 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6447 | 7.51 | 72000 | 1.6716 | 0.5771 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6449 | 7.61 | 73000 | 1.6680 | 0.5779 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6458 | 7.72 | 74000 | 1.6683 | 0.5779 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6447 | 7.82 | 75000 | 1.6681 | 0.5778 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6451 | 7.92 | 76000 | 1.6677 | 0.5781 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6418 | 8.03 | 77000 | 1.6665 | 0.5789 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6361 | 8.13 | 78000 | 1.6684 | 0.5779 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.636 | 8.24 | 79000 | 1.6687 | 0.5786 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6357 | 8.34 | 80000 | 1.6670 | 0.5790 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6379 | 8.45 | 81000 | 1.6658 | 0.5788 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6405 | 8.55 | 82000 | 1.6661 | 0.5788 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6378 | 8.65 | 83000 | 1.6650 | 0.5789 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6386 | 8.76 | 84000 | 1.6650 | 0.5784 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.638 | 8.86 | 85000 | 1.6644 | 0.5785 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6374 | 8.97 | 86000 | 1.6635 | 0.5777 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6298 | 9.07 | 87000 | 1.6647 | 0.5785 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6302 | 9.18 | 88000 | 1.6649 | 0.5787 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6315 | 9.28 | 89000 | 1.6651 | 0.5782 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.631 | 9.38 | 90000 | 1.6636 | 0.5788 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6316 | 9.49 | 91000 | 1.6627 | 0.5782 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6286 | 9.59 | 92000 | 1.6646 | 0.5783 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6304 | 9.7 | 93000 | 1.6632 | 0.5801 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6298 | 9.8 | 94000 | 1.6623 | 0.5800 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6309 | 9.91 | 95000 | 1.6620 | 0.5800 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6302 | 10.01 | 96000 | 1.6602 | 0.5801 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6242 | 10.11 | 97000 | 1.6610 | 0.5786 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6258 | 10.22 | 98000 | 1.6605 | 0.5795 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6234 | 10.32 | 99000 | 1.6605 | 0.5791 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1.6245 | 10.43 | 100000 | 1.6604 | 0.5791 | 1 | 1.0 | 0.0 | 0.0 | 0.0 |
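The Epoch and Step columns together give a rough estimate of how many optimizer steps, and therefore how many training examples, make up one epoch. The arithmetic below is a sketch under stated assumptions: it treats `train_batch_size` as the total batch per step (single device, no gradient accumulation), and the epoch value is rounded to two decimals in the log, so the result is approximate.

```python
# Values taken from the final row of the training log and the
# hyperparameter list in this card.
total_steps = 100_000
final_epoch = 10.43          # rounded in the log, so results are approximate
train_batch_size = 96        # assumed to be the total batch per optimizer step

# Steps needed to see the training set once.
steps_per_epoch = total_steps / final_epoch

# Approximate number of training examples (sequences) per epoch.
examples_per_epoch = steps_per_epoch * train_batch_size

print(f"steps per epoch    ~ {steps_per_epoch:,.0f}")
print(f"examples per epoch ~ {examples_per_epoch:,.0f}")
```

If training used several devices or gradient accumulation, the examples-per-epoch figure would scale up by the corresponding factor.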

Framework versions

  • Transformers 4.29.2
  • PyTorch 2.0.1+cu117
  • Datasets 2.12.0
  • Tokenizers 0.13.3

Dataset used to train taufeeque/TinyStories-1Layer-21M-Codebook: roneneldan/TinyStories
