distily_bitnet_gpt2

This student model was distilled from the teacher model gpt2. The training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set (the values match the step 99000 row of the eval-phase metrics table):

  • eval_enwikippl: 60.25
  • eval_frwikippl: 197.0
  • eval_zhwikippl: 105.5
  • eval_tinystoriesppl: 45.5
  • eval_loss: 0.3768
  • eval_runtime: 119.4949
  • eval_samples_per_second: 83.686
  • eval_steps_per_second: 10.461
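
The throughput figures can be cross-checked against one another. The sketch below multiplies the runtime by each rate to recover approximate sample and step counts. The interpretation (about 10,000 evaluation samples processed in about 1,250 batches) is an inference from these numbers, not something the card states, and the function name is hypothetical.

```python
# Values copied from the evaluation results listed above.
eval_runtime = 119.4949            # seconds
eval_samples_per_second = 83.686
eval_steps_per_second = 10.461


def implied_counts(runtime, samples_per_s, steps_per_s):
    """Return (samples, steps, samples_per_step) implied by the throughput figures."""
    samples = runtime * samples_per_s
    steps = runtime * steps_per_s
    return samples, steps, samples / steps


samples, steps, per_step = implied_counts(
    eval_runtime, eval_samples_per_second, eval_steps_per_second
)
print(f"samples ~ {samples:.0f}, steps ~ {steps:.0f}, samples/step ~ {per_step:.2f}")
```

The implied ratio of roughly 8 samples per step agrees with the eval_batch_size of 8 listed under the training hyperparameters.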

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 1.0
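
The distillation_objective above puts weight 1 on a KL-divergence loss over logits and weight 0 on the hidden-state (hs) and attention (attn) components. The following is a minimal pure-Python sketch of that weighting, not Distily's implementation. It assumes the KL direction is KL(teacher || student) averaged over token positions, and all function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for the vocabulary logits of one token position."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))


def distillation_loss(teacher_logits, student_logits,
                      logits_weight=1.0, hs_weight=0.0, attn_weight=0.0):
    """Weighted sum mirroring the listed objective; hs and attn have weight 0."""
    logits_term = sum(
        kl_logits_loss(t, s) for t, s in zip(teacher_logits, student_logits)
    ) / len(teacher_logits)
    hs_term = 0.0    # placeholder: no hidden-state loss function is configured
    attn_term = 0.0  # placeholder: no attention loss function is configured
    return logits_weight * logits_term + hs_weight * hs_term + attn_weight * attn_term


# Toy logits: two token positions, vocabulary of three entries (made-up values).
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
student = [[1.5, 0.7, -0.5], [0.1, 0.2, 0.3]]
print(distillation_loss(teacher, student))
```

With these weights only the logits term contributes, so the loss is zero exactly when the student reproduces the teacher's output distribution at every position.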

Resource Usage

Peak GPU Memory: 7.7840 GB

Eval-Phase Metrics

step epoch enwikippl frwikippl loss runtime samples_per_second steps_per_second tinystoriesppl zhwikippl
teacher eval (perplexity columns only): enwikippl 43.25, frwikippl 61.25, tinystoriesppl 11.6875, zhwikippl 19.125
0 0 1391569403904.0 152832116260864.0 21.0049 119.2908 83.829 10.479 2919235584.0 51951924412416.0
1000 0.0040 316.0 1464.0 1.3122 119.4245 83.735 10.467 264.0 278.0
2000 0.0081 229.0 800.0 1.1403 119.4259 83.734 10.467 195.0 182.0
3000 0.0121 185.0 608.0 1.0104 119.5923 83.617 10.452 169.0 169.0
4000 0.0162 159.0 520.0 0.9254 119.3483 83.788 10.474 133.0 188.0
5000 0.0202 137.0 424.0 0.8455 119.5965 83.615 10.452 115.0 174.0
6000 0.0242 122.5 450.0 0.7877 119.2948 83.826 10.478 98.5 210.0
7000 0.0283 111.0 416.0 0.7373 119.4795 83.696 10.462 91.0 153.0
8000 0.0323 101.5 404.0 0.6874 119.4413 83.723 10.465 81.0 176.0
9000 0.0364 91.0 378.0 0.6340 119.6875 83.551 10.444 78.5 140.0
10000 0.0404 85.5 342.0 0.5890 119.4852 83.692 10.462 71.5 123.5
11000 0.0444 80.0 318.0 0.5502 119.6263 83.594 10.449 66.0 157.0
12000 0.0485 75.5 284.0 0.5330 119.4136 83.743 10.468 62.0 117.5
13000 0.0525 72.0 247.0 0.5070 120.1015 83.263 10.408 62.75 125.5
14000 0.0566 72.0 240.0 0.4882 119.3335 83.799 10.475 58.0 127.0
15000 0.0606 73.0 221.0 0.4788 119.5948 83.616 10.452 57.5 197.0
16000 0.0646 69.5 230.0 0.4683 119.3756 83.769 10.471 57.0 137.0
17000 0.0687 69.0 237.0 0.4645 119.5417 83.653 10.457 55.75 73.0
18000 0.0727 68.5 217.0 0.4486 119.5671 83.635 10.454 51.5 98.5
19000 0.0768 70.0 231.0 0.4478 119.492 83.688 10.461 53.75 134.0
20000 0.0808 67.5 258.0 0.4461 119.4815 83.695 10.462 56.0 344.0
21000 0.0848 66.5 208.0 0.4358 119.4818 83.695 10.462 52.5 152.0
22000 0.0889 65.5 223.0 0.4335 119.4312 83.73 10.466 52.25 125.5
23000 0.0929 64.5 215.0 0.4274 119.7735 83.491 10.436 53.0 89.0
24000 0.0970 67.5 202.0 0.4228 119.3429 83.792 10.474 51.5 167.0
25000 0.1010 65.0 207.0 0.4228 119.4493 83.718 10.465 48.25 129.0
26000 0.1051 64.5 220.0 0.4226 119.5142 83.672 10.459 49.25 108.5
27000 0.1091 66.5 225.0 0.4135 119.5977 83.614 10.452 50.25 87.5
28000 0.1131 65.0 238.0 0.4187 119.4733 83.701 10.463 50.25 151.0
29000 0.1172 61.75 202.0 0.4126 119.6449 83.581 10.448 51.25 113.0
30000 0.1212 63.0 208.0 0.4097 119.4007 83.752 10.469 47.25 115.5
31000 0.1253 64.0 202.0 0.4078 119.804 83.47 10.434 50.25 165.0
32000 0.1293 64.0 203.0 0.4047 119.6394 83.584 10.448 48.75 142.0
33000 0.1333 61.0 227.0 0.4063 120.1224 83.248 10.406 48.75 80.0
34000 0.1374 61.5 201.0 0.4055 119.3472 83.789 10.474 47.25 144.0
35000 0.1414 65.0 205.0 0.4094 119.8662 83.426 10.428 48.75 137.0
36000 0.1455 66.0 205.0 0.4011 120.0687 83.286 10.411 48.75 95.5
37000 0.1495 65.0 202.0 0.4014 119.9716 83.353 10.419 49.25 132.0
38000 0.1535 64.0 213.0 0.4048 119.9711 83.353 10.419 47.0 193.0
39000 0.1576 61.25 224.0 0.4045 120.0075 83.328 10.416 46.75 134.0
40000 0.1616 62.25 196.0 0.3951 119.6212 83.597 10.45 46.0 113.5
41000 0.1657 64.5 220.0 0.4073 120.0286 83.313 10.414 47.25 149.0
42000 0.1697 62.5 201.0 0.3949 119.865 83.427 10.428 47.75 139.0
43000 0.1737 65.5 218.0 0.3990 119.7175 83.53 10.441 49.0 138.0
44000 0.1778 61.5 205.0 0.3963 119.7386 83.515 10.439 46.0 100.0
45000 0.1818 61.75 191.0 0.3996 120.0236 83.317 10.415 46.75 188.0
46000 0.1859 62.25 192.0 0.3951 119.5599 83.64 10.455 46.0 119.0
47000 0.1899 62.75 193.0 0.3968 119.5258 83.664 10.458 46.5 156.0
48000 0.1939 62.25 191.0 0.3894 119.9285 83.383 10.423 46.75 137.0
49000 0.1980 63.75 205.0 0.3900 119.9251 83.385 10.423 45.5 110.5
50000 0.2020 64.5 189.0 0.3950 119.6096 83.605 10.451 45.5 104.0
51000 0.2061 62.25 191.0 0.3918 119.568 83.634 10.454 45.75 103.5
52000 0.2101 63.75 186.0 0.3880 119.7111 83.534 10.442 46.5 115.0
53000 0.2141 62.0 199.0 0.3951 119.6976 83.544 10.443 48.75 123.0
54000 0.2182 61.5 189.0 0.3930 119.6308 83.591 10.449 44.5 191.0
55000 0.2222 60.5 196.0 0.3904 119.8999 83.403 10.425 48.25 148.0
56000 0.2263 61.25 223.0 0.3930 119.3067 83.818 10.477 46.5 132.0
57000 0.2303 61.25 196.0 0.3869 119.6246 83.595 10.449 46.5 145.0
58000 0.2343 61.75 196.0 0.3902 119.5228 83.666 10.458 44.25 163.0
59000 0.2384 61.25 195.0 0.3883 119.7394 83.515 10.439 45.75 131.0
60000 0.2424 62.25 198.0 0.3864 119.5742 83.63 10.454 47.0 159.0
61000 0.2465 60.5 192.0 0.3876 119.8064 83.468 10.434 47.0 113.0
62000 0.2505 63.0 208.0 0.3915 119.4153 83.741 10.468 46.0 152.0
63000 0.2545 61.5 189.0 0.3851 120.097 83.266 10.408 44.25 114.5
64000 0.2586 62.25 202.0 0.3850 119.5037 83.679 10.46 44.5 173.0
65000 0.2626 65.0 204.0 0.3852 119.7043 83.539 10.442 44.75 155.0
66000 0.2667 65.5 203.0 0.3834 119.7096 83.535 10.442 49.25 241.0
67000 0.2707 62.0 215.0 0.3827 119.9434 83.373 10.422 44.5 125.5
68000 0.2747 64.5 205.0 0.3874 119.7232 83.526 10.441 46.75 94.0
69000 0.2788 60.25 217.0 0.3852 119.6701 83.563 10.445 46.75 127.5
70000 0.2828 61.75 206.0 0.3816 119.6845 83.553 10.444 46.25 97.0
71000 0.2869 61.25 198.0 0.3802 119.5798 83.626 10.453 44.0 86.5
72000 0.2909 60.5 184.0 0.3809 119.6483 83.578 10.447 44.5 84.0
73000 0.2949 61.25 182.0 0.3843 119.8145 83.462 10.433 47.25 233.0
74000 0.2990 60.25 200.0 0.3829 119.6185 83.599 10.45 45.5 170.0
75000 0.3030 64.0 209.0 0.3849 119.964 83.358 10.42 47.0 121.5
76000 0.3071 62.5 185.0 0.3763 119.5942 83.616 10.452 45.75 182.0
77000 0.3111 62.0 198.0 0.3845 119.8226 83.457 10.432 45.75 133.0
78000 0.3152 61.75 195.0 0.3804 119.4829 83.694 10.462 46.5 145.0
79000 0.3192 62.75 204.0 0.3818 119.7536 83.505 10.438 46.25 75.5
80000 0.3232 63.75 208.0 0.3821 119.5149 83.672 10.459 48.5 221.0
81000 0.3273 63.75 197.0 0.3831 119.7777 83.488 10.436 48.0 244.0
82000 0.3313 62.0 206.0 0.3798 119.4957 83.685 10.461 45.25 84.5
83000 0.3354 61.5 207.0 0.3793 119.7781 83.488 10.436 46.25 111.5
84000 0.3394 62.25 212.0 0.3838 119.5784 83.627 10.453 47.25 111.0
85000 0.3434 65.0 200.0 0.3797 119.7835 83.484 10.435 45.75 155.0
86000 0.3475 63.0 200.0 0.3831 119.5894 83.619 10.452 44.5 138.0
87000 0.3515 61.0 193.0 0.3845 119.7299 83.521 10.44 45.75 131.0
88000 0.3556 61.25 209.0 0.3815 119.5304 83.661 10.458 44.25 175.0
89000 0.3596 61.0 180.0 0.3785 119.7064 83.538 10.442 46.25 129.0
90000 0.3636 62.25 214.0 0.3781 119.5581 83.641 10.455 44.25 84.5
91000 0.3677 60.0 179.0 0.3719 119.8694 83.424 10.428 42.75 109.5
92000 0.3717 63.25 194.0 0.3835 119.4686 83.704 10.463 46.25 130.0
93000 0.3758 62.75 190.0 0.3791 119.8026 83.471 10.434 46.5 147.0
94000 0.3798 61.5 246.0 0.3833 119.4626 83.708 10.464 46.75 106.0
95000 0.3838 62.0 210.0 0.3833 119.5775 83.628 10.453 45.25 141.0
96000 0.3879 58.75 207.0 0.3829 119.501 83.681 10.46 46.5 102.0
97000 0.3919 61.5 199.0 0.3825 119.9706 83.354 10.419 47.0 121.5
98000 0.3960 62.5 213.0 0.3791 119.7509 83.507 10.438 46.5 228.0
99000 0.4 60.25 197.0 0.3768 119.4949 83.686 10.461 45.5 105.5
100000 0.4040 61.75 218.0 0.3827 119.5924 83.617 10.452 45.25 302.0
101000 0.4081 61.25 202.0 0.3804 119.5839 83.623 10.453 46.75 128.0
102000 0.4121 62.5 191.0 0.3767 119.5413 83.653 10.457 45.5 140.0
103000 0.4162 60.75 192.0 0.3782 119.9597 83.361 10.42 46.25 88.5
104000 0.4202 63.25 208.0 0.3788 119.3862 83.762 10.47 43.75 123.0
105000 0.4242 60.25 177.0 0.3715 119.8313 83.451 10.431 43.5 133.0
106000 0.4283 64.0 204.0 0.3845 119.8117 83.464 10.433 43.0 98.5
107000 0.4323 60.25 192.0 0.3748 119.4944 83.686 10.461 45.75 140.0
108000 0.4364 62.25 206.0 0.3731 119.4119 83.744 10.468 42.5 96.5
109000 0.4404 60.75 217.0 0.3763 119.6322 83.59 10.449 44.25 129.0
110000 0.4444 62.75 204.0 0.3750 119.5341 83.658 10.457 47.5 102.0
111000 0.4485 60.25 187.0 0.3758 119.5253 83.664 10.458 44.5 92.5
112000 0.4525 61.5 191.0 0.3819 119.5932 83.617 10.452 46.75 118.0
113000 0.4566 62.0 220.0 0.3811 119.6275 83.593 10.449 45.5 151.0
114000 0.4606 62.5 204.0 0.3758 119.4174 83.74 10.467 45.25 132.0
115000 0.4646 63.5 210.0 0.3753 119.8784 83.418 10.427 45.75 118.5
116000 0.4687 62.5 189.0 0.3816 119.3948 83.756 10.469 42.75 68.5
117000 0.4727 60.0 200.0 0.3803 119.6297 83.591 10.449 47.25 122.5
118000 0.4768 62.75 216.0 0.3716 119.5222 83.666 10.458 45.0 221.0
119000 0.4808 61.25 203.0 0.3789 119.6407 83.584 10.448 45.5 138.0
120000 0.4848 61.0 204.0 0.3801 119.4345 83.728 10.466 44.0 192.0
121000 0.4889 60.25 193.0 0.3764 119.5113 83.674 10.459 46.75 110.0
122000 0.4929 62.0 194.0 0.3713 119.3661 83.776 10.472 44.75 90.0
123000 0.4970 62.25 196.0 0.3738 119.664 83.567 10.446 45.25 216.0
124000 0.5010 63.0 200.0 0.3766 119.3724 83.771 10.471 45.0 122.0
125000 0.5051 63.0 216.0 0.3802 119.8228 83.457 10.432 43.75 101.0
126000 0.5091 62.25 201.0 0.3797 119.3712 83.772 10.472 41.75 144.0
127000 0.5131 61.75 190.0 0.3692 119.8775 83.418 10.427 46.5 189.0
128000 0.5172 62.5 218.0 0.3726 119.4768 83.698 10.462 43.25 108.0
129000 0.5212 62.25 200.0 0.3772 119.4392 83.725 10.466 43.0 179.0
130000 0.5253 61.25 197.0 0.3756 119.5936 83.617 10.452 46.0 150.0
131000 0.5293 61.75 202.0 0.3773 119.7616 83.499 10.437 45.5 110.5
132000 0.5333 60.25 208.0 0.3755 119.6597 83.57 10.446 45.0 108.0
133000 0.5374 61.0 216.0 0.3811 119.8666 83.426 10.428 43.25 200.0
134000 0.5414 63.5 210.0 0.3782 119.4305 83.731 10.466 44.25 141.0
135000 0.5455 62.25 196.0 0.3761 119.5947 83.616 10.452 46.25 84.0
136000 0.5495 62.75 189.0 0.3822 119.566 83.636 10.454 45.75 229.0
137000 0.5535 60.75 206.0 0.3765 119.6759 83.559 10.445 45.0 166.0
138000 0.5576 60.75 198.0 0.3765 119.5901 83.619 10.452 46.0 286.0
139000 0.5616 63.0 200.0 0.3845 119.8562 83.433 10.429 45.75 304.0
140000 0.5657 61.0 198.0 0.3744 119.5 83.682 10.46 44.75 98.0
141000 0.5697 63.75 193.0 0.3753 119.5173 83.67 10.459 47.0 106.0
142000 0.5737 60.5 201.0 0.3719 119.641 83.583 10.448 45.0 193.0
143000 0.5778 60.75 185.0 0.3779 120.0007 83.333 10.417 45.5 135.0
144000 0.5818 62.25 204.0 0.3785 119.4551 83.713 10.464 45.75 200.0
145000 0.5859 61.5 219.0 0.3786 119.7355 83.517 10.44 44.75 106.5
146000 0.5899 64.5 194.0 0.3764 119.4362 83.727 10.466 46.75 151.0
147000 0.5939 62.0 183.0 0.3692 119.7765 83.489 10.436 45.5 126.0
148000 0.5980 61.5 197.0 0.3721 119.484 83.693 10.462 46.0 89.5
149000 0.6020 61.75 212.0 0.3737 119.5109 83.674 10.459 44.25 198.0
150000 0.6061 58.25 193.0 0.3798 119.4511 83.716 10.465 45.75 254.0
151000 0.6101 61.25 190.0 0.3736 119.92 83.389 10.424 45.0 668.0
152000 0.6141 63.75 194.0 0.3777 119.451 83.716 10.465 43.0 92.5
153000 0.6182 63.25 202.0 0.3738 119.5904 83.619 10.452 45.25 148.0
154000 0.6222 62.0 189.0 0.3791 119.3905 83.759 10.47 47.75 156.0
155000 0.6263 61.0 196.0 0.3802 119.8694 83.424 10.428 45.5 278.0
156000 0.6303 62.5 193.0 0.3784 119.4204 83.738 10.467 43.75 151.0
157000 0.6343 61.75 196.0 0.3697 119.594 83.616 10.452 46.25 107.5
158000 0.6384 60.75 187.0 0.3742 119.3841 83.763 10.47 46.0 138.0
159000 0.6424 60.5 190.0 0.3720 119.9495 83.368 10.421 47.25 191.0
160000 0.6465 61.25 214.0 0.3804 119.5882 83.62 10.453 46.25 95.5
161000 0.6505 60.0 184.0 0.3740 119.571 83.632 10.454 44.5 141.0
162000 0.6545 60.5 209.0 0.3771 119.2965 83.825 10.478 45.75 108.5
163000 0.6586 59.5 202.0 0.3740 119.4559 83.713 10.464 43.5 164.0
164000 0.6626 62.0 188.0 0.3712 119.4787 83.697 10.462 44.75 124.0
165000 0.6667 59.25 198.0 0.3712 119.5568 83.642 10.455 46.0 220.0
166000 0.6707 60.0 186.0 0.3722 119.4093 83.746 10.468 46.0 91.0
167000 0.6747 60.75 197.0 0.3722 119.7627 83.498 10.437 43.5 118.5
168000 0.6788 63.0 213.0 0.3794 119.5874 83.621 10.453 45.75 84.0
169000 0.6828 62.0 199.0 0.3741 119.5162 83.671 10.459 45.25 114.5
170000 0.6869 64.0 210.0 0.3780 119.2009 83.892 10.487 44.5 105.0
171000 0.6909 60.75 194.0 0.3709 119.3434 83.792 10.474 46.25 114.0
172000 0.6949 64.5 201.0 0.3779 119.4813 83.695 10.462 45.75 162.0
173000 0.6990 60.75 189.0 0.3747 119.766 83.496 10.437 46.0 76.5
174000 0.7030 62.25 193.0 0.3770 119.3894 83.76 10.47 47.0 125.5
175000 0.7071 61.0 179.0 0.3775 119.6129 83.603 10.45 44.5 169.0
176000 0.7111 62.5 197.0 0.3756 119.2958 83.825 10.478 45.0 151.0
177000 0.7152 60.75 206.0 0.3787 119.3375 83.796 10.474 43.5 278.0
178000 0.7192 60.5 194.0 0.3764 119.093 83.968 10.496 45.0 106.5
179000 0.7232 62.0 195.0 0.3721 119.4974 83.684 10.46 44.5 156.0
180000 0.7273 60.5 197.0 0.3740 119.4195 83.738 10.467 44.5 128.0
181000 0.7313 62.25 191.0 0.3702 119.5899 83.619 10.452 44.25 84.0
182000 0.7354 60.5 187.0 0.3699 119.2683 83.845 10.481 42.5 99.5
183000 0.7394 60.0 186.0 0.3738 119.3468 83.789 10.474 43.25 134.0
184000 0.7434 63.0 197.0 0.3732 119.3936 83.757 10.47 47.5 124.0
185000 0.7475 66.5 197.0 0.3786 119.349 83.788 10.473 42.5 198.0
186000 0.7515 62.0 219.0 0.3716 119.309 83.816 10.477 44.75 155.0
187000 0.7556 60.0 189.0 0.3706 119.6934 83.547 10.443 45.5 100.0
188000 0.7596 62.0 204.0 0.3757 119.2139 83.883 10.485 43.75 114.5
189000 0.7636 61.0 199.0 0.3743 119.6216 83.597 10.45 43.75 113.5
190000 0.7677 61.0 198.0 0.3779 119.1564 83.923 10.49 47.0 82.5
191000 0.7717 63.0 202.0 0.3759 119.4008 83.752 10.469 44.5 141.0
192000 0.7758 59.0 179.0 0.3766 119.3454 83.79 10.474 46.25 97.5
193000 0.7798 59.0 212.0 0.3760 119.4054 83.748 10.469 45.5 154.0
194000 0.7838 62.25 206.0 0.3746 119.3162 83.811 10.476 46.25 75.0
195000 0.7879 62.0 199.0 0.3790 119.7096 83.535 10.442 44.25 116.0
196000 0.7919 59.75 210.0 0.3745 119.2919 83.828 10.479 45.75 82.0
197000 0.7960 62.75 219.0 0.3766 119.5628 83.638 10.455 48.0 94.5
198000 0.8 62.0 185.0 0.3719 119.2608 83.85 10.481 43.5 103.0
199000 0.8040 62.0 190.0 0.3757 119.3233 83.806 10.476 43.0 115.0
200000 0.8081 60.75 199.0 0.3724 119.2813 83.835 10.479 44.75 163.0
201000 0.8121 62.0 185.0 0.3692 119.3617 83.779 10.472 45.25 101.0
202000 0.8162 63.0 184.0 0.3722 119.2684 83.844 10.481 45.75 169.0
203000 0.8202 61.0 229.0 0.3741 119.4577 83.712 10.464 46.5 225.0
204000 0.8242 62.25 180.0 0.3733 119.3774 83.768 10.471 46.75 129.0
205000 0.8283 59.75 181.0 0.3753 119.5249 83.665 10.458 44.25 119.5
206000 0.8323 61.75 194.0 0.3771 119.1299 83.942 10.493 44.75 140.0
207000 0.8364 61.25 200.0 0.3764 119.4217 83.737 10.467 42.5 120.0
208000 0.8404 60.25 188.0 0.3732 119.5447 83.651 10.456 43.0 212.0
209000 0.8444 61.0 189.0 0.3762 119.6224 83.596 10.45 45.0 92.0
210000 0.8485 59.5 193.0 0.3753 119.6028 83.61 10.451 43.25 208.0
211000 0.8525 59.0 197.0 0.3727 119.2025 83.891 10.486 44.75 138.0
212000 0.8566 61.5 196.0 0.3717 119.3458 83.79 10.474 42.5 358.0
213000 0.8606 62.25 187.0 0.3720 119.2634 83.848 10.481 43.5 151.0
214000 0.8646 62.0 190.0 0.3721 119.3182 83.81 10.476 43.0 102.5
215000 0.8687 60.75 205.0 0.3708 119.2675 83.845 10.481 45.25 90.5
216000 0.8727 60.5 188.0 0.3713 119.2927 83.827 10.478 43.5 88.0
217000 0.8768 60.25 190.0 0.3701 119.2572 83.852 10.482 44.25 100.5
218000 0.8808 61.0 197.0 0.3699 119.3148 83.812 10.476 42.75 106.5
219000 0.8848 61.0 190.0 0.3689 119.1907 83.899 10.487 44.0 173.0
220000 0.8889 62.75 203.0 0.3786 119.4453 83.72 10.465 43.25 155.0
221000 0.8929 62.25 189.0 0.3740 119.3127 83.813 10.477 44.25 130.0
222000 0.8970 63.5 183.0 0.3730 119.2006 83.892 10.487 46.75 84.5
223000 0.9010 61.25 210.0 0.3728 119.5812 83.625 10.453 46.0 402.0
224000 0.9051 63.25 196.0 0.3738 119.3814 83.765 10.471 45.0 164.0
225000 0.9091 62.5 188.0 0.3719 119.3165 83.811 10.476 47.0 73.5
226000 0.9131 61.5 197.0 0.3751 119.2818 83.835 10.479 44.25 58.75
227000 0.9172 60.0 198.0 0.3691 119.3438 83.792 10.474 46.25 61.5
228000 0.9212 61.25 197.0 0.3718 119.576 83.629 10.454 46.0 108.5
229000 0.9253 61.75 199.0 0.3739 119.3345 83.798 10.475 45.5 196.0
230000 0.9293 62.0 202.0 0.3770 119.1355 83.938 10.492 45.0 145.0
231000 0.9333 60.25 198.0 0.3712 119.2532 83.855 10.482 43.25 134.0
232000 0.9374 61.5 221.0 0.3749 119.6348 83.588 10.448 44.5 120.5
233000 0.9414 60.5 203.0 0.3714 119.2199 83.879 10.485 45.0 133.0
234000 0.9455 61.5 198.0 0.3707 119.3045 83.819 10.477 45.25 87.5
235000 0.9495 63.25 203.0 0.3739 119.2723 83.842 10.48 44.75 120.5
236000 0.9535 60.0 200.0 0.3713 119.3693 83.774 10.472 43.5 197.0
237000 0.9576 62.5 199.0 0.3757 119.3074 83.817 10.477 44.5 151.0
238000 0.9616 64.5 216.0 0.3755 119.2789 83.837 10.48 43.25 135.0
239000 0.9657 64.0 188.0 0.3735 119.5725 83.631 10.454 45.0 126.0
240000 0.9697 61.25 185.0 0.3742 119.252 83.856 10.482 43.5 147.0
241000 0.9737 60.5 193.0 0.3743 119.2507 83.857 10.482 45.0 100.5
242000 0.9778 61.75 194.0 0.3752 119.2384 83.866 10.483 44.25 105.0
243000 0.9818 63.0 230.0 0.3746 119.6142 83.602 10.45 44.75 138.0
244000 0.9859 62.5 204.0 0.3706 119.4435 83.722 10.465 44.5 143.0
245000 0.9899 61.75 202.0 0.3716 119.2642 83.847 10.481 42.75 117.5
246000 0.9939 62.0 208.0 0.3711 119.3449 83.791 10.474 43.0 141.0
247000 0.9980 61.5 204.0 0.3750 119.4143 83.742 10.468 46.0 90.0
247500 1.0 62.5 214.0 0.3691 119.5852 83.622 10.453 46.0 121.5
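
To put the table in perspective, the student's perplexities can be expressed relative to the teacher's. The sketch below parses one whitespace-separated student row and divides each perplexity by the corresponding "teacher eval" value. The helper names are hypothetical, and the mapping of the four teacher values to the four perplexity columns is assumed from the column order.

```python
COLUMNS = ["step", "epoch", "enwikippl", "frwikippl", "loss", "runtime",
           "samples_per_second", "steps_per_second", "tinystoriesppl", "zhwikippl"]

# Teacher perplexities from the "teacher eval" row (column mapping assumed).
TEACHER = {"enwikippl": 43.25, "frwikippl": 61.25,
           "tinystoriesppl": 11.6875, "zhwikippl": 19.125}


def parse_row(line):
    """Split one whitespace-separated student row into a column -> float dict."""
    values = [float(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def ratio_to_teacher(row):
    """Student perplexity divided by teacher perplexity, per dataset."""
    return {name: row[name] / ppl for name, ppl in TEACHER.items()}


# Final row of the table (step 247500, epoch 1.0).
final = parse_row("247500 1.0 62.5 214.0 0.3691 119.5852 83.622 10.453 46.0 121.5")
for name, ratio in ratio_to_teacher(final).items():
    print(f"{name}: {ratio:.2f}x teacher")
```

At the final step the gap is smallest on enwikippl (about 1.4 times the teacher) and largest on zhwikippl (about 6.4 times the teacher).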

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0
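
When reproducing the run, it can help to compare the local environment against the versions listed above. The sketch below uses only the standard library. The distribution names are assumptions (for example, "Pytorch" is assumed to be installed as "torch" and Distily as "distily"), and the function name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above; the distribution names are assumptions.
EXPECTED = {"distily": "0.2.0", "transformers": "4.44.0",
            "torch": "2.3.0", "datasets": "2.21.0"}


def check_versions(expected):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Local build tags such as "+cu121" are ignored in the comparison.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in check_versions(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the listed version.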

Model size: 124M params (Safetensors)
Tensor type: BF16