lapp0 committed
Commit fd69b9f
1 Parent(s): 367d521

End of training

README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 249.0
- - eval_frwikippl: 988.0
- - eval_zhwikippl: 222.0
- - eval_tinystoriesppl: 194.0
- - eval_loss: 1.4817
- - eval_runtime: 12.5664
- - eval_samples_per_second: 47.746
- - eval_steps_per_second: 11.937
+ - eval_enwikippl: 2720.0
+ - eval_frwikippl: 32256.0
+ - eval_zhwikippl: 296960.0
+ - eval_tinystoriesppl: 1392.0
+ - eval_loss: 2.8924
+ - eval_runtime: 12.4707
+ - eval_samples_per_second: 48.113
+ - eval_steps_per_second: 12.028
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
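The `eval_*ppl` values above are perplexities on held-out corpora (presumably English, French, and Chinese Wikipedia plus TinyStories), while `eval_loss` appears to be the distillation objective itself, so the two are not directly comparable. For reference, perplexity is the exponential of the mean token-level cross-entropy; below is a minimal sketch of that computation for a causal LM, not necessarily the evaluation loop Distily actually uses:

```python
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """exp(mean next-token cross-entropy) for a batch of sequences."""
    # Shift so positions < n predict token n (standard causal-LM evaluation).
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    nll = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
    return torch.exp(nll).item()
```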
@@ -49,7 +49,7 @@ The following hyperparameters were used during training:
  - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  - train_embeddings: True
  - learning_rate: 1e-05
- - train_batch_size: 4
+ - train_batch_size: 1
  - eval_batch_size: 4
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
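Per the `distillation_objective` above, all of the training signal comes from a KL-divergence loss on the logits (`weight=1, loss_fn=kl`); the hidden-state (`hs`) and attention (`attn`) components are disabled (`weight=0`). A minimal sketch of such a logits-KL term follows; it is a generic formulation, not Distily's actual implementation, and the reduction and temperature handling are assumptions:

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """Mean KL(teacher || student) over all token positions."""
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) so 'batchmean'
    # averages over every token position, not just the batch dimension.
    s_logp = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, -2)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, -2)
    kl = F.kl_div(s_logp, t_prob, reduction="batchmean")
    return kl * temperature**2  # conventional scaling when T != 1
```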
@@ -64,17 +64,47 @@ Peak GPU Memory: 7.9381 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
- | 0 | 0 | 1176821039104.0 | 72567767433216.0 | 20.1450 | 12.5803 | 47.694 | 11.923 | 3019898880.0 | 12713103196160.0 |
- | 1500 | 0.1010 | 21888.0 | 301056.0 | 4.5299 | 12.4618 | 48.147 | 12.037 | 8064.0 | 1056768.0 |
- | 3000 | 0.2020 | 1872.0 | 11648.0 | 2.7341 | 12.4652 | 48.134 | 12.034 | 1128.0 | 119296.0 |
- | 4500 | 0.3030 | 684.0 | 4480.0 | 2.1008 | 12.4698 | 48.116 | 12.029 | 468.0 | 4992.0 |
- | 6000 | 0.4040 | 404.0 | 2160.0 | 1.7866 | 12.4916 | 48.032 | 12.008 | 312.0 | 486.0 |
- | 7500 | 0.5051 | 300.0 | 1296.0 | 1.5620 | 12.5682 | 47.74 | 11.935 | 226.0 | 246.0 |
- | 9000 | 0.6061 | 249.0 | 988.0 | 1.4817 | 12.5664 | 47.746 | 11.937 | 194.0 | 222.0 |
- | 10500 | 0.7071 | 228.0 | 884.0 | 1.3820 | 12.5815 | 47.689 | 11.922 | 179.0 | 193.0 |
- | 12000 | 0.8081 | 220.0 | 892.0 | 1.3587 | 12.5682 | 47.74 | 11.935 | 177.0 | 170.0 |
- | 13500 | 0.9091 | 216.0 | 812.0 | 1.3531 | 12.5051 | 47.981 | 11.995 | 174.0 | 170.0 |
- | 14850 | 1.0 | 215.0 | 804.0 | 1.3510 | 12.5257 | 47.901 | 11.975 | 173.0 | 168.0 |
+ | 0 | 0 | 1821066133504.0 | 158329674399744.0 | 19.3254 | 12.5492 | 47.812 | 11.953 | 12079595520.0 | 98956046499840.0 |
+ | 1500 | 0.0253 | 46439333888.0 | 5153960755200.0 | 13.9821 | 12.5442 | 47.831 | 11.958 | 285212672.0 | 10445360463872.0 |
+ | 3000 | 0.0505 | 2179072.0 | 66060288.0 | 7.7394 | 12.573 | 47.721 | 11.93 | 158720.0 | 209715200.0 |
+ | 4500 | 0.0758 | 95744.0 | 2523136.0 | 5.2142 | 12.6045 | 47.602 | 11.901 | 17920.0 | 6029312.0 |
+ | 6000 | 0.1010 | 10816.0 | 158720.0 | 4.0370 | 12.5895 | 47.659 | 11.915 | 5760.0 | 671744.0 |
+ | 7500 | 0.1263 | 4448.0 | 55040.0 | 3.3192 | 12.5498 | 47.809 | 11.952 | 2720.0 | 296960.0 |
+ | 9000 | 0.1515 | 2720.0 | 32256.0 | 2.8924 | 12.4707 | 48.113 | 12.028 | 1392.0 | 296960.0 |
+ | 10500 | 0.1768 | 1960.0 | 20608.0 | 2.6753 | 12.5367 | 47.859 | 11.965 | 992.0 | 278528.0 |
+ | 12000 | 0.2020 | 864.0 | 4896.0 | 2.2104 | 12.4794 | 48.079 | 12.02 | 544.0 | 85504.0 |
+ | 13500 | 0.2273 | 564.0 | 4672.0 | 1.9660 | 12.4591 | 48.158 | 12.039 | 382.0 | 2304.0 |
+ | 15000 | 0.2525 | 452.0 | 2816.0 | 1.8089 | 12.5819 | 47.687 | 11.922 | 316.0 | 788.0 |
+ | 16500 | 0.2778 | 398.0 | 2160.0 | 1.7757 | 12.581 | 47.691 | 11.923 | 304.0 | 548.0 |
+ | 18000 | 0.3030 | 374.0 | 1944.0 | 1.6982 | 12.5631 | 47.759 | 11.94 | 296.0 | 478.0 |
+ | 19500 | 0.3283 | 358.0 | 1488.0 | 1.6521 | 12.6042 | 47.603 | 11.901 | 274.0 | 444.0 |
+ | 21000 | 0.3535 | 352.0 | 1544.0 | 1.6516 | 12.5472 | 47.819 | 11.955 | 268.0 | 466.0 |
+ | 22500 | 0.3788 | 336.0 | 1464.0 | 1.6172 | 12.5526 | 47.799 | 11.95 | 266.0 | 386.0 |
+ | 24000 | 0.4040 | 326.0 | 1280.0 | 1.5683 | 12.5056 | 47.979 | 11.995 | 242.0 | 248.0 |
+ | 25500 | 0.4293 | 298.0 | 1216.0 | 1.5292 | 12.5815 | 47.689 | 11.922 | 244.0 | 255.0 |
+ | 27000 | 0.4545 | 290.0 | 1072.0 | 1.4859 | 12.5923 | 47.648 | 11.912 | 236.0 | 236.0 |
+ | 28500 | 0.4798 | 276.0 | 1144.0 | 1.4542 | 12.5108 | 47.959 | 11.99 | 228.0 | 244.0 |
+ | 30000 | 0.5051 | 276.0 | 1200.0 | 1.4598 | 12.5421 | 47.839 | 11.96 | 204.0 | 258.0 |
+ | 31500 | 0.5303 | 270.0 | 1112.0 | 1.4433 | 12.5006 | 47.998 | 11.999 | 212.0 | 205.0 |
+ | 33000 | 0.5556 | 272.0 | 1040.0 | 1.4221 | 12.5626 | 47.761 | 11.94 | 209.0 | 236.0 |
+ | 34500 | 0.5808 | 252.0 | 1176.0 | 1.4007 | 12.5775 | 47.704 | 11.926 | 202.0 | 222.0 |
+ | 36000 | 0.6061 | 248.0 | 976.0 | 1.3998 | 12.5397 | 47.848 | 11.962 | 207.0 | 266.0 |
+ | 37500 | 0.6313 | 226.0 | 836.0 | 1.3400 | 12.6024 | 47.61 | 11.902 | 183.0 | 260.0 |
+ | 39000 | 0.6566 | 213.0 | 852.0 | 1.2991 | 12.6581 | 47.4 | 11.85 | 172.0 | 182.0 |
+ | 40500 | 0.6818 | 208.0 | 932.0 | 1.2862 | 12.5163 | 47.937 | 11.984 | 170.0 | 163.0 |
+ | 42000 | 0.7071 | 206.0 | 788.0 | 1.2804 | 12.6037 | 47.605 | 11.901 | 172.0 | 159.0 |
+ | 43500 | 0.7323 | 204.0 | 824.0 | 1.2747 | 12.5859 | 47.672 | 11.918 | 165.0 | 163.0 |
+ | 45000 | 0.7576 | 201.0 | 848.0 | 1.2704 | 12.722 | 47.162 | 11.791 | 165.0 | 153.0 |
+ | 46500 | 0.7828 | 203.0 | 760.0 | 1.2726 | 12.5879 | 47.665 | 11.916 | 169.0 | 156.0 |
+ | 48000 | 0.8081 | 205.0 | 820.0 | 1.2693 | 12.5698 | 47.734 | 11.933 | 170.0 | 165.0 |
+ | 49500 | 0.8333 | 199.0 | 792.0 | 1.2608 | 12.5756 | 47.712 | 11.928 | 166.0 | 165.0 |
+ | 51000 | 0.8586 | 198.0 | 768.0 | 1.2563 | 12.5984 | 47.625 | 11.906 | 167.0 | 160.0 |
+ | 52500 | 0.8838 | 197.0 | 788.0 | 1.2558 | 12.5705 | 47.731 | 11.933 | 164.0 | 159.0 |
+ | 54000 | 0.9091 | 197.0 | 776.0 | 1.2553 | 12.6019 | 47.612 | 11.903 | 166.0 | 166.0 |
+ | 55500 | 0.9343 | 197.0 | 784.0 | 1.2540 | 12.6329 | 47.495 | 11.874 | 165.0 | 163.0 |
+ | 57000 | 0.9596 | 197.0 | 776.0 | 1.2534 | 12.5525 | 47.799 | 11.95 | 165.0 | 161.0 |
+ | 58500 | 0.9848 | 196.0 | 780.0 | 1.2539 | 12.5854 | 47.674 | 11.919 | 165.0 | 161.0 |
+ | 59400 | 1.0 | 196.0 | 780.0 | 1.2536 | 12.5194 | 47.925 | 11.981 | 165.0 | 161.0 |
 
  ### Framework versions
  - Distily 0.2.0
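For downstream use, the distilled student is an ordinary `transformers` causal-LM checkpoint. A usage sketch follows; the repo id is a placeholder, not this model's confirmed Hub path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id for illustration; substitute this model's actual Hub path.
repo_id = "lapp0/gpt2-distily-student"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```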
logs/learning_rate=1e-05, per_device_train_batch_size=1, warmup_ratio=0.5/events.out.tfevents.1724136089.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5de85bde75446badfb7e1612a874b6f824516310d7acfb4ea20f4fc35a6cbb0a
+ size 588
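The added TensorBoard log is stored via Git LFS, so the diff shows only the three-line LFS pointer (spec version, content hash, byte size) rather than the event data itself. A minimal sketch of parsing such a pointer, assuming the simple `key value`-per-line format above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file: one 'key value' pair per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:5de85bde75446badfb7e1612a874b6f824516310d7acfb4ea20f4fc35a6cbb0a
size 588"""
print(parse_lfs_pointer(pointer))  # {'version': ..., 'oid': 'sha256:...', 'size': '588'}
```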