lapp0 committed on
Commit 88c3f06
1 Parent(s): 3a7d3a6

End of training

README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 452.9807
- - eval_frwikippl: 741.6703
- - eval_zhwikippl: 169.7969
- - eval_tinystoriesppl: 694.5760
- - eval_loss: 1.2502
- - eval_runtime: 21.1964
- - eval_samples_per_second: 47.178
- - eval_steps_per_second: 11.794
+ - eval_enwikippl: 401.6902
+ - eval_frwikippl: 385.9396
+ - eval_zhwikippl: 137.9653
+ - eval_tinystoriesppl: 881.4292
+ - eval_loss: 0.7112
+ - eval_runtime: 21.2483
+ - eval_samples_per_second: 47.063
+ - eval_steps_per_second: 11.766
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -48,7 +48,7 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
- - learning_rate: 1e-05
+ - learning_rate: 4e-05
 - train_batch_size: 1
 - eval_batch_size: 4
 - seed: 42
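The distillation_objective in the hunk above enables only the logits component (weight=1, loss_fn=kl); the hidden-state (hs) and attention (attn) components carry weight=0 and are inactive. As a rough illustration only, a logits-only KL objective of this shape might look like the PyTorch sketch below; this is not Distily's implementation, and the function name is hypothetical.

```python
import torch.nn.functional as F
from torch import Tensor

def logits_kl_loss(student_logits: Tensor, teacher_logits: Tensor) -> Tensor:
    """Hypothetical sketch of a logits-only KL distillation loss.

    Mirrors the configuration above: weight=1 on the logits component,
    while the hs and attn components have weight=0 and are omitted.
    """
    # KL(teacher || student): summed over sequence positions and vocabulary,
    # averaged over the batch dimension ("batchmean").
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),  # student log-probabilities
        F.log_softmax(teacher_logits, dim=-1),  # teacher log-probabilities
        log_target=True,
        reduction="batchmean",
    )
```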
@@ -63,27 +63,27 @@ Peak GPU Memory: 3.9285 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 270.2348 | 76.8142 | | | | | 671.1238 | 22.8030 |
- | 0 | 0 | 120078.375 | 1867851235328.0 | 18.7920 | 21.1643 | 47.249 | 11.812 | 72.8770 | 4013754155008.0 |
- | 5000 | 0.0505 | 399.8896 | 1364.9200 | 1.5750 | 21.223 | 47.119 | 11.78 | 430.3431 | 486.9932 |
- | 10000 | 0.1010 | 366.0540 | 968.1008 | 1.4975 | 21.2542 | 47.05 | 11.762 | 410.0440 | 300.9413 |
- | 15000 | 0.1515 | 382.6534 | 990.8644 | 1.4377 | 21.1883 | 47.196 | 11.799 | 455.3961 | 243.5069 |
- | 20000 | 0.2020 | 372.0864 | 985.5745 | 1.4590 | 21.2537 | 47.051 | 11.763 | 430.2186 | 317.8063 |
- | 25000 | 0.2525 | 459.8662 | 802.9102 | 1.3109 | 21.2174 | 47.131 | 11.783 | 674.2657 | 183.8540 |
- | 30000 | 0.3030 | 452.4371 | 822.7448 | 1.2777 | 21.2291 | 47.105 | 11.776 | 674.3492 | 162.7067 |
- | 35000 | 0.3535 | 476.7241 | 805.2602 | 1.2741 | 21.2169 | 47.132 | 11.783 | 736.0758 | 174.6150 |
- | 40000 | 0.4040 | 453.2438 | 770.2305 | 1.2733 | 21.1947 | 47.181 | 11.795 | 679.9471 | 163.0870 |
- | 45000 | 0.4545 | 460.5169 | 781.4591 | 1.2687 | 21.2116 | 47.144 | 11.786 | 700.2546 | 183.2052 |
- | 50000 | 0.5051 | 479.0564 | 794.0530 | 1.2632 | 21.229 | 47.105 | 11.776 | 743.4755 | 181.4419 |
- | 55000 | 0.5556 | 471.3993 | 748.4656 | 1.2630 | 21.215 | 47.137 | 11.784 | 731.375 | 172.6117 |
- | 60000 | 0.6061 | 446.4142 | 775.7834 | 1.2687 | 21.1528 | 47.275 | 11.819 | 669.1851 | 164.7928 |
- | 65000 | 0.6566 | 455.8672 | 744.0773 | 1.2538 | 21.2207 | 47.124 | 11.781 | 698.6068 | 164.4469 |
- | 70000 | 0.7071 | 453.5074 | 740.2094 | 1.2513 | 21.3501 | 46.838 | 11.71 | 697.8277 | 168.6457 |
- | 75000 | 0.7576 | 450.4874 | 723.2042 | 1.2535 | 21.2028 | 47.164 | 11.791 | 685.8463 | 167.9272 |
- | 80000 | 0.8081 | 455.6377 | 745.9662 | 1.2523 | 21.2178 | 47.13 | 11.783 | 701.7324 | 170.4892 |
- | 85000 | 0.8586 | 447.3922 | 746.4918 | 1.2509 | 21.2165 | 47.133 | 11.783 | 681.8325 | 168.7976 |
- | 90000 | 0.9091 | 453.0859 | 740.9397 | 1.2505 | 21.1987 | 47.173 | 11.793 | 696.0992 | 169.7290 |
- | 95000 | 0.9596 | 451.3083 | 741.0439 | 1.2504 | 21.5668 | 46.368 | 11.592 | 690.2544 | 169.7969 |
- | 99000 | 1.0 | 452.9807 | 741.6703 | 1.2502 | 21.1964 | 47.178 | 11.794 | 694.5760 | 169.7969 |
+ | 0 | 0 | 120078.375 | 1867851235328.0 | 18.7920 | 21.2125 | 47.142 | 11.786 | 72.8770 | 4013754155008.0 |
+ | 5000 | 0.0505 | 621.5149 | 991.7020 | 1.3528 | 21.2177 | 47.13 | 11.783 | 980.0922 | 399.9691 |
+ | 10000 | 0.1010 | 574.4407 | 664.8521 | 1.1590 | 21.2225 | 47.12 | 11.78 | 1036.6780 | 493.8460 |
+ | 15000 | 0.1515 | 543.0890 | 635.0353 | 1.0360 | 21.2351 | 47.092 | 11.773 | 1033.2988 | 145.9157 |
+ | 20000 | 0.2020 | 509.8121 | 599.6746 | 0.9759 | 21.2099 | 47.148 | 11.787 | 985.1690 | 251.1274 |
+ | 25000 | 0.2525 | 448.2854 | 486.9003 | 0.8334 | 21.2284 | 47.107 | 11.777 | 923.3450 | 171.9567 |
+ | 30000 | 0.3030 | 420.2149 | 441.8981 | 0.7741 | 21.2742 | 47.005 | 11.751 | 893.9037 | 129.4944 |
+ | 35000 | 0.3535 | 417.6187 | 442.7548 | 0.7695 | 21.5924 | 46.313 | 11.578 | 884.2755 | 140.6411 |
+ | 40000 | 0.4040 | 419.8570 | 418.2776 | 0.7678 | 21.23 | 47.103 | 11.776 | 893.9774 | 162.6632 |
+ | 45000 | 0.4545 | 420.1905 | 413.8966 | 0.7576 | 21.2355 | 47.091 | 11.773 | 905.9177 | 154.8089 |
+ | 50000 | 0.5051 | 420.9561 | 426.7430 | 0.7544 | 21.2196 | 47.126 | 11.782 | 906.1800 | 147.5501 |
+ | 55000 | 0.5556 | 417.3034 | 409.1867 | 0.7509 | 21.2021 | 47.165 | 11.791 | 902.3304 | 143.7327 |
+ | 60000 | 0.6061 | 418.3230 | 413.0230 | 0.7525 | 21.2367 | 47.088 | 11.772 | 894.0145 | 156.6996 |
+ | 65000 | 0.6566 | 404.0308 | 404.5305 | 0.7221 | 21.2003 | 47.169 | 11.792 | 878.4468 | 136.2006 |
+ | 70000 | 0.7071 | 406.0154 | 392.1317 | 0.7194 | 21.2119 | 47.143 | 11.786 | 891.9106 | 137.0481 |
+ | 75000 | 0.7576 | 400.8665 | 383.9604 | 0.7188 | 21.2118 | 47.144 | 11.786 | 871.7914 | 140.4630 |
+ | 80000 | 0.8081 | 402.5625 | 387.4647 | 0.7168 | 21.2234 | 47.118 | 11.779 | 882.3771 | 141.0827 |
+ | 85000 | 0.8586 | 399.3479 | 385.9124 | 0.7123 | 21.2047 | 47.159 | 11.79 | 875.1130 | 140.0700 |
+ | 90000 | 0.9091 | 401.2549 | 386.7830 | 0.7117 | 21.2316 | 47.1 | 11.775 | 881.0649 | 138.5555 |
+ | 95000 | 0.9596 | 401.4725 | 386.1842 | 0.7112 | 21.2217 | 47.122 | 11.78 | 880.2640 | 138.0389 |
+ | 99000 | 1.0 | 401.6902 | 385.9396 | 0.7112 | 21.2483 | 47.063 | 11.766 | 881.4292 | 137.9653 |
 
 ### Framework versions
 - Distily 0.2.0
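For context on the table above: the enwikippl, frwikippl, zhwikippl, and tinystoriesppl columns are perplexities on the respective corpora, while loss is the logits KL distillation objective rather than a cross-entropy, so perplexity is not simply exp(loss). A minimal sketch of computing a perplexity for the student checkpoint with transformers follows; the repo id and sample text are placeholders, and this card does not specify the exact evaluation windowing.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual student checkpoint.
model_id = "lapp0/distily-student-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def perplexity(text: str) -> float:
    """Token-level perplexity of `text` under the student model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels supplied, the model returns the mean cross-entropy
        # over predicted tokens; perplexity is its exponential.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())
```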
 
logs/copy_teacher_modules=_(_lm_head___False)_, learning_rate=4e-05/events.out.tfevents.1724055248.f383272e719b ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b0eafccae2b1f019f22949ebd8095fe6f6385a605f0c00baea1587be79ab771
+ size 312