End of training

README.md CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_loss:
-- eval_runtime: 34.
-- eval_samples_per_second: 58.
-- eval_steps_per_second: 7.
+- eval_enwikippl: 495.9718
+- eval_frwikippl: 3345.0957
+- eval_zhwikippl: 2696.0598
+- eval_loss: 40.5622
+- eval_runtime: 34.3051
+- eval_samples_per_second: 58.3
+- eval_steps_per_second: 7.288
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0.
+- distillation_objective: MultiObjective(logits_weight=1, logits_loss_fn=(fn:kl_divergence_loss()), activations_weight=0.2, activations_loss_fn=(fn:mse_loss()), attentions_weight=0, attentions_loss_fn=(fn:mse_loss()))
 - train_embeddings: True
 - learning_rate: 4e-05
 - train_batch_size: 8
@@ -62,32 +62,32 @@ Peak GPU Memory: 8.0893 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
-| 0 | 0 | 54069.2930 | 57285.3438 |
-| 1000 | 0.0404 |
-| 2000 | 0.0808 |
-| 3000 | 0.1212 |
-| 4000 | 0.1616 |
-| 5000 | 0.2020 |
-| 6000 | 0.2424 |
-| 7000 | 0.2828 |
-| 8000 | 0.3232 |
-| 9000 | 0.3636 |
-| 10000 | 0.4040 |
-| 11000 | 0.4444 |
-| 12000 | 0.4848 |
-| 13000 | 0.5253 |
-| 14000 | 0.5657 |
-| 15000 | 0.6061 |
-| 16000 | 0.6465 |
-| 17000 | 0.6869 |
-| 18000 | 0.7273 |
-| 19000 | 0.7677 |
-| 20000 | 0.8081 |
-| 21000 | 0.8485 |
-| 22000 | 0.8889 |
-| 23000 | 0.9293 |
-| 24000 | 0.9697 |
-| 24750 | 1.0 |
+| 0 | 0 | 54069.2930 | 57285.3438 | 133.3560 | 34.3024 | 58.305 | 7.288 | 54227.1016 |
+| 1000 | 0.0404 | 1437.1931 | 8480.3613 | 43.3980 | 34.5438 | 57.898 | 7.237 | 74395.6406 |
+| 2000 | 0.0808 | 997.8790 | 5260.8130 | 42.3937 | 34.3703 | 58.19 | 7.274 | 33120.7461 |
+| 3000 | 0.1212 | 830.5260 | 5152.1616 | 41.7965 | 34.348 | 58.228 | 7.278 | 11334.5342 |
+| 4000 | 0.1616 | 745.0864 | 4422.4756 | 41.2595 | 34.4519 | 58.052 | 7.256 | 5651.1323 |
+| 5000 | 0.2020 | 644.0798 | 4158.1821 | 41.1632 | 34.4631 | 58.033 | 7.254 | 4903.9395 |
+| 6000 | 0.2424 | 592.7726 | 3791.3215 | 40.8778 | 34.3097 | 58.293 | 7.287 | 4353.2559 |
+| 7000 | 0.2828 | 545.3409 | 3490.1353 | 40.8020 | 34.4207 | 58.105 | 7.263 | 3123.4839 |
+| 8000 | 0.3232 | 519.2236 | 3238.8032 | 40.6310 | 34.2625 | 58.373 | 7.297 | 1952.6049 |
+| 9000 | 0.3636 | 495.9718 | 3345.0957 | 40.5622 | 34.3051 | 58.3 | 7.288 | 2696.0598 |
+| 10000 | 0.4040 | 482.7110 | 3048.2520 | 40.4688 | 34.3828 | 58.169 | 7.271 | 2027.5375 |
+| 11000 | 0.4444 | 453.9180 | 2860.8340 | 40.3758 | 34.2822 | 58.339 | 7.292 | 2861.5081 |
+| 12000 | 0.4848 | 441.7129 | 2985.2966 | 40.2887 | 34.2175 | 58.45 | 7.306 | 2510.5007 |
+| 13000 | 0.5253 | 429.0357 | 2882.7014 | 40.1765 | 34.4175 | 58.11 | 7.264 | 6012.3589 |
+| 14000 | 0.5657 | 416.6578 | 2756.2913 | 40.1022 | 34.4762 | 58.011 | 7.251 | 12478.4199 |
+| 15000 | 0.6061 | 406.1163 | 2797.8003 | 40.0135 | 34.5042 | 57.964 | 7.246 | 6068.8252 |
+| 16000 | 0.6465 | 405.6435 | 2525.5491 | 39.9328 | 34.3124 | 58.288 | 7.286 | 4309.2979 |
+| 17000 | 0.6869 | 394.7977 | 2709.6606 | 39.9735 | 34.3165 | 58.281 | 7.285 | 2797.2800 |
+| 18000 | 0.7273 | 397.4739 | 2544.8535 | 39.7368 | 34.4016 | 58.137 | 7.267 | 9888.5605 |
+| 19000 | 0.7677 | 387.6284 | 2540.5505 | 39.7513 | 34.3493 | 58.225 | 7.278 | 5071.7769 |
+| 20000 | 0.8081 | 378.9675 | 2503.9182 | 39.6105 | 34.4198 | 58.106 | 7.263 | 3492.3926 |
+| 21000 | 0.8485 | 376.9130 | 2442.8845 | 39.5590 | 34.343 | 58.236 | 7.28 | 10077.8555 |
+| 22000 | 0.8889 | 374.1136 | 2348.3101 | 39.5182 | 34.2953 | 58.317 | 7.29 | 3595.5537 |
+| 23000 | 0.9293 | 368.7203 | 2389.3955 | 39.4282 | 34.6197 | 57.771 | 7.221 | 11663.1113 |
+| 24000 | 0.9697 | 365.7831 | 2363.9253 | 39.4065 | 34.6468 | 57.725 | 7.216 | 5269.2183 |
+| 24750 | 1.0 | 363.6872 | 2441.5068 | 39.3040 | 34.7181 | 57.607 | 7.201 | 2566.7729 |
 
 ### Framework versions
 - Distily 0.2.0
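A quick way to read the table above is to plot one of its columns. The matplotlib sketch below uses a handful of (step, enwikippl) pairs copied from the table, with the teacher's 30.2086 drawn as a reference line.

```python
# Sketch: visualize the enwikippl column of the table above (values copied from it).
import matplotlib.pyplot as plt

steps = [0, 1000, 5000, 10000, 15000, 20000, 24750]
enwikippl = [54069.2930, 1437.1931, 644.0798, 482.7110, 406.1163, 378.9675, 363.6872]

plt.plot(steps, enwikippl, marker="o", label="student")
plt.axhline(30.2086, linestyle="--", color="gray", label="teacher eval")
plt.yscale("log")  # perplexity falls by two orders of magnitude early on
plt.xlabel("training step")
plt.ylabel("enwiki perplexity")
plt.legend()
plt.show()
```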
logs/distillation_objective=MultiObjective(logits_weight_1__logits_loss_fn_(fn_kl_divergence_loss())__activations_weight_0.2__activations_loss_fn_(fn_mse_loss())__attentions_weight_0__attentions_loss_fn_(f/events.out.tfevents.1723462234.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ff7c6cc0f891df0ef3e5fa3d535c16f1909d49df2ba51ee53ddeeaf9ca29aa5e
+size 253
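The added file is only a 253-byte Git LFS pointer; the actual TensorBoard log is fetched with `git lfs pull` after cloning the repository. The sketch below, assuming the tensorboard package and the (truncated) log directory named above, reads the scalars back out; the exact tag names are an assumption.

```python
# Sketch: read scalar series from the tfevents file added in this commit.
# Run `git lfs pull` first so the pointer file is replaced by the real log.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

LOG_DIR = "logs/distillation_objective=MultiObjective(...)"  # placeholder for the truncated path above
acc = EventAccumulator(LOG_DIR)
acc.Reload()

print(acc.Tags()["scalars"])  # list the logged metric names
# for event in acc.Scalars("eval_enwikippl"):  # tag name is an assumption
#     print(event.step, event.value)
```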