End of training
README.md
CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_loss:
-- eval_runtime: 17.
-- eval_samples_per_second: 57.
-- eval_steps_per_second: 7.

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
-- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=
 - train_embeddings: True
 - learning_rate: 4e-05
 - train_batch_size: 8
@@ -62,20 +62,20 @@ Peak GPU Memory: 8.2195 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
-| 0 | 0 | 55429.6875 | 57698.8047 |
-| 1000 | 0.0808 |
-| 2000 | 0.1616 |
-| 3000 | 0.2424 |
-| 4000 | 0.3232 |
-| 5000 | 0.4040 |
-| 6000 | 0.4848 |
-| 7000 | 0.5657 |
-| 8000 | 0.6465 |
-| 9000 | 0.7273 |
-| 10000 | 0.8081 |
-| 11000 | 0.8889 |
-| 12000 | 0.9697 |
-| 12375 | 1.0 |

 ### Framework versions
 - Distily 0.2.0
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

 It achieves the following results on the evaluation set:
+- eval_enwikippl: 208.9635
+- eval_frwikippl: 1351.4938
+- eval_zhwikippl: 781.2166
+- eval_loss: 19.7940
+- eval_runtime: 17.3332
+- eval_samples_per_second: 57.693
+- eval_steps_per_second: 7.212

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -45,7 +45,7 @@ More information needed
 ### Training hyperparameters

 The following hyperparameters were used during training:
+- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=ce, layer_mapper=None, projector=None))
 - train_embeddings: True
 - learning_rate: 4e-05
 - train_batch_size: 8
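The `distillation_objective` hyperparameter above weights a KL term on the logits by 1, disables the hidden-state term (weight 0), and weights a cross-entropy term on the attention maps by 2.0. The sketch below shows one plausible way such a weighted sum could be formed on toy inputs. It is not Distily's implementation: the function names, the KL direction (teacher to student), the absence of a temperature, and the use of a single attention row are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student); the direction is an assumption."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher_probs, student_probs)
    )


def cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy of the student distribution against the teacher's."""
    return -sum(t * math.log(s + eps) for t, s in zip(teacher_probs, student_probs))


def distillation_loss(teacher_logits, student_logits, teacher_attn, student_attn,
                      logits_weight=1.0, attn_weight=2.0):
    """Hypothetical weighted sum mirroring the weights in the hyperparameter line.

    The hidden-state component is omitted because its weight is 0.
    """
    logits_loss = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    attn_loss = cross_entropy(teacher_attn, student_attn)
    return logits_weight * logits_loss + attn_weight * attn_loss


if __name__ == "__main__":
    # One vocabulary distribution and one attention row, both toy-sized.
    print(distillation_loss([2.0, 0.5, 0.1], [1.5, 0.7, 0.2],
                            [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```

Because cross-entropy between two distributions is bounded below by the teacher's entropy rather than by zero, a total loss built this way would not approach zero even for a perfect student, which may be one reason the reported `eval_loss` stays large while perplexity falls.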
@@ -62,20 +62,20 @@ Peak GPU Memory: 8.2195 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
+| 0 | 0 | 55429.6875 | 57698.8047 | 24.5150 | 17.4179 | 57.412 | 7.177 | 56988.9141 |
+| 1000 | 0.0808 | 702.9320 | 4403.8062 | 20.5050 | 17.3512 | 57.633 | 7.204 | 19095.4688 |
+| 2000 | 0.1616 | 507.8192 | 3252.5339 | 20.3170 | 17.3451 | 57.653 | 7.207 | 2454.8054 |
+| 3000 | 0.2424 | 418.1162 | 2743.4949 | 20.2070 | 17.3188 | 57.741 | 7.218 | 1193.5658 |
+| 4000 | 0.3232 | 372.6640 | 2567.2002 | 20.1200 | 17.2361 | 58.018 | 7.252 | 1026.6641 |
+| 5000 | 0.4040 | 320.0249 | 2154.7588 | 20.0340 | 17.3151 | 57.753 | 7.219 | 1183.4081 |
+| 6000 | 0.4848 | 278.3867 | 1778.2332 | 19.9610 | 17.3435 | 57.658 | 7.207 | 869.0625 |
+| 7000 | 0.5657 | 251.7534 | 1568.9419 | 19.9040 | 17.4023 | 57.464 | 7.183 | 807.5215 |
+| 8000 | 0.6465 | 230.5502 | 1399.7903 | 19.8380 | 17.3855 | 57.519 | 7.19 | 816.4125 |
+| 9000 | 0.7273 | 208.9635 | 1351.4938 | 19.7940 | 17.3332 | 57.693 | 7.212 | 781.2166 |
+| 10000 | 0.8081 | 192.8560 | 1211.9225 | 19.7530 | 17.3032 | 57.793 | 7.224 | 608.5041 |
+| 11000 | 0.8889 | 179.3916 | 1140.7820 | 19.6930 | 17.2721 | 57.897 | 7.237 | 624.0573 |
+| 12000 | 0.9697 | 161.3999 | 997.4732 | 19.6480 | 17.21 | 58.106 | 7.263 | 560.2280 |
+| 12375 | 1.0 | 158.3214 | 948.9705 | 19.6380 | 17.3071 | 57.78 | 7.222 | 575.3149 |

 ### Framework versions
 - Distily 0.2.0
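The columns of the evaluation table above are related by simple arithmetic, which makes the table easy to sanity-check. The sketch below reproduces the `epoch` column from `step`, estimates the evaluation set size from `runtime` and `samples_per_second`, and measures the remaining gap to the teacher. The helper names are hypothetical, and the evaluation-set size and per-step batch size it derives are inferences from the reported throughput, not values stated in the model card.

```python
TOTAL_STEPS = 12375  # last step in the table, reported as epoch 1.0


def epoch_fraction(step, total_steps=TOTAL_STEPS):
    """Fraction of the epoch completed at a given step, rounded like the table."""
    return round(step / total_steps, 4)


def implied_count(runtime_s, rate_per_second):
    """Items processed during an eval pass: runtime times throughput, rounded."""
    return round(runtime_s * rate_per_second)


def ppl_ratio(student_ppl, teacher_ppl):
    """How many times higher the student's perplexity is than the teacher's."""
    return student_ppl / teacher_ppl


if __name__ == "__main__":
    # Step 9000 row: runtime 17.3332 s, 57.693 samples/s, 7.212 steps/s.
    samples = implied_count(17.3332, 57.693)
    steps = implied_count(17.3332, 7.212)
    print(epoch_fraction(9000), samples, steps, samples / steps)
    # Final enwikippl (step 12375) against the teacher eval row.
    print(ppl_ratio(158.3214, 30.2086))
```

Under these assumptions the step 9000 row implies roughly 1000 evaluation samples processed in 125 steps, or 8 samples per step, and the final English perplexity remains a little over five times the teacher's.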
logs/attn_loss_fn=ce, attn_weight=2.0/events.out.tfevents.1723680689.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e1a844ba82d26f8d754402c5f272177311e462e03ec1cfb8aa86fe69a0533e74
+size 249