End of training
README.md
CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl:
-- eval_loss:
-- eval_runtime: 11.
-- eval_samples_per_second: 86.
-- eval_steps_per_second: 10.
+- eval_enwikippl: 87049.7578
+- eval_frwikippl: 148519.8594
+- eval_zhwikippl: 112743.5078
+- eval_tinystoriesppl: 68038.7344
+- eval_loss: 32.1160
+- eval_runtime: 11.5146
+- eval_samples_per_second: 86.847
+- eval_steps_per_second: 10.856

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
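The `*ppl` values above and in the training-log table further down are perplexities on English, French, and Chinese Wikipedia text and on TinyStories (hence `enwikippl`, `frwikippl`, `zhwikippl`, `tinystoriesppl`). A perplexity of this kind is `exp` of the mean next-token cross-entropy; the sketch below is a minimal illustration of that calculation, not Distily's actual evaluation code, and the model id and text are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_id: str, text: str) -> float:
    """exp(mean next-token cross-entropy) of `text` under a causal LM."""
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean
        # next-token cross-entropy loss over the sequence.
        loss = model(input_ids=ids, labels=ids).loss
    return torch.exp(loss).item()

# Hypothetical usage; "path/to/student-model" is a placeholder for this repo.
# perplexity("path/to/student-model", "Once upon a time ...")
```

Measured this way, the logs below show the teacher at 169.9865 enwiki perplexity while the student plateaus near 87049.76, so this run did not approach teacher quality.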
@@ -47,7 +47,7 @@ More information needed
The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=0, loss_fn=None, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10, loss_fn=kl, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
-- learning_rate: 0.
+- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
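Per the `distillation_objective` above, only the hidden-state component carries weight (10, with a `kl` loss); the logits and attention components are disabled (weight 0, `loss_fn=None`). The sketch below shows one plausible reading of such a hidden-state KL term in PyTorch; the softmax normalization, reduction, and one-to-one layer pairing (`layer_mapper=None`, `projector=None`) are assumptions of the sketch, not Distily's confirmed implementation.

```python
import torch
import torch.nn.functional as F

def hs_kl_loss(student_hs, teacher_hs, weight: float = 10.0) -> torch.Tensor:
    """KL divergence between softmax-normalized hidden states,
    averaged over layers and scaled by `weight`.

    student_hs / teacher_hs: tuples of [batch, seq, hidden] tensors,
    e.g. model(..., output_hidden_states=True).hidden_states,
    paired one-to-one (no layer mapper or projector).
    """
    losses = []
    for s, t in zip(student_hs, teacher_hs):
        # Treat each hidden vector as unnormalized log-probabilities
        # over the hidden dimension (an assumption of this sketch).
        losses.append(F.kl_div(
            F.log_softmax(s, dim=-1),
            F.softmax(t, dim=-1),
            reduction="batchmean",
        ))
    return weight * torch.stack(losses).mean()
```

With the logits term at weight 0, all training signal comes from matching intermediate representations, alongside the embedding training enabled by `train_embeddings: True`.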
@@ -62,32 +62,32 @@ Peak GPU Memory: 6.6287 GB
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
-| 0 | 0 | 88697.0156 | 150478.2188 | 32.2330 | 11.
-| 500 | 0.0404 |
-| 1000 | 0.0808 |
-| 1500 | 0.1212 |
-| 2000 | 0.1616 |
-| 2500 | 0.2020 |
-| 3000 | 0.2424 |
-| 3500 | 0.2828 |
-| 4000 | 0.3232 |
-| 4500 | 0.3636 |
-| 5000 | 0.4040 |
-| 5500 | 0.4444 |
-| 6000 | 0.4848 |
-| 6500 | 0.5253 |
-| 7000 | 0.5657 |
-| 7500 | 0.6061 |
-| 8000 | 0.6465 |
-| 8500 | 0.6869 |
-| 9000 | 0.7273 |
-| 9500 | 0.7677 |
-| 10000 | 0.8081 |
-| 10500 | 0.8485 |
-| 11000 | 0.8889 |
-| 11500 | 0.9293 |
-| 12000 | 0.9697 |
-| 12375 | 1.0 |
+| 0 | 0 | 88697.0156 | 150478.2188 | 32.2330 | 11.5103 | 86.878 | 10.86 | 69390.6016 | 113346.8047 |
+| 500 | 0.0404 | 87049.7578 | 148519.8594 | 32.1160 | 11.5316 | 86.718 | 10.84 | 67960.0703 | 112623.2578 |
+| 1000 | 0.0808 | 87049.7578 | 148519.8594 | 32.1180 | 11.498 | 86.971 | 10.871 | 68016.2188 | 112743.5078 |
+| 1500 | 0.1212 | 87049.7578 | 148519.8594 | 32.1180 | 11.5171 | 86.828 | 10.853 | 67993.7812 | 112743.5078 |
+| 2000 | 0.1616 | 87049.7578 | 148519.8594 | 32.1160 | 11.5112 | 86.872 | 10.859 | 68038.7344 | 112743.5078 |
+| 2500 | 0.2020 | 87049.7578 | 148519.8594 | 32.1160 | 11.5174 | 86.825 | 10.853 | 68038.7344 | 112743.5078 |
+| 3000 | 0.2424 | 87049.7578 | 148519.8594 | 32.1160 | 11.5446 | 86.621 | 10.828 | 68016.2188 | 112743.5078 |
+| 3500 | 0.2828 | 87049.7578 | 148519.8594 | 32.1160 | 11.5015 | 86.945 | 10.868 | 68038.7344 | 112743.5078 |
+| 4000 | 0.3232 | 87049.7578 | 148519.8594 | 32.1160 | 11.5349 | 86.693 | 10.837 | 68038.7344 | 112743.5078 |
+| 4500 | 0.3636 | 87049.7578 | 148519.8594 | 32.1160 | 11.5299 | 86.731 | 10.841 | 68038.7344 | 112743.5078 |
+| 5000 | 0.4040 | 87049.7578 | 148519.8594 | 32.1160 | 11.5259 | 86.761 | 10.845 | 68038.7344 | 112743.5078 |
+| 5500 | 0.4444 | 87049.7578 | 148519.8594 | 32.1160 | 11.5002 | 86.955 | 10.869 | 68038.7344 | 112743.5078 |
+| 6000 | 0.4848 | 87049.7578 | 148603.5938 | 32.1160 | 11.5135 | 86.855 | 10.857 | 68061.25 | 112743.5078 |
+| 6500 | 0.5253 | 87049.7578 | 148603.5938 | 32.1160 | 11.5069 | 86.904 | 10.863 | 68061.25 | 112743.5078 |
+| 7000 | 0.5657 | 87049.7578 | 148603.5938 | 32.1160 | 11.509 | 86.889 | 10.861 | 68061.25 | 112743.5078 |
+| 7500 | 0.6061 | 87049.7578 | 148603.5938 | 32.1160 | 11.508 | 86.896 | 10.862 | 68061.25 | 112743.5078 |
+| 8000 | 0.6465 | 87049.7578 | 148603.5938 | 32.1160 | 11.5151 | 86.843 | 10.855 | 68038.7344 | 112743.5078 |
+| 8500 | 0.6869 | 87049.7578 | 148519.8594 | 32.1160 | 11.4916 | 87.02 | 10.878 | 68038.7344 | 112743.5078 |
+| 9000 | 0.7273 | 87049.7578 | 148519.8594 | 32.1160 | 11.5189 | 86.814 | 10.852 | 68038.7344 | 112743.5078 |
+| 9500 | 0.7677 | 87049.7578 | 148519.8594 | 32.1160 | 11.5146 | 86.847 | 10.856 | 68038.7344 | 112743.5078 |
+| 10000 | 0.8081 | 87049.7578 | 148519.8594 | 32.1160 | 11.5098 | 86.883 | 10.86 | 68038.7344 | 112743.5078 |
+| 10500 | 0.8485 | 87049.7578 | 148519.8594 | 32.1160 | 11.5054 | 86.916 | 10.865 | 68038.7344 | 112743.5078 |
+| 11000 | 0.8889 | 87049.7578 | 148519.8594 | 32.1160 | 11.5094 | 86.885 | 10.861 | 68038.7344 | 112743.5078 |
+| 11500 | 0.9293 | 87049.7578 | 148519.8594 | 32.1160 | 11.5376 | 86.673 | 10.834 | 68038.7344 | 112743.5078 |
+| 12000 | 0.9697 | 87049.7578 | 148519.8594 | 32.1160 | 11.494 | 87.002 | 10.875 | 68038.7344 | 112743.5078 |
+| 12375 | 1.0 | 87049.7578 | 148519.8594 | 32.1160 | 11.4926 | 87.013 | 10.877 | 68038.7344 | 112743.5078 |

### Framework versions
- Distily 0.2.0
logs/hs_loss_fn=kl, hs_weight=10, learning_rate=0.0001/events.out.tfevents.1723878014.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3d9cefb4ed27bd5cdbb89802b1dd471965279c6adf682ebcf580c81dfa87f617
+size 307