End of training
README.md
ADDED
@@ -0,0 +1,97 @@
---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.2
  results: []
---

# distily_bench_obj_cross_v2.2

This student model was distilled from the teacher model [roneneldan/TinyStories-33M](https://huggingface.co/roneneldan/TinyStories-33M). The training dataset is unspecified.

The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set:
- eval_enwikippl: 201.4308
- eval_frwikippl: 134811.7969
- eval_zhwikippl: 2802169.0
- eval_tinystoriesppl: 8.5017
- eval_loss: 1.1632
- eval_runtime: 13.1959
- eval_samples_per_second: 75.781
- eval_steps_per_second: 9.473
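
The three throughput figures above are related by simple arithmetic, which gives a quick sanity check on the size of the evaluation set. A minimal sketch, assuming the runtime is reported in seconds and that the eval batch size of 8 listed under the training hyperparameters applies on a single device; the variable names are illustrative:

```python
# Values copied from the evaluation results listed above.
eval_runtime = 13.1959        # assumed to be seconds
samples_per_second = 75.781
steps_per_second = 9.473
eval_batch_size = 8           # from the training hyperparameters

# runtime x rate recovers the approximate number of samples and steps.
n_samples = round(eval_runtime * samples_per_second)
n_steps = round(eval_runtime * steps_per_second)

print(n_samples, n_steps, n_samples / n_steps)
```

Under those assumptions the figures are consistent with roughly 1,000 evaluation samples processed in 125 batches of 8.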

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

-->

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 0.04
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0
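
The `distillation_objective` above puts all of the weight on a KL-divergence loss between teacher and student logits, with the hidden-state and attention components weighted 0. As a rough illustration of that kind of loss, here is a minimal pure-Python sketch for a single token position. It assumes the forward direction KL(teacher || student) with no temperature scaling; Distily's actual `kl` loss may differ in direction, reduction, or scaling, and the helper names are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position (assumed direction)."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))

teacher = [2.0, 0.5, -1.0]
print(kl_logits_loss(teacher, teacher))          # identical logits give 0.0
print(kl_logits_loss(teacher, [0.0, 0.0, 0.0]))  # a uniform student is penalized
```

In a real training loop this quantity would be averaged over all token positions in the batch and computed with tensor operations rather than Python lists.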
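
To make the scheduler settings concrete: with `lr_scheduler_type: linear`, `lr_scheduler_warmup_ratio: 0.5`, and the 12375 total steps shown in the eval-phase table, the learning rate would ramp up for the first half of training and decay for the second half. The sketch below assumes the usual linear-warmup, linear-decay shape and that the warmup step count is the ratio times the total steps, rounded up. The function name is hypothetical, and this is not Distily's or Transformers' own code.

```python
import math

def linear_warmup_decay_lr(step, peak_lr=0.04, total_steps=12375, warmup_ratio=0.5):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Start of training, first eval step, end of warmup, end of training.
for step in (0, 500, 6188, 12375):
    print(step, linear_warmup_decay_lr(step))
```

Under these assumptions the peak rate of 0.04 is only reached around step 6188, so the early evaluation steps run at a small fraction of it.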

### Resource Usage
Peak GPU Memory: 8.0557 GB

### Eval-Phase Metrics
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 106439.8672 | 83269.1172 | 6.7670 | 13.084 | 76.429 | 9.554 | 124919.8828 | 108523.1641 |
| 500 | 0.0404 | 540.4661 | 2634702.25 | 4.5202 | 13.2137 | 75.679 | 9.46 | 8.4174 | 83522400.0 |
| 1000 | 0.0808 | 1497.5195 | 15870873.0 | 2.2889 | 13.1749 | 75.902 | 9.488 | 14.0065 | 464464704.0 |
| 1500 | 0.1212 | 1866.7501 | 54564276.0 | 1.7749 | 13.1457 | 76.071 | 9.509 | 12.2171 | 3062959872.0 |
| 2000 | 0.1616 | 354.3223 | 675337.5625 | 1.3000 | 13.1662 | 75.952 | 9.494 | 9.5618 | 7180616.0 |
| 2500 | 0.2020 | 237.2996 | 200161.4531 | 1.2027 | 13.1313 | 76.154 | 9.519 | 9.2355 | 2578359.25 |
| 3000 | 0.2424 | 209.1669 | 144669.0625 | 1.1789 | 13.1334 | 76.142 | 9.518 | 8.7617 | 2323565.0 |
| 3500 | 0.2828 | 199.7487 | 140412.8906 | 1.1786 | 13.1764 | 75.893 | 9.487 | 8.3659 | 2391488.5 |
| 4000 | 0.3232 | 194.1800 | 130293.7578 | 1.1813 | 13.1424 | 76.089 | 9.511 | 8.1468 | 2006979.125 |
| 4500 | 0.3636 | 192.8458 | 132104.8594 | 1.1847 | 13.2475 | 75.486 | 9.436 | 8.0278 | 1976689.25 |
| 5000 | 0.4040 | 204.8733 | 171334.5781 | 1.1910 | 13.2362 | 75.55 | 9.444 | 7.8049 | 7161488.0 |
| 5500 | 0.4444 | 195.0279 | 158004.8125 | 1.1950 | 13.2423 | 75.516 | 9.439 | 7.6020 | 5050465.0 |
| 6000 | 0.4848 | 190.4297 | 152376.5 | 1.1980 | 13.2444 | 75.504 | 9.438 | 7.4381 | 3569331.0 |
| 6500 | 0.5253 | 188.9310 | 144618.1562 | 1.1982 | 13.2202 | 75.642 | 9.455 | 7.4149 | 3188412.75 |
| 7000 | 0.5657 | 194.4358 | 148488.4219 | 1.1898 | 13.2162 | 75.664 | 9.458 | 7.6775 | 3383867.0 |
| 7500 | 0.6061 | 197.3188 | 155367.4844 | 1.1877 | 13.216 | 75.666 | 9.458 | 7.7428 | 3572188.0 |
| 8000 | 0.6465 | 201.1151 | 143734.7344 | 1.1764 | 13.2338 | 75.564 | 9.446 | 8.2883 | 3349733.0 |
| 8500 | 0.6869 | 200.3220 | 141594.625 | 1.1751 | 13.1923 | 75.802 | 9.475 | 8.2352 | 3220043.0 |
| 9000 | 0.7273 | 200.8854 | 134460.8906 | 1.1695 | 13.2139 | 75.678 | 9.46 | 8.4530 | 2981093.25 |
| 9500 | 0.7677 | 201.4308 | 134811.7969 | 1.1632 | 13.1959 | 75.781 | 9.473 | 8.5017 | 2802169.0 |
| 10000 | 0.8081 | 199.1423 | 119070.3125 | 1.1540 | 13.2527 | 75.456 | 9.432 | 8.8006 | 2319847.5 |
| 10500 | 0.8485 | 196.0125 | 108148.8438 | 1.1479 | 13.2351 | 75.556 | 9.445 | 8.9736 | 2071720.5 |
| 11000 | 0.8889 | 184.3195 | 84100.1484 | 1.1397 | 13.2039 | 75.735 | 9.467 | 9.2554 | 1337898.0 |
| 11500 | 0.9293 | 187.4005 | 83163.6484 | 1.1355 | 13.2213 | 75.635 | 9.454 | 9.7272 | 1300591.875 |
| 12000 | 0.9697 | 192.9990 | 87212.625 | 1.1357 | 13.295 | 75.216 | 9.402 | 10.3592 | 1611324.75 |
| 12375 | 1.0 | 193.3956 | 88350.2109 | 1.1344 | 13.1951 | 75.786 | 9.473 | 10.2582 | 1620378.125 |
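
One way to read the table is to compare the final student row against the teacher row. The sketch below computes student-to-teacher perplexity ratios at step 12375 from the values above; a ratio near 1 means the student is close to the teacher on that dataset. The dictionary names are illustrative.

```python
# Perplexities copied from the "teacher eval" row and the final row (step 12375).
teacher = {"enwikippl": 169.9865, "frwikippl": 47377.9414,
           "tinystoriesppl": 3.9789, "zhwikippl": 4998.1294}
student = {"enwikippl": 193.3956, "frwikippl": 88350.2109,
           "tinystoriesppl": 10.2582, "zhwikippl": 1620378.125}

# Ratio > 1 means the student has higher (worse) perplexity than the teacher.
ratios = {name: student[name] / teacher[name] for name in teacher}

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.2f}x teacher perplexity")
```

On these numbers the student is closest to the teacher on English Wikipedia and furthest from it on Chinese Wikipedia.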

### Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.20.0
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 50256,
  "eos_token_id": 50256,
  "transformers_version": "4.44.0"
}
logs/learning_rate=0.04, lr_scheduler_type=linear, warmup_ratio=0.5/events.out.tfevents.1723844851.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc53912a7a0161af9b69cd5fdce139b103f6ca2eedc4489c3adee7ab1ff7e83c
size 307
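
The log entry above is a Git LFS pointer rather than the TensorBoard event file itself: each line is a key, a space, and a value. A minimal sketch of reading such a pointer in Python, using a hypothetical helper name:

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:dc53912a7a0161af9b69cd5fdce139b103f6ca2eedc4489c3adee7ab1ff7e83c
size 307
"""

info = parse_lfs_pointer(pointer)
# The oid value has the form "<hash algorithm>:<hex digest>".
algorithm, _, digest = info["oid"].partition(":")
print(algorithm, len(digest), int(info["size"]))
```

The `size` field is the byte size of the real file that the pointer stands in for, here 307 bytes.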