lapp0 committed
Commit f88f22a
1 Parent(s): 7077291

End of training
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ base_model: roneneldan/TinyStories-33M
+ library_name: Distily
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: distily_bench_obj_cross_v2.2
+   results: []
+ ---
+
+ # distily_bench_obj_cross_v2.2
+
+ This student model is distilled from the teacher model [roneneldan/TinyStories-33M](https://huggingface.co/roneneldan/TinyStories-33M) using the dataset (unspecified).
+
+ The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
+
+ It achieves the following results on the evaluation set:
+ - eval_enwikippl: 201.4308
+ - eval_frwikippl: 134811.7969
+ - eval_zhwikippl: 2802169.0
+ - eval_tinystoriesppl: 8.5017
+ - eval_loss: 1.1632
+ - eval_runtime: 13.1959
+ - eval_samples_per_second: 75.781
+ - eval_steps_per_second: 9.473
+
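The `*ppl` metrics above are corpus perplexities. Perplexity is conventionally the exponential of the mean per-token negative log-likelihood; a minimal pure-Python illustration (the helper name is ours, not part of Distily):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning uniform probability over an 8-token vocabulary has
# per-token NLL ln(8), so its perplexity is exactly 8.
print(perplexity([math.log(8)] * 5))
```

Lower perplexity on a corpus means the model assigns higher average probability to that corpus's tokens, which is why the student's low `tinystoriesppl` alongside very high `zhwikippl` reflects its in-domain training distribution.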
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+ -->
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
+ - train_embeddings: True
+ - learning_rate: 0.04
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.5
+ - num_epochs: 1.0
+
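The distillation objective above uses only a logits-level KL term (weight 1); the hidden-state (`hs`) and attention (`attn`) components are disabled (weight 0). At each token position, a logits KL loss compares the teacher's next-token distribution with the student's. A pure-Python sketch of the idea, not Distily's actual implementation:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) over one position's vocabulary distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; any mismatch gives a positive loss.
print(kl_logits_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In practice this is averaged over all positions in a batch, and `train_embeddings: True` indicates the student's embedding weights are updated as part of the same optimization.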
+ ### Resource Usage
+ Peak GPU Memory: 8.0557 GB
+
+ ### Eval-Phase Metrics
+ | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
+ | 0 | 0 | 106439.8672 | 83269.1172 | 6.7670 | 13.084 | 76.429 | 9.554 | 124919.8828 | 108523.1641 |
+ | 500 | 0.0404 | 540.4661 | 2634702.25 | 4.5202 | 13.2137 | 75.679 | 9.46 | 8.4174 | 83522400.0 |
+ | 1000 | 0.0808 | 1497.5195 | 15870873.0 | 2.2889 | 13.1749 | 75.902 | 9.488 | 14.0065 | 464464704.0 |
+ | 1500 | 0.1212 | 1866.7501 | 54564276.0 | 1.7749 | 13.1457 | 76.071 | 9.509 | 12.2171 | 3062959872.0 |
+ | 2000 | 0.1616 | 354.3223 | 675337.5625 | 1.3000 | 13.1662 | 75.952 | 9.494 | 9.5618 | 7180616.0 |
+ | 2500 | 0.2020 | 237.2996 | 200161.4531 | 1.2027 | 13.1313 | 76.154 | 9.519 | 9.2355 | 2578359.25 |
+ | 3000 | 0.2424 | 209.1669 | 144669.0625 | 1.1789 | 13.1334 | 76.142 | 9.518 | 8.7617 | 2323565.0 |
+ | 3500 | 0.2828 | 199.7487 | 140412.8906 | 1.1786 | 13.1764 | 75.893 | 9.487 | 8.3659 | 2391488.5 |
+ | 4000 | 0.3232 | 194.1800 | 130293.7578 | 1.1813 | 13.1424 | 76.089 | 9.511 | 8.1468 | 2006979.125 |
+ | 4500 | 0.3636 | 192.8458 | 132104.8594 | 1.1847 | 13.2475 | 75.486 | 9.436 | 8.0278 | 1976689.25 |
+ | 5000 | 0.4040 | 204.8733 | 171334.5781 | 1.1910 | 13.2362 | 75.55 | 9.444 | 7.8049 | 7161488.0 |
+ | 5500 | 0.4444 | 195.0279 | 158004.8125 | 1.1950 | 13.2423 | 75.516 | 9.439 | 7.6020 | 5050465.0 |
+ | 6000 | 0.4848 | 190.4297 | 152376.5 | 1.1980 | 13.2444 | 75.504 | 9.438 | 7.4381 | 3569331.0 |
+ | 6500 | 0.5253 | 188.9310 | 144618.1562 | 1.1982 | 13.2202 | 75.642 | 9.455 | 7.4149 | 3188412.75 |
+ | 7000 | 0.5657 | 194.4358 | 148488.4219 | 1.1898 | 13.2162 | 75.664 | 9.458 | 7.6775 | 3383867.0 |
+ | 7500 | 0.6061 | 197.3188 | 155367.4844 | 1.1877 | 13.216 | 75.666 | 9.458 | 7.7428 | 3572188.0 |
+ | 8000 | 0.6465 | 201.1151 | 143734.7344 | 1.1764 | 13.2338 | 75.564 | 9.446 | 8.2883 | 3349733.0 |
+ | 8500 | 0.6869 | 200.3220 | 141594.625 | 1.1751 | 13.1923 | 75.802 | 9.475 | 8.2352 | 3220043.0 |
+ | 9000 | 0.7273 | 200.8854 | 134460.8906 | 1.1695 | 13.2139 | 75.678 | 9.46 | 8.4530 | 2981093.25 |
+ | 9500 | 0.7677 | 201.4308 | 134811.7969 | 1.1632 | 13.1959 | 75.781 | 9.473 | 8.5017 | 2802169.0 |
+ | 10000 | 0.8081 | 199.1423 | 119070.3125 | 1.1540 | 13.2527 | 75.456 | 9.432 | 8.8006 | 2319847.5 |
+ | 10500 | 0.8485 | 196.0125 | 108148.8438 | 1.1479 | 13.2351 | 75.556 | 9.445 | 8.9736 | 2071720.5 |
+ | 11000 | 0.8889 | 184.3195 | 84100.1484 | 1.1397 | 13.2039 | 75.735 | 9.467 | 9.2554 | 1337898.0 |
+ | 11500 | 0.9293 | 187.4005 | 83163.6484 | 1.1355 | 13.2213 | 75.635 | 9.454 | 9.7272 | 1300591.875 |
+ | 12000 | 0.9697 | 192.9990 | 87212.625 | 1.1357 | 13.295 | 75.216 | 9.402 | 10.3592 | 1611324.75 |
+ | 12375 | 1.0 | 193.3956 | 88350.2109 | 1.1344 | 13.1951 | 75.786 | 9.473 | 10.2582 | 1620378.125 |
+
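With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.5` over the 12375 steps in the table's final row, the learning rate climbs linearly to the 0.04 peak for the first half of the run, then decays linearly to zero. A sketch of that schedule (the function name is ours, not the `transformers` implementation):

```python
def linear_with_warmup(step, total_steps=12375, warmup_ratio=0.5, peak_lr=0.04):
    """Linear warmup to peak_lr over warmup_ratio of training, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# The LR peaks at mid-run (step 6187) and returns to zero at the final step.
print(linear_with_warmup(6187), linear_with_warmup(12375))
```

The schedule's shape lines up with the table: eval loss plateaus around steps 6000–7000, where the learning rate is at its 0.04 maximum, and resumes improving as the rate decays.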
+ ### Framework versions
+ - Distily 0.2.0
+ - Transformers 4.44.0
+ - Pytorch 2.3.0
+ - Datasets 2.20.0
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 50256,
+   "eos_token_id": 50256,
+   "transformers_version": "4.44.0"
+ }
logs/learning_rate=0.04, lr_scheduler_type=linear, warmup_ratio=0.5/events.out.tfevents.1723844851.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc53912a7a0161af9b69cd5fdce139b103f6ca2eedc4489c3adee7ab1ff7e83c
+ size 307