tarekziade/t5-small-headline-generator-sft-3-3

Shrunk version of https://huggingface.co/JulesBelveze/t5-small-headline-generator

This model is a compression experiment. The original model is 242 MiB; this model, once quantized, is down to ~50 MiB while retaining roughly 92% or more of the original model's ROUGE scores.

50% of the encoder and decoder layers were pruned, and the model was fine-tuned again on the original dataset for a single epoch.
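As a rough illustration of the pruning step, here is a sketch of how the surviving layers could be chosen. It assumes an evenly spaced selection that always keeps the first and last layer. The card does not say which layers sft.py actually keeps, and `evenly_spaced_layers` and `keep_layers` are hypothetical helpers. t5-small has 6 encoder and 6 decoder layers, so pruning half leaves 3 of each. In the Hugging Face `transformers` T5 implementation only the first block of each stack owns the relative position bias, which is one reason to always keep layer 0.

```python
def evenly_spaced_layers(n_teacher: int, n_student: int) -> list[int]:
    """Return n_student indices spread evenly over range(n_teacher).

    Hypothetical helper: the first and last teacher layers are always kept,
    and the indices are strictly increasing because the step is >= 1.
    """
    if not 1 <= n_student <= n_teacher:
        raise ValueError("need 1 <= n_student <= n_teacher")
    if n_student == 1:
        return [0]  # keep layer 0, which owns T5's relative position bias
    return [i * (n_teacher - 1) // (n_student - 1) for i in range(n_student)]


def keep_layers(layers: list, indices: list[int]) -> list:
    """Hypothetical helper: copy the selected layers, preserving their order."""
    return [layers[i] for i in indices]


# t5-small has 6 encoder and 6 decoder layers; keeping half leaves 3 of each.
keep = evenly_spaced_layers(6, 3)
encoder = [f"encoder_block_{i}" for i in range(6)]  # stand-ins for real blocks
print(keep)  # [0, 2, 5]
print(keep_layers(encoder, keep))  # ['encoder_block_0', 'encoder_block_2', 'encoder_block_5']
```

With real weights, the same index lists would be used to rebuild `model.encoder.block` and `model.decoder.block` and to update `config.num_layers` and `config.num_decoder_layers` before fine-tuning. That step is not shown here.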

The "shrink and fine-tune" strategy was inspired by "Pre-trained Summarization Distillation" (Shleifer and Rush, 2020): https://arxiv.org/pdf/2010.13002.pdf

The model was then evaluated on the dataset with ROUGE, and its scores were compared with the original model's scores to produce the relative accuracy figures below:

rouge-1 Accuracy:

F1 Accuracy: 92.27%
Precision Accuracy: 91.83%
Recall Accuracy: 93.95%

rouge-2 Accuracy:

F1 Accuracy: 94.48%
Precision Accuracy: 95.40%
Recall Accuracy: 92.01%

rouge-l Accuracy:

F1 Accuracy: 92.53%
Precision Accuracy: 92.11%
Recall Accuracy: 94.17%
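The figures above can be approximated with a small self-contained sketch of ROUGE-1, ROUGE-2 and ROUGE-L (precision, recall and F1), plus the relative score. It assumes that each "accuracy" is the pruned model's average ROUGE score divided by the original model's average score on the same reference headlines, times 100. That definition, the lowercase whitespace tokenization, the per-headline averaging and every function name here are assumptions, not what evaluation.py necessarily does. Real ROUGE packages also add details such as stemming.

```python
from collections import Counter

METRICS = ("rouge-1", "rouge-2", "rouge-l")


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; real ROUGE tools differ.
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap: int, cand_total: int, ref_total: int) -> dict[str, float]:
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def rouge_n(cand: list[str], ref: list[str], n: int) -> dict[str, float]:
    c, r = ngrams(cand, n), ngrams(ref, n)
    overlap = sum((c & r).values())  # clipped n-gram matches (min of counts)
    return prf(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a, b) -> int:
    # Longest common subsequence via dynamic programming, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def score(candidate: str, reference: str) -> dict:
    c, r = tokenize(candidate), tokenize(reference)
    return {
        "rouge-1": rouge_n(c, r, 1),
        "rouge-2": rouge_n(c, r, 2),
        "rouge-l": prf(lcs_length(c, r), len(c), len(r)),
    }


def mean_scores(candidates: list[str], references: list[str]) -> dict:
    # Average per-headline scores over the whole evaluation set.
    per_pair = [score(c, r) for c, r in zip(candidates, references)]
    return {m: {k: sum(s[m][k] for s in per_pair) / len(per_pair) for k in ("p", "r", "f")}
            for m in METRICS}


def relative_accuracy(student: dict, teacher: dict) -> dict:
    # Assumed definition: student score as a percentage of the teacher's score.
    return {m: {k: 100 * student[m][k] / teacher[m][k] if teacher[m][k] else float("nan")
                for k in student[m]}
            for m in METRICS}


# Toy data, not from the real dataset.
refs = ["Stocks rally as inflation cools"]
teacher = mean_scores(["Stocks rally as inflation cools"], refs)
student = mean_scores(["Stocks rally as inflation eases"], refs)
acc = relative_accuracy(student, teacher)
for metric in METRICS:
    print(metric, {k: f"{v:.2f}%" for k, v in acc[metric].items()})
```

In this toy case the original model matches the reference exactly, so the relative scores equal the pruned model's raw ROUGE scores; on real data both models score below 100%, and only the ratio is reported.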

You can find the training script at https://github.com/tarekziade/distill-t5/blob/main/sft.py and the evaluation script at https://github.com/tarekziade/distill-t5/blob/main/evaluation.py