DistilGPT2 is an English language model pretrained with the supervision of GPT2 (the smallest version of GPT2) on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset. The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 82M parameters (compared to 124M parameters for GPT2). On average, DistilGPT2 is twice as fast as GPT2.
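
The parameter counts above can be reproduced with a short back-of-the-envelope calculation. The sketch below is illustrative only: it assumes the standard GPT-2 configuration values that are not stated in this card (a vocabulary of 50257 tokens, 1024 positions, a feed-forward width of 4x the hidden size, and output weights tied to the input embeddings). The function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(n_layer, d_model=768, vocab_size=50257, n_positions=1024):
    """Count parameters of a GPT-2 style decoder with tied input/output embeddings.

    vocab_size, n_positions and the 4x feed-forward width are assumed
    standard GPT-2 values; they are not given in the model card itself.
    The number of attention heads does not change the count.
    """
    d_ff = 4 * d_model

    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_positions * d_model

    # Fused query/key/value projection and the attention output projection
    # (weights + biases).
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # Two-layer feed-forward block (weights + biases).
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Two layer norms per block, each with a scale and a bias vector.
    layer_norms = 2 * (2 * d_model)

    per_layer = attention + mlp + layer_norms
    final_layer_norm = 2 * d_model
    return embeddings + n_layer * per_layer + final_layer_norm


distil = gpt2_param_count(n_layer=6)
gpt2 = gpt2_param_count(n_layer=12)
print(f"DistilGPT2: {distil / 1e6:.1f}M parameters")
print(f"GPT2:       {gpt2 / 1e6:.1f}M parameters")
```

Under these assumptions, halving the number of layers removes only about a third of the parameters, because the embedding matrices are shared by both models and do not shrink.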

On the WikiText-103 benchmark, GPT2 reaches a test-set perplexity of 16.3, compared to 21.1 for DistilGPT2 (after fine-tuning on the train set). Lower perplexity is better.
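
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token. The following is a minimal sketch of that definition, not the evaluation script used for the numbers above; the helper name `perplexity` and the toy log-probabilities are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input (hypothetical): a model that assigns probability 1/16.3 to every
# token has a perplexity of 16.3, i.e. it is as uncertain as a uniform choice
# among 16.3 options.
toy_log_probs = [math.log(1 / 16.3)] * 5
toy_ppl = perplexity(toy_log_probs)
print(f"{toy_ppl:.1f}")
```

In practice the per-token log-probabilities would come from the model's output on the held-out text.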

We encourage you to check GPT2 to learn more about usage, limitations, and potential biases.
