DistilGPT2 is an English language model pretrained with the supervision of GPT2 (the smallest version of GPT2) on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset. The model has 6 layers, 768 dimensions, and 12 heads, totaling 82M parameters (compared to 124M parameters for GPT2). On average, DistilGPT2 is twice as fast as GPT2.
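The parameter counts above can be sanity-checked with a short calculation. The sketch below is not taken from the original training code. It assumes the standard GPT-2 layout: a vocabulary of 50,257 tokens, 1,024 positions, a 4x MLP expansion, and output embeddings tied to the input embeddings. None of these values are stated in this card. The helper name `count_gpt2_params` is hypothetical.

```python
def count_gpt2_params(n_layer, n_embd=768, vocab_size=50257, n_positions=1024):
    """Count parameters of a GPT-2-style decoder with tied input/output embeddings.

    vocab_size, n_positions and the 4x MLP width are assumed GPT-2 defaults,
    not values given in this model card.
    """
    token_emb = vocab_size * n_embd               # token embedding matrix
    pos_emb = n_positions * n_embd                # learned position embeddings
    layer_norm = 2 * n_embd                       # weight + bias
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd   # fused query/key/value projection
    attn_out = n_embd * n_embd + n_embd           # attention output projection
    mlp_in = n_embd * 4 * n_embd + 4 * n_embd     # MLP up-projection
    mlp_out = 4 * n_embd * n_embd + n_embd        # MLP down-projection
    per_layer = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    final_norm = layer_norm                       # layer norm after the last block
    return token_emb + pos_emb + n_layer * per_layer + final_norm


distil = count_gpt2_params(n_layer=6)
gpt2 = count_gpt2_params(n_layer=12)
print(f"DistilGPT2: {distil / 1e6:.1f}M parameters")
print(f"GPT2:       {gpt2 / 1e6:.1f}M parameters")
```

The number of heads does not appear in the count, because the heads split the 768-dimensional projections rather than adding weights. Under these assumptions, halving the layer count removes about a third of the parameters, since the embedding matrices are shared by both model sizes.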

On the WikiText-103 benchmark, GPT2 reaches a test-set perplexity of 16.3, compared to 21.1 for DistilGPT2 (after fine-tuning on the train set).
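For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The following is a minimal sketch of that definition, not the evaluation script used for the numbers above. The function names and the toy input are illustrative assumptions.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities assigned by a model."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def cross_entropy_from_perplexity(ppl):
    """Average per-token cross-entropy (in nats) implied by a perplexity value."""
    return math.log(ppl)


# Toy input (made up): a model that gives every token probability 0.25
# is as uncertain as a uniform choice among 4 tokens.
toy_log_probs = [math.log(0.25)] * 8
print(perplexity(toy_log_probs))

# Per-token cross-entropy implied by the reported perplexities.
for name, ppl in [("GPT2", 16.3), ("DistilGPT2", 21.1)]:
    print(f"{name}: {cross_entropy_from_perplexity(ppl):.3f} nats/token")
```

Converting to cross-entropy puts the gap between the two models on an additive scale, which is easier to compare across benchmarks than a raw perplexity ratio.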

We encourage you to check GPT2 to learn more about usage, limitations, and potential biases.

