
distilgpt2-finetuned-stories

This model is a fine-tuned version of distilgpt2 on the demelin/understanding_fables dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3089
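The loss above is easier to interpret as a perplexity. This is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats, which is what the Hugging Face Trainer reports for causal language models. Under that assumption, perplexity is simply its exponential:

```python
import math

# Validation loss reported above; assumed to be mean per-token cross-entropy (natural log)
eval_loss = 3.3089

# Perplexity = exp(mean cross-entropy): the model's effective branching factor per token
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.1f}")  # about 27.4
```

Roughly, the fine-tuned model is as uncertain about each next token as a uniform choice among about 27 candidates.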

Autoregressive and Prefix Language Modelling

Language modelling, and text generation in particular, works by predicting the next token from the tokens that precede it.

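Autoregressive modelling is built directly on this principle: the model predicts the next token (here, a word) from the tokens that precede it. Formally, it estimates P(w_i | w_1, ..., w_{i-1}), the probability of the next word w_i given the preceding tokens w_1, ..., w_{i-1}. The probability of a whole sequence is then the product of these conditional probabilities, and text is generated one token at a time by feeding each new token back in as input.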
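A toy sketch of autoregressive scoring and greedy decoding follows. The lookup table `TOY_MODEL` and the helper names are hypothetical stand-ins for a real network such as distilgpt2; in this sketch each prediction conditions on the whole history, and the sequence probability is computed with the chain rule.

```python
import math

# Hypothetical toy "model": maps the full history (tuple of previous tokens)
# to a probability distribution over the next token. A real model such as
# distilgpt2 computes this distribution with a neural network instead.
TOY_MODEL = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"fox": 0.7, "crow": 0.3},
    ("the", "fox"): {"ran": 0.8, "<eos>": 0.2},
    ("the", "fox", "ran"): {"<eos>": 0.9, "away": 0.1},
}


def next_token_probs(history):
    """Return P(next token | all previous tokens); unknown histories end the text."""
    return TOY_MODEL.get(tuple(history), {"<eos>": 1.0})


def sequence_log_prob(tokens):
    """Chain rule: log P(w_1..w_n) = sum over i of log P(w_i | w_1..w_{i-1})."""
    total = 0.0
    for i, token in enumerate(tokens):
        total += math.log(next_token_probs(tokens[:i])[token])
    return total


def greedy_generate(max_new_tokens=10):
    """Autoregressive decoding: pick the most likely token, append it, repeat."""
    tokens = []
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens


generated = greedy_generate()
print(generated)                                         # ['the', 'fox', 'ran']
print(round(math.exp(sequence_log_prob(generated)), 3))  # 0.336
```

The sequence probability 0.6 × 0.7 × 0.8 = 0.336 is the product of the three conditional probabilities used during decoding.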

In prefix language modelling, by contrast, the input is supplied as a context (the prefix), and every generated token is conditioned on it, together with any tokens already generated. The model computes P(w | x), the conditional probability of the next token w given the context x.
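A minimal sketch of conditioning on a context follows. It assumes a hypothetical lookup-table model (`TOY_MODEL`) in place of a real network: the context x is fed in first but never scored, and only the continuation tokens contribute to P(w | x).

```python
import math

# Hypothetical toy model: next-token distribution given the full history,
# which here always starts with the context (prefix) tokens.
TOY_MODEL = {
    ("the", "fox"): {"ran": 0.8, "flew": 0.2},
    ("the", "crow"): {"ran": 0.1, "flew": 0.9},
    ("the", "fox", "flew"): {"away": 0.5, "home": 0.5},
    ("the", "crow", "flew"): {"away": 0.5, "home": 0.5},
}


def next_token_probs(history):
    """Return the toy next-token distribution for this exact history."""
    return TOY_MODEL[tuple(history)]


def continuation_log_prob(context, continuation):
    """log P(w | x): score only the continuation tokens, each conditioned on the
    context x plus the continuation tokens that came before it."""
    history = list(context)  # the context is fed in but never scored
    total = 0.0
    for token in continuation:
        total += math.log(next_token_probs(history)[token])
        history.append(token)
    return total


for context in (["the", "fox"], ["the", "crow"]):
    p = math.exp(continuation_log_prob(context, ["flew", "away"]))
    print(context, round(p, 2))
```

The same continuation "flew away" gets probability 0.2 × 0.5 = 0.1 after "the fox" but 0.9 × 0.5 = 0.45 after "the crow", which is exactly the effect of the context x.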

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
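With lr_scheduler_type set to linear, the learning rate decays linearly from 2e-05 towards 0 over training. The sketch below mirrors the shape of the linear schedule in transformers (`get_linear_schedule_with_warmup`). It assumes zero warmup steps (none is listed above, and zero is the Trainer default) and 60 total optimizer steps, the final step count in the training results.

```python
def linear_lr(step, base_lr=2e-5, total_steps=60, warmup_steps=0):
    """Learning rate at a given optimizer step under a linear schedule:
    optional linear warmup, then linear decay from base_lr to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Learning rate at the start of training and at the end of each epoch
for step in (0, 20, 40, 60):
    print(step, linear_lr(step))
```

Under these assumptions the rate has halved to 1e-05 at step 30, midway through the second epoch, and reaches 0 at the last step.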

Training results

Training Loss | Epoch | Step | Validation Loss
No log        | 1.0   | 20   | 3.4065
No log        | 2.0   | 40   | 3.3288
No log        | 3.0   | 60   | 3.3089

(No log: no training loss was recorded during the run.)
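The step counts also constrain the size of the training split. This sketch assumes a single device, no gradient accumulation, and that the last partial batch is kept (none of which the card states), so that steps per epoch equals ceil(n_train / train_batch_size):

```python
import math

steps_per_epoch = 20  # from the table: step 20 at epoch 1.0
batch_size = 8        # train_batch_size

# Every training-set size consistent with 20 batches of (at most) 8 examples
possible_sizes = [
    n for n in range(1, 1000) if math.ceil(n / batch_size) == steps_per_epoch
]
print(min(possible_sizes), max(possible_sizes))  # 153 160
```

Under those assumptions the model was fine-tuned on only 153 to 160 examples, which is worth keeping in mind when reading the validation loss.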

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0

Model size: 81.9M params (Safetensors, tensor type F32)