GPT‑2 Fine‑tuned on English Quotes

Model Description

This model is a fine‑tuned version of GPT‑2 small (124M parameters) on the Abirate/english_quotes dataset.
The goal is to generate text in the style of philosophical or literary quotes, including the author’s name.

⚠️ This model was created for educational and research purposes only. It is not intended for production use.
It demonstrates full fine‑tuning of a causal language model on a small dataset, and the resulting improvement in generation quality over the base model.

Base model: gpt2
Task: Causal language modelling (text generation)
Fine‑tuning type: Full fine‑tuning (all parameters updated)

Intended Uses & Limitations

Direct Use (Research / Experimentation)

You can use this model to generate short quotes from a prompt. Prompts should start with the special token <|startoftext|>; the model has learned to complete them with a quote, followed by an author name and the <|endoftext|> token.

Example:

from transformers import pipeline

generator = pipeline("text-generation", model="lorcannrauzduel/gpt2-citations")
output = generator("<|startoftext|> The secret to", max_new_tokens=50, do_sample=True)
print(output[0]['generated_text'])

Limitations

  • The model is small (124M) and was trained on only ~2,500 quotes. It may sometimes produce repetitive or nonsensical outputs.
  • It only generates English text.
  • It does not have factual knowledge about the authors; it merely mimics the style of the training quotes.
  • Not suitable for any commercial or critical application.

Training Details

Training Data

  • Dataset: Abirate/english_quotes – 2,508 quotes, each with a quote and an author field.
  • Preprocessing: Each example was formatted as:
    <|startoftext|> "quote" — author <|endoftext|>
    
    The special tokens help the model learn where a quote starts and ends.
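The formatting step can be sketched as a small helper. This is a minimal illustration of the layout shown above, not the original preprocessing script: the function name `format_example`, the whitespace trimming, and the sample values are assumptions.

```python
START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


def format_example(quote: str, author: str) -> str:
    """Build one training string in the layout documented above."""
    # Assumption: surrounding whitespace is trimmed and the quote is
    # wrapped in straight double quotes, as in the template.
    return f'{START_TOKEN} "{quote.strip()}" — {author.strip()} {END_TOKEN}'


# Placeholder values, not taken from the dataset.
print(format_example("An example quote.", "Example Author"))
```

With the datasets library, a helper like this would typically be applied to each row's quote and author fields before tokenization.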

Training Procedure

The model was trained for 5 epochs using the Hugging Face Trainer with the following hyperparameters:

Learning rate: 5e-5
Batch size (per device): 8
Gradient accumulation: 2
Effective batch size: 16
Warmup steps: 100
Weight decay: 0.01
Optimizer: AdamW
Precision: fp16
Max sequence length: 128
Training steps: 1410
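As a rough sketch, the hyperparameters above map onto keyword arguments of transformers.TrainingArguments as follows. The dictionary and the `output_dir` value are illustrative assumptions, not the original training script. AdamW is the Trainer default, so no optimizer key is set, and the maximum sequence length is applied at tokenization time rather than here.

```python
# Keyword names follow transformers.TrainingArguments; values come from
# the hyperparameters listed above.
training_kwargs = {
    "output_dir": "gpt2-citations",  # hypothetical output path
    "num_train_epochs": 5,
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "warmup_steps": 100,
    "weight_decay": 0.01,
    "fp16": True,  # mixed precision; requires a CUDA GPU
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(training_kwargs))  # 16
```

On a machine with a GPU, the dictionary can be passed as TrainingArguments(**training_kwargs).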

Hardware: NVIDIA Tesla T4 (15 GB VRAM) on Google Colab / Kaggle.
Training time: ~5 minutes.

Evaluation Results

The final training loss was 2.506, corresponding to a perplexity of 12.26.
Validation loss plateaued around 2.30 after 3‑4 epochs, which suggests slight overfitting; this is acceptable for a small generative model.
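The perplexity figure follows directly from the loss: perplexity is the exponential of the mean cross‑entropy loss, which the Trainer reports in nats. A short check, where the helper name `perplexity` is only illustrative:

```python
import math


def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp(mean cross-entropy loss), with the loss in nats."""
    return math.exp(cross_entropy_loss)


print(f"{perplexity(2.506):.2f}")  # 12.26 (training loss)
print(f"{perplexity(2.30):.2f}")   # 9.97 (validation loss)
```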

How to Use the Model

With 🤗 Transformers

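from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned tokenizer and weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("lorcannrauzduel/gpt2-citations")
model = AutoModelForCausalLM.from_pretrained("lorcannrauzduel/gpt2-citations")

# Prompts start with the <|startoftext|> token used during fine-tuning.
prompt = "<|startoftext|> Life is"
inputs = tokenizer(prompt, return_tensors="pt")
# Sample up to 50 new tokens after the prompt.
output = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.9)
# Keep special tokens in the decoded text so the end-of-quote marker stays visible.
print(tokenizer.decode(output[0], skip_special_tokens=False))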

With Pipeline

from transformers import pipeline

pipe = pipeline("text-generation", model="lorcannrauzduel/gpt2-citations")
print(pipe("<|startoftext|> You can never", max_new_tokens=50)[0]['generated_text'])
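Because generations follow the training format, the quote and author can be separated with plain string handling. The sketch below assumes the output keeps the quote, separator, and author layout described under Training Data; `parse_generation` is a hypothetical helper, and the sample string is a placeholder rather than real model output.

```python
START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


def parse_generation(text: str):
    """Split generated text into (quote, author); author is None if absent."""
    # Keep only what precedes the first end token, then drop the start token.
    body = text.split(END_TOKEN, 1)[0].replace(START_TOKEN, "").strip()
    # The training format separates quote and author with " — ".
    quote, separator, author = body.rpartition(" — ")
    if not separator:
        # No author was generated; return the text as the quote.
        return body.strip('"'), None
    return quote.strip().strip('"'), author.strip()


# Placeholder string in the training layout, not real model output.
sample = '<|startoftext|> "An example quote." — Example Author <|endoftext|>'
print(parse_generation(sample))
```

The same helper works whether or not the special tokens survive decoding, since both are optional in the input string.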

With vLLM (for high‑throughput inference)

pip install vllm
vllm serve "lorcannrauzduel/gpt2-citations"

Then query with curl:

curl -X POST "http://localhost:8000/v1/completions" \
     -H "Content-Type: application/json" \
     --data '{
         "model": "lorcannrauzduel/gpt2-citations",
         "prompt": "<|startoftext|> The secret to",
         "max_tokens": 50,
         "temperature": 0.8
     }'
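The same request can be sent from Python using only the standard library. This is a sketch that assumes the server started above is listening on localhost:8000 and returns the OpenAI‑style completions response; the helper names `build_completion_request` and `complete` are illustrative.

```python
import json
import urllib.request

MODEL_ID = "lorcannrauzduel/gpt2-citations"


def build_completion_request(prompt: str, base_url: str = "http://localhost:8000"):
    """Build the POST request equivalent to the curl command above."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": 50,
        "temperature": 0.8,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text (server must be running)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as response:
        body = json.load(response)
    # OpenAI-style completions responses put the text under choices[0].text.
    return body["choices"][0]["text"]
```

Once the server is up, complete("<|startoftext|> The secret to") returns the continuation as a string.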

With Ollama (local deployment after GGUF conversion)

  1. Download the GGUF version from the repository (if available) or convert it yourself using llama.cpp.
  2. Create a Modelfile:
    FROM ./gpt2-citations-q4km.gguf
    SYSTEM "You are a quote generator."
    PARAMETER temperature 0.8
    PARAMETER stop "<|endoftext|>"
    
  3. Import and run:
    ollama create gpt2-citations -f Modelfile
    ollama run gpt2-citations "<|startoftext|> Life is"
    

Model Comparison (Base vs Fine‑tuned)

The following prompts were used to compare GPT‑2 base (no fine‑tuning) with the fine‑tuned model:

  • <|startoftext|> The secret to
  • <|startoftext|> Life is
  • <|startoftext|> You can never

The fine‑tuned model consistently produces coherent quotes with an author attribution, while the base model generates irrelevant or repetitive text.

Environmental Impact

Training was performed on a cloud GPU (Tesla T4) for about 5 minutes. Estimated CO₂ emissions are negligible (< 0.01 kg CO₂eq).

Acknowledgements

  • The Hugging Face team for transformers and datasets.
  • The original GPT‑2 paper by Radford et al. (2019).
  • Dataset provided by Abirate.

License

This model is released under the MIT license (same as the original GPT‑2 small).


Model card created by lorcannrauzduel for research and experimentation purposes.
