GPT-2 (124M) From Scratch to RLHF Alignment
This repository contains the model weights for a custom GPT-2 (124M) model trained entirely from scratch, then aligned using Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO).
Model Details
- Base Architecture: GPT-2 (124 Million Parameters)
- Training Framework: Custom PyTorch implementation (inspired by Andrej Karpathy's Let's Build GPT tutorial).
- Language: English (en)
- Task: Summarization (summarization)
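The 124M figure can be reproduced with a short parameter count. The sketch below assumes the standard GPT-2 small hyperparameters (12 layers, 12 heads, 768-dimensional embeddings, a 50,257-token vocabulary, a 1,024-token context) and an output head tied to the token embedding. The `GPTConfig` and `count_parameters` names are illustrative and are not taken from this repository; a custom implementation may differ, for example by padding the vocabulary.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Standard GPT-2 small hyperparameters (assumed, not confirmed by this repository).
    block_size: int = 1024
    vocab_size: int = 50257
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768


def count_parameters(cfg: GPTConfig) -> int:
    """Count weights and biases of a GPT-2 style decoder with a tied output head."""
    d = cfg.n_embd
    # Token embedding table plus learned position embedding table.
    embeddings = cfg.vocab_size * d + cfg.block_size * d
    # Each LayerNorm has a scale and a bias vector.
    layer_norm = 2 * d
    # Fused Q/K/V projection, then the attention output projection.
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # Two-layer MLP with a 4x hidden expansion.
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    per_block = 2 * layer_norm + attention + mlp
    # Final LayerNorm; the output head is assumed tied, so it adds nothing.
    return embeddings + cfg.n_layer * per_block + layer_norm


total = count_parameters(GPTConfig())
print(f"{total:,} parameters")  # 124,439,808 with the assumed settings
```

Note that the head count does not change the total, because the heads split the same 768-dimensional projection.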
Training Pipeline
The model was developed through a three-stage alignment pipeline (pretraining, SFT, then PPO-based RLHF) and trained locally on a dual T4 GPU environment.
1. Pretraining
The raw base model was trained from a random initialization on the FineWeb-Edu dataset.
- Tokens Trained: ~10 Billion (1 full epoch)
- Final Validation Loss: 3.0048
- HellaSwag Accuracy: 28.95% (Note: longer training is expected to reach ~33%).
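To make the validation loss easier to interpret, it can be converted to perplexity. This small sketch assumes the reported loss is the mean per-token cross-entropy in nats, which is what PyTorch's cross-entropy returns by default. The `perplexity` helper is illustrative.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)


pretrain_val_loss = 3.0048  # final pretraining validation loss reported above
print(f"perplexity: {perplexity(pretrain_val_loss):.1f}")  # about 20.2
```

Under that assumption, the base model is roughly as uncertain as a uniform choice among about 20 tokens at each position of the validation set. Losses measured on different datasets are not directly comparable.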
2. Supervised Fine-Tuning (SFT)
To teach the model how to summarize, it was fine-tuned on the OpenAI Summarize TL;DR dataset.
- Final Validation Loss: 2.5321 (at step 3400)
- Checkpoint Included: best_sft.pt
3. Reinforcement Learning (RLHF / PPO)
A reward model was trained on human preference data from the OpenAI Summarize Comparisons dataset. The SFT model was then fine-tuned with PPO to maximize the reward signal while penalizing KL divergence from the reference model, which keeps the generated language from degrading.
- Checkpoint Included: ppo_latest.pt
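The two objectives in this stage can be sketched in plain Python. This follows a common RLHF formulation, not necessarily the exact code used for these checkpoints. The reward model is trained with a pairwise loss that prefers the human-chosen summary. During PPO, each generated token receives a penalty proportional to the log-probability gap between the policy and the reference model, and the reward model score is added on the final token. The function names and the `beta` value are hypothetical.

```python
import math
from typing import List, Sequence


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Reward-model loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def kl_shaped_rewards(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    reward_score: float,
    beta: float = 0.1,  # hypothetical KL coefficient
) -> List[float]:
    """Per-token rewards for PPO: KL penalty everywhere, score on the last token."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Per-token KL estimate for the sampled tokens: log pi(token) - log pi_ref(token).
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]
    if rewards:
        rewards[-1] += reward_score
    return rewards


# A summary the policy likes more than the reference does gets penalized per token.
print(kl_shaped_rewards([-0.5, -0.2], [-1.0, -0.9], reward_score=2.0))
```

A larger `beta` keeps the policy closer to the reference model at the cost of a smaller reward gain. A smaller one allows more drift and more risk of reward hacking.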
Eval Results
The model has been evaluated qualitatively against its SFT baseline. In these comparisons, PPO alignment kept the model from hallucinating or copying text verbatim, resulting in more abstractive and concise summaries.
Qualitative Example
Input Post: I have a roommate who keeps eating my food without asking. Every single time I buy groceries, half of it disappears within 48 hours. I tried talking to him politely but he just laughs it off and says he will replace it, but he never does. Should I get a mini-fridge for my room or confront him one last time more aggressively?
PPO Summary: I have a roommate who keeps eating my groceries without asking, and I don't want him to do it again. Should I confront him?
Usage & Inference
Because this model was built from scratch without relying on the high-level abstractions of the transformers library, it requires a custom inference loop. The weights are provided in raw PyTorch .pt format.
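The shape of such a loop can be sketched independently of any framework. In the example below, `next_token_logits` is a hypothetical callable standing in for the model's forward pass: it takes the token ids so far and returns one logit per vocabulary entry. Loading the checkpoint, tokenizing the post, and the prompt format are not shown, since they depend on the repository's own model class. The sketch uses greedy decoding for simplicity, while a real summarizer might sample instead.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    eos_id: int,
    block_size: int = 1024,  # assumed context length
) -> List[int]:
    """Append the highest-scoring token until EOS or the token budget is hit."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Crop to the model's context window before each forward pass.
        context = ids[-block_size:]
        logits = next_token_logits(context)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(next_id)
        if next_id == eos_id:
            break
    # Return only the newly generated tokens.
    return ids[len(prompt_ids):]


def toy_model(context: Sequence[int]) -> List[float]:
    """Stand-in for a real model: always prefers (last token + 1) mod 5."""
    vocab_size = 5
    target = (context[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_generate(toy_model, [0], max_new_tokens=10, eos_id=4))  # [1, 2, 3, 4]
```

With the GPT-2 BPE vocabulary, the end-of-text token id is 50256. Whether these checkpoints were trained to emit it at the end of a summary is an assumption to verify against the training code.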
You can interact with the live inference API here: GPT-2 Summarizer App