Custom GPT-2 (355M) Pre-Trained from Scratch & Instruction-Tuned via SFT

This repository hosts a 355-million-parameter, GPT-2-style causal language model implemented from scratch in PyTorch. The base model was pre-trained locally on OpenWebText, an open-source recreation of OpenAI's WebText dataset. The .pth file in this repository contains the supervised fine-tuned (SFT) weights, trained to follow instructions and handle conversational tasks.

The project was built following the architectural principles outlined in Sebastian Raschka's "Build a Large Language Model (From Scratch)".

Model Details

Developed by: Carlos Garcia
Model Type: Causal Language Model (Transformer Architecture) with Instruction Fine-Tuning (SFT)
Language: English (en)
Parameters: 355M (same scale as GPT-2 Medium)
Context Length: 1024 tokens
Date of Alignment: June 18, 2026

Architectural Dimensions

Component	Specification
Layers	24
Attention Heads	16
Embedding Dimension	1024
Vocabulary Size	50,257 (`tiktoken` GPT-2 BPE)
Query-Key-Value Bias	Disabled (`False`)
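
As a sanity check on the 355M figure, the dimensions in the table can be plugged into a parameter count. The sketch below is an assumption-laden estimate, not the repository's code: the dictionary name `ASSUMED_CONFIG` and its key names are hypothetical, and the block layout (bias-free query/key/value projections, an output projection with bias, a 4x feed-forward expansion, and LayerNorm with scale and shift) is assumed from the GPT-2 design described in Raschka's book.

```python
# Values come from the specification table; the key names are an assumed convention.
ASSUMED_CONFIG = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 1024,
    "n_heads": 16,
    "n_layers": 24,
    "qkv_bias": False,
}


def count_parameters(cfg, tie_output_head=True):
    """Estimate the parameter count of a GPT-2-style model (assumed layout)."""
    d = cfg["emb_dim"]

    # Attention: three d x d projections (optionally with bias) plus an output projection with bias.
    qkv = 3 * d * d + (3 * d if cfg["qkv_bias"] else 0)
    out_proj = d * d + d

    # Feed-forward: d -> 4d -> d, both linear layers with bias.
    ffn = (d * 4 * d + 4 * d) + (4 * d * d + d)

    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * (2 * d)

    block = qkv + out_proj + ffn + norms
    embeddings = cfg["vocab_size"] * d + cfg["context_length"] * d
    final_norm = 2 * d

    total = cfg["n_layers"] * block + embeddings + final_norm
    if not tie_output_head:
        # A separate output head adds another emb_dim x vocab_size matrix.
        total += d * cfg["vocab_size"]
    return total


if __name__ == "__main__":
    print(f"tied or excluded output head: {count_parameters(ASSUMED_CONFIG):,}")
    print(f"separate output head:         {count_parameters(ASSUMED_CONFIG, tie_output_head=False):,}")
```

Under these assumptions the count is about 354.7M when the output head shares weights with the token embedding (or is left out of the count), which is consistent with the 355M figure. A separate output head would add roughly 51.5M more. Each attention head works on 1024 / 16 = 64 dimensions.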

Intended Use

Primary Use: Educational experimentation, conversational AI research, and local instruction-following workflows.
Generation Style: Fine-tuned to produce helpful responses to clear instruction prompts. For reliable results, inputs must be structured with explicit instruction and response delimiters.
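
The exact delimiter template for this checkpoint is not stated here. Since the model was fine-tuned on Alpaca, a reasonable first guess is the standard Alpaca-style prompt, sketched below. The function name `format_prompt` is hypothetical, and the wording of the template is an assumption to verify against the fine-tuning script.

```python
def format_prompt(instruction, input_text=""):
    """Build an Alpaca-style prompt (assumed template, not confirmed for this checkpoint)."""
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{instruction}"
    )
    # The optional input section is only added when extra context is supplied.
    if input_text:
        prompt += f"\n\n### Input:\n{input_text}"
    # The prompt ends with the response delimiter so the model continues from there.
    return prompt + "\n\n### Response:\n"


if __name__ == "__main__":
    print(format_prompt("Rewrite the sentence in the passive voice.", "The chef cooked the meal."))
```

If generations look incoherent, a mismatch between this template and the one used during fine-tuning is the first thing to check.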

Training Data & Methodology

The model's development cycle consisted of two major phases:

1. Base Pre-Training

The base model was pre-trained from scratch on OpenWebText, an open-source replica of the WebText corpus that OpenAI assembled from the text of outbound links shared on Reddit.

2. Instruction Tuning (SFT)

The model then underwent supervised fine-tuning on the Alpaca dataset for 3 epochs.

Training Hyperparameters

Fine-tuning was run locally on a deep-learning workstation with a single NVIDIA GeForce RTX 5090.

Optimizer: AdamW
Weight Decay: 0.1
Learning Rate: 0.00005 ($5 \times 10^{-5}$)
Batch Size: 8
Epochs: 3
Hardware Setup: Single-node local training (RTX 5090)
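
To get a feel for what these settings imply, the hyperparameters can be collected in one place and used to estimate the number of optimizer steps. This is only a sketch: the class name `SFTHyperparameters` is hypothetical, and the example size of 52,002 is the size of the original Stanford Alpaca release, used on the assumption that the full set was trained on with no held-out split and no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTHyperparameters:
    """Fine-tuning settings as listed in this model card."""
    optimizer: str = "AdamW"
    weight_decay: float = 0.1
    learning_rate: float = 5e-5
    batch_size: int = 8
    epochs: int = 3

    def total_steps(self, num_examples, drop_last=False):
        """Number of optimizer steps, assuming one step per batch."""
        if drop_last:
            steps_per_epoch = num_examples // self.batch_size
        else:
            steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.epochs


if __name__ == "__main__":
    hparams = SFTHyperparameters()
    # Assumption: all 52,002 examples of the original Alpaca release are used for training.
    print(hparams.total_steps(52_002))
```

In PyTorch these values would typically be passed as `torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.1)`.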

How to Load and Run Inference Locally

Because this model is defined in custom PyTorch source code rather than through Hugging Face transformers classes, you must load the saved .pth state dictionary directly into a model instance created by your own script, using a configuration that matches the architecture settings above:

import torch
from p02_gpt_model import GPTModel, GPT_CONFIG_355M

# 1. Instantiate the custom model from its configuration
model = GPTModel(GPT_CONFIG_355M)

# 2. Load the saved state dictionary on the CPU and copy the weights into the model
MODEL_PATH = "../models/gsp-2/gsp2_355m_sft.pth"
model_state_dict = torch.load(MODEL_PATH, map_location="cpu", weights_only=True)
model.load_state_dict(model_state_dict)

# 3. Switch to evaluation mode (disables dropout) before running inference
model.eval()
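
Once the weights are loaded, text can be generated with a simple decoding loop. The following is a minimal greedy-decoding sketch, not the repository's own generation code: the function name `greedy_generate` is hypothetical, and it assumes the model's forward pass takes a tensor of token IDs with shape (1, seq_len) and returns logits with shape (1, seq_len, vocab_size). Token ID 50256 is `<|endoftext|>` in the GPT-2 vocabulary; treating it as the stop token is also an assumption.

```python
import torch


@torch.no_grad()
def greedy_generate(model, token_ids, max_new_tokens, context_length=1024, eos_id=50256):
    """Append up to max_new_tokens greedily chosen tokens to token_ids (batch size 1)."""
    for _ in range(max_new_tokens):
        # Keep only the most recent tokens that fit in the context window.
        window = token_ids[:, -context_length:]

        # Assumed output shape: (batch, seq_len, vocab_size).
        logits = model(window)

        # Pick the highest-scoring token at the last position.
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)

        # Stop early when the end-of-text token is produced.
        if next_id.item() == eos_id:
            break

        token_ids = torch.cat([token_ids, next_id], dim=1)
    return token_ids
```

To use it, encode the formatted prompt with `tiktoken.get_encoding("gpt2")`, wrap the resulting IDs in a tensor of shape (1, seq_len), call `greedy_generate(model, ids, max_new_tokens=...)`, and decode the returned IDs with the same tokenizer. Greedy decoding is deterministic; sampling with temperature or top-k is a common alternative when more varied output is wanted.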

