Haney Chat 537M

A GPT-style language model built from scratch in PyTorch and trained on a custom 517M-token corpus.

Overview

Haney Chat is a decoder-only Transformer language model inspired by GPT architectures. The project started as an educational experiment and scaled from small models to a 537M parameter model trained on conversational and story datasets.

Model Specifications

Component	Value
Parameters	537.6M
Layers	24
Attention Heads	20
Embedding Size	1280
Context Length	1024
Architecture	Decoder-only Transformer
Framework	PyTorch
Precision	Mixed Precision (AMP)

Dataset

The training corpus contains approximately 517 million tokens and combines:

TinyStories
Dolly-15k
UltraChat

The datasets were merged into a unified training format using custom preprocessing scripts.

Training Hardware

NVIDIA L40S (46 GB VRAM)
Lightning AI Studio

Training Progression

Model	Parameters
TinyGPT	29M
TinyGPT	76M
TinyGPT	162M
Haney Chat	354M
Haney Chat	537M

Example Generation

Prompt:

<|story|> Once upon a time

Output:

Once upon a time, there was a little girl named Lily. She loved to play outside and run around in the garden...

Repository Structure

data/
models/
scripts/
tokenizer/
checkpoints/

Features

Custom Transformer implementation
GPT-style causal self-attention
Multi-head attention
Pre-LayerNorm architecture
Mixed precision training
Checkpoint resume support
GPT-2 tokenizer integration

Future Work

Improved instruction tuning
Larger conversational datasets
GGUF conversion
Ollama integration
Local deployment
Model evaluation benchmarks

License

MIT License

Author

Sakthivel T

Downloads last month: 6

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support