YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Haney Chat 537M

A GPT-style language model built from scratch in PyTorch and trained on a custom 517M-token corpus.

Overview

Haney Chat is a decoder-only Transformer language model inspired by GPT architectures. The project started as an educational experiment and scaled from small models to a 537M parameter model trained on conversational and story datasets.

Model Specifications

Component Value
Parameters 537.6M
Layers 24
Attention Heads 20
Embedding Size 1280
Context Length 1024
Architecture Decoder-only Transformer
Framework PyTorch
Precision Mixed Precision (AMP)

Dataset

The training corpus contains approximately 517 million tokens and combines:

  • TinyStories
  • Dolly-15k
  • UltraChat

The datasets were merged into a unified training format using custom preprocessing scripts.

Training Hardware

  • NVIDIA L40S (46 GB VRAM)
  • Lightning AI Studio

Training Progression

Model Parameters
TinyGPT 29M
TinyGPT 76M
TinyGPT 162M
Haney Chat 354M
Haney Chat 537M

Example Generation

Prompt:

<|story|> Once upon a time

Output:

Once upon a time, there was a little girl named Lily. She loved to play outside and run around in the garden...

Repository Structure

data/
models/
scripts/
tokenizer/
checkpoints/

Features

  • Custom Transformer implementation
  • GPT-style causal self-attention
  • Multi-head attention
  • Pre-LayerNorm architecture
  • Mixed precision training
  • Checkpoint resume support
  • GPT-2 tokenizer integration

Future Work

  • Improved instruction tuning
  • Larger conversational datasets
  • GGUF conversion
  • Ollama integration
  • Local deployment
  • Model evaluation benchmarks

License

MIT License

Author

Sakthivel T

Downloads last month
6
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support