YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
Haney Chat 537M
A GPT-style language model built from scratch in PyTorch and trained on a custom 517M-token corpus.
Overview
Haney Chat is a decoder-only Transformer language model inspired by GPT architectures. The project started as an educational experiment and scaled from small models to a 537M parameter model trained on conversational and story datasets.
Model Specifications
| Component | Value |
|---|---|
| Parameters | 537.6M |
| Layers | 24 |
| Attention Heads | 20 |
| Embedding Size | 1280 |
| Context Length | 1024 |
| Architecture | Decoder-only Transformer |
| Framework | PyTorch |
| Precision | Mixed Precision (AMP) |
Dataset
The training corpus contains approximately 517 million tokens and combines:
- TinyStories
- Dolly-15k
- UltraChat
The datasets were merged into a unified training format using custom preprocessing scripts.
Training Hardware
- NVIDIA L40S (46 GB VRAM)
- Lightning AI Studio
Training Progression
| Model | Parameters |
|---|---|
| TinyGPT | 29M |
| TinyGPT | 76M |
| TinyGPT | 162M |
| Haney Chat | 354M |
| Haney Chat | 537M |
Example Generation
Prompt:
<|story|> Once upon a time
Output:
Once upon a time, there was a little girl named Lily. She loved to play outside and run around in the garden...
Repository Structure
data/
models/
scripts/
tokenizer/
checkpoints/
Features
- Custom Transformer implementation
- GPT-style causal self-attention
- Multi-head attention
- Pre-LayerNorm architecture
- Mixed precision training
- Checkpoint resume support
- GPT-2 tokenizer integration
Future Work
- Improved instruction tuning
- Larger conversational datasets
- GGUF conversion
- Ollama integration
- Local deployment
- Model evaluation benchmarks
License
MIT License
Author
Sakthivel T
- Downloads last month
- 6