PyCraft-1: A 55M Python Code LLM Trained From Scratch on Consumer Hardware

Model Description

PyCraft-1 is a 55.3M-parameter decoder-only transformer trained entirely from scratch on Python code, using a single NVIDIA RTX 3050 laptop GPU (4GB VRAM). It demonstrates that a domain-specific code LLM can be trained without cloud compute by following a quality-first data curriculum inspired by the Phi-1 "Textbooks Are All You Need" approach.

Architecture

Component | Choice
Architecture | Decoder-only transformer
Parameters | 55.3M
Attention | Grouped Query Attention (8Q / 2KV heads)
Positional encoding | RoPE
QK-Norm | RMSNorm on Q and K (OLMo 2 / Qwen 3, 2025)
FFN | SwiGLU
Normalisation | RMSNorm pre-norm
Training objective | Causal LM + Fill-in-the-Middle (FIM, 50%)
Context window | 1024 tokens
Vocabulary | 32,000 (custom BPE, Python-tuned)
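
To make two of the attention rows concrete, here is a minimal pure-Python sketch. It is not the PyCraft-1 implementation. It shows how 8 query heads can share 2 key/value heads, and what RMSNorm does to a single query or key vector. The function names, the contiguous head grouping, the epsilon value, and the omission of a learnable gain are all assumptions made for illustration.

```python
import math

N_Q_HEADS = 8    # from the architecture table: 8 query heads
N_KV_HEADS = 2   # from the architecture table: 2 key/value heads


def kv_head_for(q_head: int, n_q: int = N_Q_HEADS, n_kv: int = N_KV_HEADS) -> int:
    """Map a query head to the KV head it reads from.

    Assumes contiguous grouping (heads 0..3 share KV head 0, and so on),
    which is a common convention but is not confirmed by the model card.
    """
    if n_q % n_kv != 0:
        raise ValueError("n_q must be a multiple of n_kv")
    if not 0 <= q_head < n_q:
        raise IndexError("query head index out of range")
    group_size = n_q // n_kv
    return q_head // group_size


def rms_norm(vec, eps=1e-6):
    """RMSNorm without a learnable gain: divide by the root mean square."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [x * scale for x in vec]


# Which KV head each of the 8 query heads uses.
mapping = [kv_head_for(h) for h in range(N_Q_HEADS)]
print(mapping)

# QK-Norm applies this normalisation to each query and key vector.
q = rms_norm([3.0, 4.0])
print(q)
```

With this grouping, the key/value projections and cache hold 2 heads instead of 8, a 4x reduction relative to standard multi-head attention with the same head size.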

Training

Pretraining:

  • 309,221 curated Python examples from 6 open sources
  • Quality curriculum: scored on 5 heuristics, ordered best-first
  • 4,000 steps | 1.05B tokens | Loss 1.16 | PPL 3.2
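
The scoring-and-ordering idea behind the curriculum can be sketched as follows. This is a hypothetical scorer, not the project's actual one: the five heuristic names (docstrings, type hints, comments, naming conventions, length) come from this card, but the way each is measured, the equal weighting, and the line-count bounds are assumptions.

```python
import ast
import re

SNAKE_CASE = re.compile(r"^_*[a-z][a-z0-9_]*$")


def quality_score(source: str) -> float:
    """Score a Python snippet from 0 to 5, one point per heuristic (assumed weights)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0.0

    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    lines = source.splitlines()

    def fraction(predicate):
        # Share of functions satisfying the predicate; 0 if there are none.
        return sum(1 for f in funcs if predicate(f)) / len(funcs) if funcs else 0.0

    docstrings = fraction(lambda f: ast.get_docstring(f) is not None)
    type_hints = fraction(lambda f: f.returns is not None)
    comments = 1.0 if any(line.strip().startswith("#") for line in lines) else 0.0
    naming = fraction(lambda f: SNAKE_CASE.match(f.name) is not None)
    length = 1.0 if 3 <= len(lines) <= 200 else 0.0  # hypothetical bounds

    return docstrings + type_hints + comments + naming + length


good = (
    'def add_numbers(a: int, b: int) -> int:\n'
    '    """Return the sum of a and b."""\n'
    '    # Plain integer addition.\n'
    '    return a + b\n'
)
bad = "def F(a,b): return a+b"

# Best-first ordering: highest-scoring examples are seen first.
ordered = sorted([bad, good], key=quality_score, reverse=True)
print([quality_score(s) for s in ordered])
```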

Supervised Fine-Tuning:

  • Magicoder-OSS-Instruct-75K (Python subset, 40k examples)
  • 400 steps | Loss 1.15 | PPL 3.15

Hardware: NVIDIA RTX 3050 Laptop GPU 4GB, Ryzen 7 6000, 16GB RAM

Total training time: ~22 hours
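
As a rough sanity check on the pretraining figures, the arithmetic below derives the tokens per optimizer step and an approximate throughput. It assumes every sequence fills the 1024-token context, and it divides by the full ~22 hours, which also covers fine-tuning. The throughput is therefore only a rough lower bound for the pretraining phase.

```python
STEPS = 4_000          # pretraining steps, from this card
TOKENS = 1.05e9        # pretraining tokens, from this card
CONTEXT = 1024         # context window in tokens, from this card
HOURS = 22             # total training time, from this card (approximate)

tokens_per_step = TOKENS / STEPS
sequences_per_step = tokens_per_step / CONTEXT   # assumes full-length sequences
tokens_per_second = TOKENS / (HOURS * 3600)      # rough lower bound for pretraining

print(f"tokens per step:    {tokens_per_step:,.0f}")
print(f"sequences per step: {sequences_per_step:.1f}")
print(f"tokens per second:  {tokens_per_second:,.0f}")
```

The result of roughly 256 sequences per step is consistent with a small per-device batch combined with gradient accumulation, although the card does not state the actual batch configuration.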

Novel Contributions

  1. Quality-first data curriculum: 309k Python examples scored and ordered by educational value (docstrings, type hints, comments, naming conventions, length). Validates the Phi-1 hypothesis in a resource-constrained regime.

  2. QK-Norm: RMSNorm applied to Q and K before RoPE, adopted from OLMo 2 and Qwen 3 (2025), improving training stability in small models.

  3. FIM pretraining on 4GB VRAM: Fill-in-the-Middle objective in PSM (prefix-suffix-middle) format, trained on a consumer GPU using gradient checkpointing and BF16 mixed precision.

  4. Full reproducibility: complete open-source pipeline runnable on consumer hardware in under one week.
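
The PSM Fill-in-the-Middle transformation mentioned above can be illustrated with a short sketch. The sentinel strings below follow a naming style used by other code models and are hypothetical here, since the card does not list PyCraft-1's actual special tokens. Splitting at character positions is also a simplification; a real pipeline may split at token or line boundaries.

```python
import random

# Hypothetical sentinel strings; PyCraft-1's real special tokens are not given in the card.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"
FIM_RATE = 0.5  # from the card: FIM applied to 50% of examples


def to_psm(doc: str, rng: random.Random, fim_rate: float = FIM_RATE) -> str:
    """With probability fim_rate, rewrite doc as prefix-suffix-middle; else return it unchanged."""
    if len(doc) < 2 or rng.random() >= fim_rate:
        return doc  # ordinary left-to-right causal LM example
    # Pick two cut points and split the document into three spans.
    lo, hi = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:lo], doc[lo:hi], doc[hi:]
    # PSM order: the model sees prefix and suffix, then learns to produce the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


doc = "def square(x):\n    return x * x\n"
print(to_psm(doc, random.Random(0), fim_rate=1.0))
```

Because the middle span is moved to the end, the ordinary next-token loss is enough to teach infilling; no change to the model architecture is required.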

Evaluation

Metric | Value
Pretraining loss | 1.16
Pretraining PPL | 3.20
SFT loss | 1.15
SFT PPL | 3.15
Held-out PPL (binary search) | 1.4
Held-out PPL (Stack class) | 1.7
Held-out PPL (average, 5 samples) | 2.16
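
The loss and perplexity figures are related by PPL = exp(loss), assuming the loss is the mean per-token cross-entropy in nats (the usual convention, though the card does not state it). The short check below recomputes perplexity from the two reported losses.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity from a mean per-token cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


for name, loss in [("pretraining", 1.16), ("SFT", 1.15)]:
    print(f"{name}: loss {loss:.2f} -> PPL {perplexity(loss):.2f}")
```

The recomputed values agree with the reported ones to within about 0.01, which is what rounding the loss to two decimals would produce.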

Example Usage

# Clone the repo and install dependencies first
# pip install torch safetensors tokenizers datasets

import torch
from safetensors.torch import load_file
from model.config import get_config_120m
from model.pycraft_model import PyCraftModel
from tokenizer.tokenizer_utils import PyCraftTokenizer

device    = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = PyCraftTokenizer("tokenizer/vocab/tokenizer.json")

cfg            = get_config_120m()
cfg.vocab_size = 32000
cfg.dropout    = 0.0

model = PyCraftModel(cfg).to(device)
model.load_state_dict(load_file("model.safetensors", device=device))
model.eval()

prompt = "def is_palindrome(s: str) -> bool:\n    "
ids    = tokenizer.encode(prompt)
inp    = torch.tensor(ids, dtype=torch.long).unsqueeze(0).to(device)

with torch.no_grad():
    out = model.generate(inp, max_new_tokens=80, temperature=0.7, top_k=40)

print(tokenizer.decode(out[0].tolist()))
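
For readers unfamiliar with the `temperature` and `top_k` arguments, the sketch below shows the common meaning of these settings for a single decoding step. It is a generic illustration in pure Python and makes no claim about how `PyCraftModel.generate` is implemented internally; the function name `sample_top_k` is hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=40, rng=None):
    """Sample one token index from the top_k highest logits after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    k = min(top_k, len(logits))
    # Keep only the k most likely token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalised softmax, numerically stable
    return rng.choices(top, weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0, 1.5]
print(sample_top_k(logits, temperature=0.7, top_k=2, rng=random.Random(0)))
```

Lower temperatures and smaller `top_k` values make the output more deterministic, which tends to suit code completion better than open-ended sampling.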

Limitations

  • 55M parameters: generates plausible Python but may produce logical errors
  • Context window: 1024 tokens
  • Best on standard algorithms, data structures, and common library patterns
  • Not a conversational assistant
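
Because of the 1024-token context window, a long prompt plus the requested completion may not fit. One possible way to handle this, sketched below, is to keep only the most recent prompt tokens. The helper `fit_prompt` is hypothetical, and whether `generate` already truncates its input is not stated in the card.

```python
CONTEXT_WINDOW = 1024  # from the card


def fit_prompt(ids, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Left-truncate token ids so prompt plus generated tokens fit in the context window."""
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Keep the most recent tokens, which matter most for code completion.
    return ids[-budget:]


long_prompt = list(range(2000))
trimmed = fit_prompt(long_prompt, max_new_tokens=80)
print(len(trimmed))
```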

Citation

@misc{inamdar2026pycraft,
  title={PyCraft-1: Training a Python Code LLM From Scratch on Consumer Hardware},
  author={Inamdar, Rohan},
  year={2026},
  institution={University of Manchester, MSc Artificial Intelligence},
}
