code-1b-pretrain-v3

A 1.13B-parameter causal language model with a GPT-2 architecture, pretrained from scratch on a curated mix of Python code and programming literature.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and inference weights from the repo root.
tokenizer = AutoTokenizer.from_pretrained("rovdetection/code-1b-pretrain-v3")
model = AutoModelForCausalLM.from_pretrained("rovdetection/code-1b-pretrain-v3")

# Sample up to 100 new tokens as a continuation of the prompt.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100, temperature=0.8, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Training summary

Architecture | GPT-2 (1.13B params)
Total steps | 30,000
Peak LR | 3e-5 (cosine with warmup)
Effective batch size | 32 (gradient accumulation)
Precision | fp16 + 8-bit Adam (bitsandbytes)
Eval perplexity (held-out Python) | 3.65
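
For readers who want to relate the reported perplexity to a loss value, here is a minimal sketch. It assumes the figure is the usual exponential of the mean per-token cross-entropy in nats, which the card does not state explicitly. The `perplexity` helper and the example losses are hypothetical and only illustrate the arithmetic.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token losses, for illustration only (not from the eval set).
example_nlls = [1.1, 1.5, 1.3, 1.28]
print(round(perplexity(example_nlls), 2))

# Inverting the reported figure: under the assumption above, a perplexity of
# 3.65 corresponds to a mean cross-entropy of ln(3.65) nats per token.
reported_loss = math.log(3.65)
print(round(reported_loss, 3))  # 1.295
```

Under that assumption, an eval loss near 1.29 on held-out Python would reproduce the number in the summary.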

Dataset mix (Phase 4, final 7k steps)

Dataset | Weight
bigcode/starcoderdata (Python) | 35%
codeparrot/codeparrot-clean | 25%
open-phi/programming_books_llama | 25%
greengerong/leetcode | 15%
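
One way to read the weights above is as per-example sampling probabilities. The sketch below illustrates that interpretation with the standard library only; it is not the training pipeline used for this model, and `PHASE4_MIX` and `sample_sources` are hypothetical names introduced for the example.

```python
import random
from collections import Counter

# Phase 4 weights as listed in the dataset mix table.
PHASE4_MIX = {
    "bigcode/starcoderdata (Python)": 0.35,
    "codeparrot/codeparrot-clean": 0.25,
    "open-phi/programming_books_llama": 0.25,
    "greengerong/leetcode": 0.15,
}

def sample_sources(mix, n, seed=0):
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

# Empirical shares over many draws should approach the configured weights.
counts = Counter(sample_sources(PHASE4_MIX, 10_000))
for name, count in counts.most_common():
    print(f"{name}: {count / 10_000:.1%}")
```

In a real pipeline the same weights could be passed to a dataset-interleaving utility instead of drawing names by hand.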

Training phases

Phase | Steps | LR range | Notes
1 | 0 – 10,000 | 0 → 3e-5 | Warmup + early descent
2–3 | 10,000 – 23,000 | 3e-5 → 4e-6 | Cosine decay, baseline mix
4 | 23,000 – 30,000 | 4e-6 → ~0 | Quality shift: StarCoder share raised to 35%
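
The learning-rate ranges above can be approximated with a standard linear-warmup plus cosine-decay schedule. This is a sketch under stated assumptions, not the training code: the peak LR and total steps come from the training summary, while `WARMUP_STEPS = 500` is a guess, since the card does not give the warmup length. `lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 3e-5        # from the training summary
TOTAL_STEPS = 30_000  # from the training summary
WARMUP_STEPS = 500    # assumption: the card does not state the warmup length

def lr_at(step, peak=PEAK_LR, total=TOTAL_STEPS, warmup=WARMUP_STEPS):
    """Linear warmup to the peak, then cosine decay to zero at the final step."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))

# Print the schedule at the phase boundaries listed in the table.
for step in (0, 500, 10_000, 23_000, 30_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

With the assumed 500-step warmup, the value at step 23,000 comes out close to the 4e-6 listed at the start of Phase 4; a different warmup length would shift the intermediate values.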

Repo structure

The repo root contains only the files needed for inference (model.safetensors, tokenizer, config.json). The last-checkpoint/ subfolder holds the full training state (optimizer, scheduler, scaler, RNG) needed to resume training.
