Helios Nova 306M-Instruct-2606

Helios Nova 306M-Instruct-2606 is a 306M-parameter, dense, decoder-only language model for instruction following and conversation. It is the reinforcement-learning-aligned release in the Helios Nova family: a from-scratch base model, instruction-tuned with supervised fine-tuning, then improved with Group Relative Policy Optimization (GRPO) using verifiable, rule-based rewards.

The model was developed independently and end-to-end by a single author (architecture, tokenizer, pre-training, post-training, and evaluation). It was designed to study capability per unit of compute at small scale, on the premise that sub-billion-parameter quality comes from architecture and data quality rather than from data volume alone.

With roughly 80× less pre-training data (50B versus about 4T tokens), Helios Nova reaches 96% of SmolLM2-360M's commonsense-reasoning score (the mean of Winogrande and PIQA), with both models measured on an identical evaluation harness. The base model was pre-trained on 50B tokens on a single GPU for under USD 190 of compute.

The model is distributed both as GGUF quantizations (for llama.cpp: CUDA, Apple Metal, Vulkan, or CPU) and as full-precision safetensors (for PyTorch). Reference chat clients are provided in the companion GitHub repository.

Highlights

  • 306M dense decoder, custom architecture and 16k tokenizer, built from scratch.
  • GRPO-aligned: constraint-following pass-rate improved by 18.3 percentage points over the SFT baseline (39.1% to 57.4%) with no measurable capability regression.
  • Data-efficient: 96% of SmolLM2-360M commonsense reasoning at ~80× fewer pre-training tokens.
  • Low cost: base pre-training under USD 190 on a single H100; post-training on a single consumer iGPU.
  • Runs anywhere: pure-PyTorch path (any OS/CPU) and GGUF/llama.cpp path (CUDA / Metal / Vulkan / CPU).

Usage

The reference clients live in the GitHub repository and download these weights automatically on first run.

git clone https://github.com/rafaelespinosamena/Helios-Nova-306M-Instruct-2606.git
cd Helios-Nova-306M-Instruct-2606

PyTorch (any operating system, CPU or GPU, no system dependencies):

pip install -r requirements.txt
python chat.py

GGUF via llama.cpp (fastest; CUDA, Apple Metal, AMD/Intel Vulkan, or CPU):

# install llama.cpp once — macOS: `brew install llama.cpp`;
# otherwise download a release for your backend from github.com/ggml-org/llama.cpp/releases
python instruct_chat.py             # F16 (default, full quality)
python instruct_chat.py --model q8  # Q8_0, near-lossless, ~2x smaller
python instruct_chat.py --model q4  # Q4_K_M, smallest and fastest (CPU / edge)

Both clients apply the exact training chat template and stop sequences, so generation terminates cleanly at the end of each turn.

Files

| File | Size | Description |
| --- | --- | --- |
| Helios-Nova-306M-Instruct-2606-F16.gguf | 584 MB | Full precision (default) |
| Helios-Nova-306M-Instruct-2606-Q8_0.gguf | 311 MB | Near-lossless |
| Helios-Nova-306M-Instruct-2606-Q4_K_M.gguf | 179 MB | Smallest and fastest (CPU, edge) |
| model.safetensors (+ config.json, HeliosNova.py, tokenizer) | 645 MB | bf16 weights for PyTorch |
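
The relative sizes of the GGUF files can be checked with a little arithmetic. The sketch below uses only the sizes listed above; the bits-per-weight figure is an estimate that assumes the F16 file stores roughly 16 bits per weight and that metadata overhead is negligible, neither of which is stated in this card.

```python
# File sizes in MB, copied from the Files table.
SIZES_MB = {"F16": 584, "Q8_0": 311, "Q4_K_M": 179}


def compression_vs_f16(name: str) -> float:
    """How many times smaller a quantization is than the F16 file."""
    return SIZES_MB["F16"] / SIZES_MB[name]


def approx_bits_per_weight(name: str) -> float:
    """Effective average bits per weight.

    Assumption: the F16 file corresponds to about 16 bits per weight,
    so other files scale proportionally with their size.
    """
    return 16 * SIZES_MB[name] / SIZES_MB["F16"]


for name in ("Q8_0", "Q4_K_M"):
    print(
        f"{name}: {compression_vs_f16(name):.2f}x smaller than F16, "
        f"about {approx_bits_per_weight(name):.1f} bits per weight"
    )
```

Under these assumptions Q8_0 comes out about 1.88× smaller than F16 (roughly 8.5 bits per weight) and Q4_K_M about 3.26× smaller (roughly 4.9 bits per weight).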

Model architecture

| Component | Value |
| --- | --- |
| Parameters | 305.8M (dense) |
| Layers / hidden size | 24 / 1024 (depth-over-width, following the MobileLLM finding for sub-500M models) |
| Attention | Grouped-Query Attention: 16 query heads, 4 key-value heads, head dimension 64 |
| Feed-forward | SwiGLU, intermediate size 3072 |
| Positional encoding / norm | RoPE (theta 10,000), QK-Norm, RMSNorm (pre-norm), tied input/output embeddings |
| Tokenizer / context | Custom 16k BPE / 2048 tokens |
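
The parameter count can be roughly reproduced from the values in the architecture table. The following is a sketch rather than the model's actual code: it assumes a vocabulary of exactly 16,000 tokens (the card only says "16k"), no bias terms, one weight vector per RMSNorm, and QK-Norm weights of size `head_dim` per layer.

```python
# Values from the architecture table.
HIDDEN = 1024
LAYERS = 24
Q_HEADS, KV_HEADS, HEAD_DIM = 16, 4, 64
FFN = 3072

# Assumption: "16k" is taken to mean exactly 16,000 tokens.
VOCAB = 16_000


def count_parameters() -> int:
    """Estimate total parameters, assuming bias-free linear layers."""
    q_proj = HIDDEN * Q_HEADS * HEAD_DIM
    kv_proj = 2 * HIDDEN * KV_HEADS * HEAD_DIM  # K and V use the 4 shared heads
    o_proj = Q_HEADS * HEAD_DIM * HIDDEN
    attention = q_proj + kv_proj + o_proj

    swiglu = 3 * HIDDEN * FFN  # gate, up, and down projections

    # Assumption: two pre-norm RMSNorms plus Q and K norm weights per layer.
    norms = 2 * HIDDEN + 2 * HEAD_DIM

    per_layer = attention + swiglu + norms
    embeddings = VOCAB * HIDDEN  # counted once because input/output are tied
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


total = count_parameters()
print(f"{total:,} parameters (about {total / 1e6:.1f}M)")
```

Under these assumptions the estimate is 305,844,224 parameters, which agrees with the 305.8M figure in the table. Grouped-Query Attention is what keeps the key and value projections at a quarter of the query projection's size.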

[Figure: Architecture diagram]

Training

Pre-training (base model)

The base model, Helios-Nova-306M, was pre-trained on 50B tokens of FineWeb-Edu on a single NVIDIA H100 in under 120 hours, for under USD 190. It uses a Warmup-Stable-Decay (WSD) learning-rate schedule with fused AdamW, bf16, and torch.compile. The validation loss decreases throughout the stable phase and drops sharply during the final decay.
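
A Warmup-Stable-Decay schedule can be sketched as a piecewise function of the training step. The version below is a minimal illustration, not the training code used for this model: the peak learning rate, the step counts, and the linear shape of the warmup and decay phases are all assumptions chosen for the example.

```python
def wsd_lr(
    step: int,
    total_steps: int,
    warmup_steps: int,
    decay_steps: int,
    peak_lr: float = 3e-4,  # hypothetical value, not taken from the card
    min_lr: float = 0.0,    # hypothetical value, not taken from the card
) -> float:
    """Learning rate at `step` under a Warmup-Stable-Decay schedule."""
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: interpolate linearly from peak_lr down to min_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    for s in (0, 9, 500, 900, 950, 1000):
        print(s, wsd_lr(s, total_steps=1000, warmup_steps=10, decay_steps=100))
```

The long constant phase is consistent with the loss behaviour described above: steady improvement while the rate is held, then a sharper drop once the rate is annealed.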

[Figures: Pre-training validation loss; Warmup-Stable-Decay schedule]

Post-training (this model)

The post-training pipeline — supervised fine-tuning, Direct Preference Optimization (DPO), and GRPO — was implemented from scratch in pure PyTorch and run on a single AMD Strix Halo iGPU (ROCm, gfx1151), without TRL or bitsandbytes.

  • Supervised fine-tuning on smol-smoltalk with prompt masking. At 306M parameters, multi-epoch SFT induces catastrophic forgetting of base knowledge; training is stopped at approximately 0.5 epochs, at the point that balances instruction-following against retained general knowledge.

[Figure: Catastrophic forgetting trade-off]

  • Preference optimization. On-policy DPO preserved benchmark accuracy but did not improve held-out generation quality, because at this scale self-sampled candidates carry a weak preference signal. The objective was therefore changed to GRPO with verifiable, rule-based rewards (programmatically checkable instructions), which targets a capability the model can reliably improve. Constraint-following pass-rate rises smoothly during training while the KL divergence from the reference policy stays bounded.
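
The prompt masking mentioned in the SFT step can be illustrated in a few lines. This is a sketch in plain Python, not the pipeline's code: the function names are hypothetical, token IDs are made up, and the usual one-position shift between inputs and targets in a causal language model is omitted for brevity. The value -100 is used because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.functional.cross_entropy


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_mean_loss(token_losses, labels):
    """Average per-token losses over unmasked (response) positions only."""
    kept = [
        loss for loss, label in zip(token_losses, labels)
        if label != IGNORE_INDEX
    ]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    input_ids, labels = build_labels([5, 6, 7], [8, 9])
    print(labels)
    print(masked_mean_loss([1.0, 1.0, 1.0, 2.0, 4.0], labels))
```

With masking, only the response tokens contribute to the gradient, so the model is trained to answer prompts rather than to reproduce them.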

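The two ingredients of GRPO with verifiable rewards, a programmatic reward and a group-relative advantage, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the two example constraints (a word limit and a required keyword) are hypothetical, and normalizing by the population standard deviation is one common choice rather than a detail confirmed by this card.

```python
from statistics import mean, pstdev


def constraint_reward(response: str, max_words: int, required_keyword: str) -> float:
    """Fraction of rule-based checks that the response satisfies.

    Both checks are hypothetical examples of programmatically
    verifiable instructions.
    """
    checks = [
        len(response.split()) <= max_words,
        required_keyword.lower() in response.lower(),
    ]
    return sum(checks) / len(checks)


def group_relative_advantages(rewards, eps: float = 1e-6):
    """Normalize each reward against its own sampling group.

    Every response sampled for the same prompt is scored, then each
    reward is centred on the group mean and scaled by the group spread.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "Solar power is clean.",
        "Wind is clean and cheap.",
        "Solar panels convert sunlight into electricity all day long.",
    ]
    rewards = [constraint_reward(r, max_words=5, required_keyword="solar") for r in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the advantage is computed relative to the group, no learned value model or preference model is needed; when every sample in a group earns the same reward, all advantages are zero and that prompt contributes no gradient.
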
Evaluation

Base model: data efficiency

All models below were re-run through one identical lm-evaluation-harness configuration (0-shot), so the comparison is internally consistent; these figures therefore differ slightly from each model's published numbers.

[Figure: Capability versus pre-training token budget]

| Metric (0-shot) | Helios-306M (50B tok) | SmolLM2-360M (~4T) | Qwen2.5-0.5B (~18T) |
| --- | --- | --- | --- |
| Winogrande | 57.2 | 57.9 | 56.3 |
| PIQA | 68.1 | 72.6 | 70.6 |
| OpenBookQA | 34.4 | 37.6 | 35.4 |
| HellaSwag | 44.7 | 52.5 | 49.5 |
| ARC (avg) | 42.8 | 53.4 | 45.5 |
| MMLU | 24.3 | 25.3 | 47.6 |
| Commonsense reasoning (Winogrande + PIQA) | 62.65 | 65.25 | 63.45 |

Helios reaches 96.0% of SmolLM2-360M's commonsense-reasoning score (the mean of Winogrande and PIQA, 62.65 versus 65.25) with roughly 80× less pre-training data, and nearly ties it on Winogrande (57.2 versus 57.9, or 99%). On MMLU, Helios scores 96% of SmolLM2-360M's result (24.3 versus 25.3), but at this scale both models sit near the 25% random-chance floor, so this indicates parity at chance level rather than mastery. The model trails on tasks bounded by data volume: broad factual recall (TriviaQA) and exam-style knowledge, where Qwen2.5-0.5B's much larger curated corpus is decisive. Helios Nova is data-efficient, not knowledge-rich.
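
The headline ratios follow directly from the table. The short check below uses only the scores reported above; the token budgets are the approximate figures from the table header (50B and about 4T), so the 80× ratio is itself approximate.

```python
# Scores and approximate token budgets copied from the evaluation table.
MODELS = {
    "Helios-306M": {"winogrande": 57.2, "piqa": 68.1, "tokens": 50e9},
    "SmolLM2-360M": {"winogrande": 57.9, "piqa": 72.6, "tokens": 4e12},
}


def commonsense(model: str) -> float:
    """Commonsense-reasoning score: the mean of Winogrande and PIQA."""
    scores = MODELS[model]
    return (scores["winogrande"] + scores["piqa"]) / 2


def relative(metric_a: float, metric_b: float) -> float:
    """Score of model A as a fraction of model B's score."""
    return metric_a / metric_b


commonsense_ratio = relative(commonsense("Helios-306M"), commonsense("SmolLM2-360M"))
winogrande_ratio = relative(
    MODELS["Helios-306M"]["winogrande"], MODELS["SmolLM2-360M"]["winogrande"]
)
token_ratio = MODELS["SmolLM2-360M"]["tokens"] / MODELS["Helios-306M"]["tokens"]

print(f"Commonsense: {commonsense_ratio:.1%} of SmolLM2-360M")
print(f"Winogrande:  {winogrande_ratio:.1%} of SmolLM2-360M")
print(f"Token budget: {token_ratio:.0f}x smaller")
```

This gives about 96.0% on the commonsense average, about 98.8% on Winogrande, and an 80× difference in token budget.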

[Figure: Full benchmark sweep]

Post-training: SFT to GRPO

Each checkpoint was evaluated on the same seeded harness across three axes: capability retention, constraint-following pass-rate, and pairwise generation win-rate.

| Stage | Capabilities (avg MC) | Constraint-following | Win-rate vs SFT |
| --- | --- | --- | --- |
| SFT (baseline) | 0.371 | 39.1% | n/a |
| GRPO (this model) | 0.371 | 57.4% (+18.3 pp) | 52.7% (no regression) |

[Figures: SFT versus GRPO; GRPO constraint-following during training]

Intended use and limitations

Helios Nova 306M-Instruct-2606 is suitable for general conversation, instruction following, commonsense reasoning, format- and constraint-following, and on-device or CPU inference. It is a strong base for further fine-tuning, quantization, and compression research.

It is not suitable as a source of factual knowledge. A 306M-parameter model trained on 50B tokens of educational text has limited world knowledge: it scores low on broad factual recall (TriviaQA) and near chance on exam-style benchmarks (MMLU). Outputs may be inaccurate or outdated and should be verified before use; the model is not appropriate for high-stakes decisions. The model is English-only.

The Helios Nova family

| Model | Description |
| --- | --- |
| Helios-Nova-306M | From-scratch base model (50B tokens) |
| Helios-Nova-306M-Instruct | Original SFT instruction model (PyTorch) |
| Helios-Nova-306M-Instruct-GGUF | GGUF build of the SFT instruction model |
| Helios-Nova-306M-Instruct-2606 (this model) | GRPO-aligned instruction model; GGUF and safetensors |

Citation

@misc{espinosamena2026heliosnova2606,
  title  = {Helios Nova 306M-Instruct-2606: data-efficient pre-training and verifiable-reward GRPO on a single iGPU},
  author = {Espinosa Mena, Rafael},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/respinosamena/Helios-Nova-306M-Instruct-2606}}
}

Contact

Rafael Espinosa Mena — rafaelespinosamena@gmail.com

License

Released under the Apache-2.0 license. Copyright 2026 Rafael Espinosa Mena.
