How to use theblackdeer/hrm-1b-sft-qlora with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir hrm-1b-sft-qlora theblackdeer/hrm-1b-sft-qlora
QLoRA fine-tuned adapters for Aryagm/HRM-Text-1B-MLX-4bit, a 1B-parameter hierarchical reasoning model with a recurrent architecture (H=2, L=3, giving 8 passes per token).
Trained entirely on an 8GB M2 Mac Mini. Part of the Sid Local LLM Benchmark v3.
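
The figure of 8 passes per token matches the usual HRM loop structure, in which each high-level cycle runs the low-level module L times and then the high-level module once. The sketch below counts passes under that assumption; the function name is illustrative and not taken from the model's code.

```python
def passes_per_token(h_cycles: int, l_cycles: int) -> int:
    """Count module passes, assuming each high-level cycle runs
    `l_cycles` low-level passes followed by one high-level pass."""
    passes = 0
    for _ in range(h_cycles):
        passes += l_cycles  # low-level (L-module) passes
        passes += 1         # one high-level (H-module) pass
    return passes


# H=2, L=3 as stated in the model description
print(passes_per_token(2, 3))
```

Under this reading the count is H × (L + 1), which explains why the total is 8 rather than 2 × 3 = 6.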
| Metric | Base Model | Fine-Tuned (v6) | Delta |
|---|---|---|---|
| Overall Weighted Score | 58.3% | 61.7% | +3.4pp |
| AGENT (tool calling) | 10% | 60% | +50pp |
| CODE | 70% | 60% | -10pp |
| HALL (hallucination resistance) | 62% | 75% | +13pp |
| INST (instruction following) | 40% | 60% | +20pp |
| CTX (context reasoning) | 75% | 75% | 0 |
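
The Delta column is a difference between two percentages, so it is expressed in percentage points (pp) rather than relative percent. A minimal sketch that recomputes the column from the scores in the table; the helper name `delta_pp` is made up for illustration.

```python
# (base %, fine-tuned %) pairs copied from the results table
results = {
    "Overall Weighted Score": (58.3, 61.7),
    "AGENT": (10, 60),
    "CODE": (70, 60),
    "HALL": (62, 75),
    "INST": (40, 60),
    "CTX": (75, 75),
}


def delta_pp(base: float, tuned: float) -> float:
    """Absolute change in percentage points, rounded to one decimal."""
    return round(tuned - base, 1)


for name, (base, tuned) in results.items():
    print(f"{name}: {delta_pp(base, tuned):+g}pp")
```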

- adapters.npz: final v6 QLoRA weights (~22MB)
- best_adapters.npz: best-validation checkpoint (identical to final)

| Parameter | Value |
|---|---|
| Base model | Aryagm/HRM-Text-1B-MLX-4bit (4-bit MXFP4) |
| Method | QLoRA (rank=16, alpha=32) |
| Target layers | Attention projections only (gqkv_proj, o_proj) |
| Training samples | 2,000 |
| Iterations | 2,000 |
| Batch size | 1 (gradient accumulation) |
| Learning rate | 2e-5 |
| Optimizer | AdamW |
| Loss | Masked response loss (answer tokens only) |
| Hardware | Apple M2 Mac Mini, 8GB unified memory |
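
The training script itself is not shown here, so the following is only a sketch of one common way to compute a masked response loss: average the negative log-likelihood over answer tokens and ignore prompt tokens. The function and argument names are assumptions made for illustration, and plain Python lists stand in for MLX arrays.

```python
import math


def masked_response_loss(token_logprobs, response_mask):
    """Mean negative log-likelihood over response tokens only.

    token_logprobs[i]: log-probability the model gave to target token i.
    response_mask[i]:  1 for answer tokens, 0 for prompt tokens.
    """
    total = sum(-lp * m for lp, m in zip(token_logprobs, response_mask))
    count = sum(response_mask)
    return total / max(count, 1)  # avoid dividing by zero on an empty mask


# Two prompt tokens (masked out) followed by two answer tokens
logprobs = [math.log(0.5), math.log(0.1), math.log(0.25), math.log(0.5)]
mask = [0, 0, 1, 1]
loss = masked_response_loss(logprobs, mask)
print(round(loss, 4))
```

Only the last two positions contribute, so the poorly predicted prompt token (probability 0.1) has no effect on the loss.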

| Source | % | Count |
|---|---|---|
| glaiveai/glaive-function-calling-v2 (AGENT) | 20% | 400 |
| iamtarun/code_instructions_120k_alpaca (CODE) | 30% | 600 |
| yahma/alpaca-cleaned (INST) | 25% | 500 |
| openai/gsm8k (MATH) | 15% | 300 |
| HuggingFaceTB/cosmopedia-100k (REPLAY) | 10% | 200 |
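
The Count column follows directly from the percentages and the 2,000-sample budget. A short sketch that recomputes it and checks that the mix adds up; the variable names are illustrative.

```python
TOTAL_SAMPLES = 2000

# Source -> share of the training mix in percent, from the table
mix = {
    "glaiveai/glaive-function-calling-v2": 20,
    "iamtarun/code_instructions_120k_alpaca": 30,
    "yahma/alpaca-cleaned": 25,
    "openai/gsm8k": 15,
    "HuggingFaceTB/cosmopedia-100k": 10,
}

# Integer arithmetic keeps the counts exact
counts = {source: TOTAL_SAMPLES * pct // 100 for source, pct in mix.items()}

for source, n in counts.items():
    print(f"{source}: {n}")
```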

import mlx.core as mx
from mlx.nn import Module
from mlx_hrm_text.runner import HRMTextGenerator
from mlx_hrm_text.model import HrmTextForCausalLM, set_metal_swiglu
from pathlib import Path

set_metal_swiglu(True)

# Load base model
gen = HRMTextGenerator(
    model_dir="Aryagm/HRM-Text-1B-MLX-4bit",
    temperature=0.3,
)

# Freeze the base weights before adding LoRA
gen.model.freeze()


class LoRALinear(Module):
    """Wraps a frozen linear layer and adds a low-rank update."""

    def __init__(self, linear, r=16, alpha=32):
        super().__init__()
        self.linear = linear
        self.linear.freeze()
        self.r = r
        self.scale = alpha / r
        out_f, in_f = linear.weight.shape
        # lora_b starts at zero, so the wrapped layer initially matches the base layer
        self.lora_a = mx.random.normal((in_f, r)) / r
        self.lora_b = mx.zeros((r, out_f))

    def __call__(self, x):
        dtype = x.dtype
        update = x @ self.lora_a.astype(dtype) @ self.lora_b.astype(dtype)
        return self.linear(x) + update * self.scale


def apply_lora(module):
    # Patch attention projections only (gqkv_proj, o_proj)
    for block in module.layers:
        block.attn.gqkv_proj = LoRALinear(block.attn.gqkv_proj)
        block.attn.o_proj = LoRALinear(block.attn.o_proj)


apply_lora(gen.model.model.H_module)
apply_lora(gen.model.model.L_module)

# Load adapters. This snippet only reads the file into a flat dict;
# the full recursive population is in run_hrm_lora_bench.py on GitHub.
flat = mx.load("adapters.npz")

result = gen.generate("Write a Python function to reverse a string.")
print(result.text)
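
The snippet above reads adapters.npz into a flat dictionary but leaves the step of copying those arrays into the LoRA layers to run_hrm_lora_bench.py. As a rough illustration of what "recursive population" generally involves, the sketch below turns dotted keys into a nested structure that mirrors a module tree. The key names are hypothetical and the real layout of adapters.npz may differ; strings stand in for arrays so the example runs without MLX. MLX ships a similar helper, `mlx.utils.tree_unflatten`, for its own parameter trees.

```python
def unflatten(flat: dict) -> dict:
    """Convert {"a.b.c": value} style keys into nested dictionaries."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Descend into the child dict, creating it on first use
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical key names, for illustration only
example = {
    "H_module.layers.0.attn.o_proj.lora_a": "A0",
    "H_module.layers.0.attn.o_proj.lora_b": "B0",
    "L_module.layers.0.attn.gqkv_proj.lora_a": "A1",
}
nested = unflatten(example)
print(nested["H_module"]["layers"]["0"]["attn"]["o_proj"])
```

Once the weights are in nested form, each leaf can be assigned to the matching `lora_a` or `lora_b` attribute of the wrapped projection.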

@misc{reddeer2026hrm,
  author = {the_red_deer},
  title = {The HRM Fine-Tuning Journey},
  year = {2026},
  url = {https://reddeerinv.com/ai/hrm-fine-tuning-journey/}
}