HRM-Text-1B-Chat — Two-Phase Chat Fine-Tune

A chat/instruct fine-tune of sapientinc/HRM-Text-1B using a two-phase training approach that separately targets the model's explorer and refiner processing cycles.

What is HRM?

HRM (Hierarchical Reasoning Model) processes each token through 8 sequential module calls: 2 H-cycles, each made up of 3 L-cycles followed by one H_module update (2 × (3 + 1) = 8). The L_module and H_module are shared weight stacks (490M params each) called repeatedly with different cycle offsets. This gives a 1B-parameter model significantly more compute per token than a standard transformer of the same size.
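
One way to arrive at the 8 steps per token is to assume that each H-cycle runs its 3 L-cycles and then performs one H_module update. The sketch below enumerates the module calls under that assumption. The function name and the exact ordering are illustrative and not taken from the model's source.

```python
def hrm_schedule(h_cycles=2, l_cycles=3):
    """List the module calls for one token, in assumed execution order.

    Assumption: every H-cycle runs `l_cycles` passes of the shared L_module,
    then one pass of the shared H_module.
    """
    calls = []
    for h_step in range(h_cycles):
        for l_step in range(l_cycles):
            calls.append(("L_module", h_step, l_step))
        calls.append(("H_module", h_step, None))
    return calls


schedule = hrm_schedule()
for module, h_step, l_step in schedule:
    print(module, h_step, l_step)
print("module calls per token:", len(schedule))
```

With the defaults this yields 6 L_module calls and 2 H_module calls, all reusing the same two weight stacks.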

Training Approach

The base HRM-Text-1B wraps the explorer cycles (L-cycles 0 and 1) in torch.no_grad() during standard training, so gradients flow only through the refiner (L-cycle 2) and the H-module. This fine-tune removes that constraint and trains in two phases:
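
The gradient gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in modeling_hrm_text.py: `run_l_cycles`, `step_fn`, and the `train_explorers` flag are hypothetical names. The context-manager factory is passed in so the sketch runs without PyTorch; in practice one would pass `torch.no_grad`.

```python
from contextlib import nullcontext


def run_l_cycles(step_fn, state, l_cycles=3, no_grad=nullcontext, train_explorers=False):
    """Run the L-cycles, optionally blocking gradients on explorer cycles.

    step_fn(state, l_step) stands in for one pass of the shared L_module.
    no_grad is a context-manager factory (e.g. torch.no_grad).
    """
    for l_step in range(l_cycles):
        # All but the last L-cycle are treated as explorers; the last is the refiner.
        is_explorer = l_step < l_cycles - 1
        if is_explorer and not train_explorers:
            ctx = no_grad()        # base-model behaviour: no gradient through explorers
        else:
            ctx = nullcontext()    # this fine-tune: gradients flow through every cycle
        with ctx:
            state = step_fn(state, l_step)
    return state
```

Calling it with `train_explorers=True` corresponds to the patched behaviour, where no cycle is excluded from backpropagation.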

Phase 1 — Explorer Training:

Trainable: L_module + embeddings (58.5% of params), H_module frozen
Data: 2,400 balanced examples from UltraChat 200K
LR: 1e-5, 2 epochs, cosine decay
Result: Model learns conversational structure and formatting
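
For reference, a cosine decay schedule at Phase 1's peak LR could look like the sketch below. It assumes no warmup and decay to zero, neither of which is stated above, and `total_steps` is a placeholder for the number of optimizer steps in the run.

```python
import math


def cosine_lr(step, total_steps, peak_lr=1e-5, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr over total_steps (assumed: no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# The LR starts at the peak, reaches half the peak at the midpoint, and ends at min_lr.
for step in (0, 50, 100):
    print(step, cosine_lr(step, total_steps=100))
```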

Phase 2 — Full Training + Identity Grounding:

Trainable: 100% of parameters, starting from Phase 1 weights
Data: 2,399 examples (92% fresh UltraChat + 8% synthetic)
- 60 identity examples, 30 factual corrections, 30 epistemic calibration
- 24 humor/creative, 24 grounding, 15 structured reasoning
LR: 2e-5, 3 epochs, cosine decay
Result: Model gains identity awareness, factual grounding, and conversational polish
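
The per-phase freezing could be expressed as a small helper like the one below. It is a sketch under assumptions: the parameter-name patterns `L_module.` and `embed` are guesses at how the checkpoint names its weights, and `set_trainable` is a hypothetical helper, not part of the released code. It works on any object exposing `named_parameters()`, such as a PyTorch module.

```python
L_PREFIX = "L_module."   # assumed name prefix for the L stack
EMBED_KEY = "embed"      # assumed substring identifying embedding weights


def set_trainable(model, phase):
    """Set requires_grad for the given phase and return the trainable fraction."""
    trainable = 0
    total = 0
    for name, param in model.named_parameters():
        if phase == 1:
            # Phase 1: only the L stack and embeddings train; H_module stays frozen.
            param.requires_grad = name.startswith(L_PREFIX) or EMBED_KEY in name
        elif phase == 2:
            # Phase 2: everything trains.
            param.requires_grad = True
        else:
            raise ValueError(f"unknown phase: {phase}")
        count = param.numel()
        total += count
        if param.requires_grad:
            trainable += count
    return trainable / total
```

Printing the returned fraction before training is a cheap way to confirm that the intended share of parameters is actually unfrozen.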

Results

Evaluated on a 30-prompt benchmark across 10 categories (formal, technical, casual, creative, silly, adversarial, conversational, instruction, identity, edge cases).

Metric	Base Model	Phase 1	Phase 2 (This Model)
Token leak rate	57% (17/30)	30% (9/30)	3% (1/30)
Avg tokens/response	117	179	194
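
The card does not define "token leak" precisely. The sketch below assumes it means a ChatML role marker appearing in the response before the first end-of-turn token. Both that definition and the helper names are assumptions made for illustration.

```python
def has_token_leak(response, end="<|im_end|>", markers=("<|im_start|>",)):
    """Assumed definition: a control marker shows up inside the answer body."""
    body = response.split(end, 1)[0]   # text up to the first end-of-turn token
    return any(marker in body for marker in markers)


def leak_rate(responses):
    """Return (leaks, total, fraction) over a list of raw decoded responses."""
    leaks = sum(has_token_leak(r) for r in responses)
    return leaks, len(responses), leaks / len(responses)


# The reported rates follow directly from the counts in the table:
for leaks, total in ((17, 30), (9, 30), (1, 30)):
    print(f"{leaks}/{total} = {leaks / total:.0%}")
```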

Identity Examples

Prompt	Base Model	This Model
"What is your name?"	"Please provide your name and address."	"I'm HRM, an AI assistant. How can I help you?"
"Who made you?"	"Who created you?" (echo)	"I was created by a research team. I'm HRM, an AI assistant."
"Are you human or AI?"	Echoed question back	"I'm an AI. Specifically, I'm HRM, a language model."
"The sun revolves around the earth, right?"	Agreed confidently	"No, that's not quite right. The Earth revolves around the Sun."

Token Leak Rate by Category

Category	Base	This Model
casual	0/3	0/3
creative	3/3	0/3
silly	2/3	0/3
wrong/adversarial	1/3	0/3
identity	3/4	0/4
edge	4/4	1/4

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "taykso/HRM-Text-1B-Chat"
# trust_remote_code=True loads the custom HRM modeling code shipped with the checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float32)

# ChatML prompt ending with an open assistant turn for the model to complete
prompt = "<|im_start|>user\nWhat is your name?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens, then cut at the end-of-turn marker
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=False)
response = response.split("<|im_end|>", 1)[0]
print(response)
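
For multi-turn conversations, the prompt string can be assembled from a list of messages instead of written by hand. The helper below is a small sketch that follows the ChatML layout used in the example above. `build_chatml_prompt` is a hypothetical name, not a function provided by the model repository.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Format [{'role': ..., 'content': ...}, ...] as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "What is your name?"}])
print(repr(prompt))
```

For a single user message this produces the same string as the hard-coded prompt in the usage example.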

Observations

Both training phases exhibit a U-shaped loss curve: loss rises during epoch 1 and then recovers. Because it showed up in both runs, this may be characteristic of backpropagating through all HRM cycles at once rather than an artifact of one dataset.
8% synthetic data was sufficient to ground identity and factual correction without degrading general chat quality.
The model runs at ~5 tok/s because of the 8-cycle architecture: each token costs 8 passes through a 490M-parameter stack instead of a single forward pass.
The two-phase approach allows targeted improvement of explorer vs refiner capabilities.
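
The 8% synthetic share and the practical cost of ~5 tok/s can be checked with simple arithmetic on the figures reported in this card. The snippet is only a sanity check; the variable names are illustrative.

```python
# Phase 2 synthetic example counts, as listed in the training section
synthetic = {
    "identity": 60,
    "factual corrections": 30,
    "epistemic calibration": 30,
    "humor/creative": 24,
    "grounding": 24,
    "structured reasoning": 15,
}
total_examples = 2399

n_synth = sum(synthetic.values())
share = n_synth / total_examples
print(f"{n_synth} synthetic / {total_examples} total = {share:.1%}")

# Rough latency for an average Phase 2 response at the reported throughput
avg_tokens = 194
tok_per_s = 5
print(f"~{avg_tokens / tok_per_s:.0f} s per average response")
```

The synthetic examples sum to 183, or about 7.6% of the Phase 2 set, which rounds to the 8% quoted above.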

Training Details

Hardware: AMD Radeon 8060S (Strix Halo APU), ROCm 7.2
Total time: ~62 hours (Phase 1: ~28h, Phase 2: ~42h)
Framework: PyTorch + Transformers
Source patch: Line 525 of modeling_hrm_text.py — removed torch.no_grad() from explorer cycles
ChatML format: <|im_start|>role\ncontent<|im_end|>

Limitations

1B model — limited world knowledge, will hallucinate on specific facts
Humor understanding is improved but still literal
Cannot do multilingual translation
Slow inference (~5 tok/s) due to 8-cycle recurrent architecture
Trained on English only

License

Apache 2.0 (same as base model)
