HRM-Text-1B-Chat: Two-Phase Chat Fine-Tune

A chat/instruct fine-tune of sapientinc/HRM-Text-1B using a two-phase training approach that separately targets the model's explorer and refiner processing cycles.

What is HRM?

HRM (Hierarchical Reasoning Model) processes each token through 8 sequential steps: 2 H-cycles × 3 L-cycles. The L_module and H_module are weight-shared stacks (490M params each) that are called repeatedly with different cycle offsets. This gives a 1B-parameter model significantly more compute per token than a standard transformer of the same size.
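
The cycle structure can be sketched as a simple schedule. This is an illustrative reading, not the model's actual code: it assumes each H-cycle runs three L_module passes followed by one H_module pass, which is one way to arrive at 8 steps from 2 H-cycles and 3 L-cycles. The function name `cycle_schedule` and the tuple layout are hypothetical.

```python
H_CYCLES = 2  # from the model card
L_CYCLES = 3  # from the model card


def cycle_schedule(h_cycles=H_CYCLES, l_cycles=L_CYCLES):
    """List the per-token steps under the ASSUMED ordering:
    for each H-cycle, run all L-cycles, then one H_module pass."""
    steps = []
    for h in range(h_cycles):
        for l in range(l_cycles):
            # The card calls L-cycles 0 and 1 "explorer" and L-cycle 2 "refiner".
            role = "refiner" if l == l_cycles - 1 else "explorer"
            steps.append(("L_module", h, l, role))
        steps.append(("H_module", h, None, "H-update"))
    return steps


schedule = cycle_schedule()
for step in schedule:
    print(step)
print("total steps:", len(schedule))
```

Under this assumption the same two weight stacks are reused for every step, so the parameter count stays near 1B while the number of module passes per token grows.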

Training Approach

The base HRM-Text-1B uses torch.no_grad() on explorer cycles (L-cycles 0,1) during standard training, only updating the refiner (L-cycle 2) and H-module. This fine-tune removes that constraint and trains in two phases:

Phase 1 (Explorer Training):

  • Trainable: L_module + embeddings (58.5% of params), H_module frozen
  • Data: 2,400 balanced examples from UltraChat 200K
  • LR: 1e-5, 2 epochs, cosine decay
  • Result: Model learns conversational structure and formatting
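
For reference, a cosine decay schedule like the one named above can be sketched as follows. The peak LR (1e-5), dataset size (2,400), and epoch count (2) come from the card; the batch size of 1, the absence of warmup, and decay to zero are assumptions made only for illustration.

```python
import math


def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 down to min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Phase 1 figures from the card; batch size 1 is an ASSUMPTION.
examples, epochs, batch_size = 2400, 2, 1
total_steps = examples * epochs // batch_size

for step in (0, total_steps // 2, total_steps):
    print(step, f"{cosine_lr(step, total_steps, peak_lr=1e-5):.2e}")
```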

Phase 2 (Full Training + Identity Grounding):

  • Trainable: 100% of parameters, starting from Phase 1 weights
  • Data: 2,399 examples (92% fresh UltraChat + 8% synthetic)
    • 60 identity examples, 30 factual corrections, 30 epistemic calibration
    • 24 humor/creative, 24 grounding, 15 structured reasoning
  • LR: 2e-5, 3 epochs, cosine decay
  • Result: Model gains identity awareness, factual grounding, and conversational polish
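
The Phase 2 data mix can be checked from the counts listed above. The short calculation below uses only the numbers in this card; the dictionary keys are just labels.

```python
# Synthetic example counts as listed for Phase 2.
synthetic = {
    "identity": 60,
    "factual corrections": 30,
    "epistemic calibration": 30,
    "humor/creative": 24,
    "grounding": 24,
    "structured reasoning": 15,
}
TOTAL_EXAMPLES = 2399

n_synthetic = sum(synthetic.values())
n_ultrachat = TOTAL_EXAMPLES - n_synthetic
synthetic_share = n_synthetic / TOTAL_EXAMPLES

print(f"synthetic: {n_synthetic} ({synthetic_share:.1%})")
print(f"UltraChat: {n_ultrachat} ({1 - synthetic_share:.1%})")
```

The synthetic categories sum to 183 examples, about 7.6% of the set, which matches the rounded 8% / 92% split.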

Results

Evaluated on a 30-prompt benchmark across 10 categories (formal, technical, casual, creative, silly, adversarial, conversational, instruction, identity, edge cases).

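| Metric              | Base Model  | Phase 1    | Phase 2 (This Model) |
|---------------------|-------------|------------|----------------------|
| Token leak rate     | 57% (17/30) | 30% (9/30) | 3% (1/30)            |
| Avg tokens/response | 117         | 179        | 194                  |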
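
The card does not define "token leak" precisely. The sketch below assumes it means ChatML control tokens showing up inside a response, other than a single closing `<|im_end|>`. The helper names `has_token_leak` and `leak_rate` are hypothetical and this is not the evaluation script used for the numbers above.

```python
CONTROL_TOKENS = ("<|im_start|>", "<|im_end|>")
END_OF_TURN = "<|im_end|>"


def has_token_leak(response: str) -> bool:
    """ASSUMED definition: a control token appears anywhere except as
    the single end-of-turn marker that closes the response."""
    text = response.strip()
    if text.endswith(END_OF_TURN):
        text = text[: -len(END_OF_TURN)]
    return any(tok in text for tok in CONTROL_TOKENS)


def leak_rate(responses):
    """Return (leaks, total, fraction) for a list of raw responses."""
    leaks = sum(has_token_leak(r) for r in responses)
    return leaks, len(responses), leaks / len(responses)


# Reported counts from the table above, converted to rounded percentages.
reported = {"Base Model": 17, "Phase 1": 9, "Phase 2": 1}
percentages = {name: round(100 * n / 30) for name, n in reported.items()}
print(percentages)
```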

Identity Examples

| Prompt | Base Model | This Model |
|--------|------------|------------|
| "What is your name?" | "Please provide your name and address." | "I'm HRM, an AI assistant. How can I help you?" |
| "Who made you?" | "Who created you?" (echo) | "I was created by a research team. I'm HRM, an AI assistant." |
| "Are you human or AI?" | Echoed question back | "I'm an AI. Specifically, I'm HRM, a language model." |
| "The sun revolves around the earth, right?" | Agreed confidently | "No, that's not quite right. The Earth revolves around the Sun." |

Token Leak Rate by Category

| Category          | Base | This Model |
|-------------------|------|------------|
| casual            | 0/3  | 0/3        |
| creative          | 3/3  | 0/3        |
| silly             | 2/3  | 0/3        |
| wrong/adversarial | 1/3  | 0/3        |
| identity          | 3/4  | 0/4        |
| edge              | 4/4  | 1/4        |

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "taykso/HRM-Text-1B-Chat"
# trust_remote_code=True lets Transformers load the custom HRM modeling code from the repo.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float32)

# ChatML prompt that ends with an open assistant turn for the model to complete.
prompt = "<|im_start|>user\nWhat is your name?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 200 new tokens; gradients are not needed at inference time.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200, temperature=0.7, top_p=0.9, do_sample=True, repetition_penalty=1.1)

# Decode only the newly generated tokens, then cut at the first end-of-turn marker.
response = tokenizer.decode(output[0][inputs['input_ids'].shape[1]:], skip_special_tokens=False)
if "<|im_end|>" in response:
    response = response[:response.index("<|im_end|>")]
print(response)
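
For multi-turn conversations, the prompt string can be assembled with a small helper. This is a sketch based on the ChatML layout used in the prompt above; `build_chatml_prompt` is a hypothetical name, not part of this repository or of Transformers.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string.

    With add_generation_prompt=True the result ends with an open
    assistant turn, ready to pass to the tokenizer.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


single_turn = build_chatml_prompt([{"role": "user", "content": "What is your name?"}])
print(single_turn)
```

The single-turn output is identical to the hand-written `prompt` string in the usage example.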

Observations

  • Both training phases exhibit a U-shaped loss curve: loss increases in epoch 1 then recovers. This appears to be a fundamental property of training all HRM cycles simultaneously.
  • 8% synthetic data was sufficient to ground identity and factual correction without degrading general chat quality.
  • The model runs at ~5 tok/s due to the 8-cycle architecture (equivalent compute to 8 forward passes per token).
  • The two-phase approach allows targeted improvement of explorer vs refiner capabilities.

Training Details

  • Hardware: AMD Radeon 8060S (Strix Halo APU), ROCm 7.2
  • Total time: ~62 hours (Phase 1: ~28h, Phase 2: ~42h)
  • Framework: PyTorch + Transformers
  • Source patch: line 525 of modeling_hrm_text.py, where torch.no_grad() was removed from the explorer cycles
  • ChatML format: <|im_start|>role\ncontent<|im_end|>
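
The effect of the source patch can be illustrated with a toy example. This sketch does not reproduce modeling_hrm_text.py: the tiny `nn.Linear` stand-in, the `tanh` update, and the names `run_l_cycles` and `train_explorer` are all assumptions chosen to show how wrapping the early cycles in `torch.no_grad()` changes which calls contribute gradients.

```python
import contextlib

import torch
from torch import nn


def run_l_cycles(l_module, z, n_cycles=3, train_explorer=False):
    """Apply one shared module n_cycles times.

    Cycles before the last one play the "explorer" role. Unless
    train_explorer is True they run under torch.no_grad(), so only the
    final "refiner" call is recorded for backpropagation.
    """
    for cycle in range(n_cycles):
        is_explorer = cycle < n_cycles - 1
        gate = (
            torch.no_grad()
            if is_explorer and not train_explorer
            else contextlib.nullcontext()
        )
        with gate:
            z = torch.tanh(l_module(z))
    return z


torch.manual_seed(0)
l_module = nn.Linear(4, 4)  # toy stand-in for the shared L_module
x = torch.randn(2, 4)


def grad_norm(train_explorer):
    l_module.zero_grad()
    run_l_cycles(l_module, x, train_explorer=train_explorer).sum().backward()
    return l_module.weight.grad.norm().item()


gated = grad_norm(train_explorer=False)  # base behaviour: refiner call only
full = grad_norm(train_explorer=True)    # patched behaviour: all three calls
print(f"gated grad norm: {gated:.4f}, full grad norm: {full:.4f}")
```

In the gated case the weights still receive a gradient, but only from the last call; in the patched case every call contributes, so the two gradient norms differ.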

Limitations

  • 1B model: limited world knowledge; will hallucinate on specific facts
  • Humor understanding is improved but still literal
  • Cannot do multilingual translation
  • Slow inference (~5 tok/s) due to 8-cycle recurrent architecture
  • Trained on English only

License

Apache 2.0 (same as base model)
