HRM-Text-1B-Chat: Two-Phase Chat Fine-Tune
A chat/instruct fine-tune of sapientinc/HRM-Text-1B using a two-phase training approach that separately targets the model's explorer and refiner processing cycles.
What is HRM?
HRM (Hierarchical Reasoning Model) processes each token through 8 sequential steps, organized as 2 H-cycles × 3 L-cycles (2 × 3 = 6 L_module calls; the remaining 2 steps are presumably one H_module call per H-cycle). The L_module and H_module are shared weight stacks (490M parameters each) that are called repeatedly with different cycle offsets. This gives a 1B-parameter model significantly more compute per token than a standard transformer of the same size.
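The cycle layout can be sketched as a nested loop. This is only an illustration of one reading of the description above: it assumes each H-cycle runs three L_module calls followed by one H_module call, which yields 8 module calls per token. The function name hrm_schedule and that ordering are assumptions, not taken from the model's source.

```python
def hrm_schedule(h_cycles=2, l_cycles=3):
    """List the module calls for one token under the assumed ordering."""
    steps = []
    for h in range(h_cycles):
        for l in range(l_cycles):
            # Same L_module weights every time; only the cycle offset changes.
            steps.append(("L_module", h, l))
        # Assumption: one H_module call closes each H-cycle.
        steps.append(("H_module", h, None))
    return steps


schedule = hrm_schedule()
PARAMS_PER_STACK = 490e6  # per-module size quoted above
params_touched = len(schedule) * PARAMS_PER_STACK

print(len(schedule))          # sequential module calls per token
print(params_touched / 1e9)   # billions of weight applications per token
```

Under these assumptions the weights applied per token are several times the stored parameter count, which is the sense in which the model spends more compute per token than a same-size transformer.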
Training Approach
The base HRM-Text-1B wraps the explorer cycles (L-cycles 0 and 1) in torch.no_grad() during standard training, so only the refiner (L-cycle 2) and the H-module receive gradient updates. This fine-tune removes that constraint and trains in two phases:
Phase 1 (Explorer Training):
- Trainable: L_module + embeddings (58.5% of params), H_module frozen
- Data: 2,400 balanced examples from UltraChat 200K
- LR: 1e-5, 2 epochs, cosine decay
- Result: Model learns conversational structure and formatting
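One way to read the 58.5% figure: if the two 490M stacks plus the embeddings account for all parameters, the trainable share implies an embedding size. The sketch below works that out and applies a name-prefix freezing rule to a plain dict of parameter counts. The group names (L_module, H_module, embed), the helper phase1_trainable, and the assumption that nothing else holds parameters are illustrative, not read from the checkpoint.

```python
STACK = 490e6            # L_module and H_module size quoted earlier
TRAINABLE_SHARE = 0.585  # Phase 1 trainable fraction

# Solve (STACK + E) / (2 * STACK + E) = TRAINABLE_SHARE for the embedding size E.
implied_embed = STACK * (2 * TRAINABLE_SHARE - 1) / (1 - TRAINABLE_SHARE)

# Hypothetical parameter groups; real parameter names may differ.
param_counts = {"L_module": STACK, "H_module": STACK, "embed": implied_embed}


def phase1_trainable(name):
    """Phase 1 rule: train L_module and embeddings, freeze H_module."""
    return not name.startswith("H_module")


trainable = sum(n for k, n in param_counts.items() if phase1_trainable(k))
share = trainable / sum(param_counts.values())

print(round(implied_embed / 1e6))  # implied embedding size, in millions
print(round(share, 3))             # recovers the Phase 1 trainable share
```

With a real model the same rule would typically be applied by setting requires_grad on each named parameter; the dict here only stands in for that loop.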
Phase 2 (Full Training + Identity Grounding):
- Trainable: 100% of parameters, starting from Phase 1 weights
- Data: 2,399 examples (92% fresh UltraChat + 8% synthetic)
  - 60 identity examples, 30 factual corrections, 30 epistemic calibration
  - 24 humor/creative, 24 grounding, 15 structured reasoning
- LR: 2e-5, 3 epochs, cosine decay
- Result: Model gains identity awareness, factual grounding, and conversational polish
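Both phases use cosine decay. A minimal sketch of such a schedule follows, assuming no warmup, decay to zero, and one optimizer step per example; the source specifies none of these. The helper cosine_lr is hypothetical.

```python
import math


def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Peak LR, epochs and example counts come from the two phases above.
# Steps per epoch = number of examples is an assumption (batch size 1, no accumulation).
phases = {
    "phase1": {"peak_lr": 1e-5, "total_steps": 2 * 2400},
    "phase2": {"peak_lr": 2e-5, "total_steps": 3 * 2399},
}

for name, cfg in phases.items():
    total = cfg["total_steps"]
    for step in (0, total // 2, total):
        print(name, step, cosine_lr(step, total, cfg["peak_lr"]))
```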
Results
Evaluated on a 30-prompt benchmark across 10 categories (formal, technical, casual, creative, silly, adversarial, conversational, instruction, identity, edge cases).
| Metric | Base Model | Phase 1 | Phase 2 (This Model) |
|---|---|---|---|
| Token leak rate | 57% (17/30) | 30% (9/30) | 3% (1/30) |
| Avg tokens/response | 117 | 179 | 194 |
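The source does not define "token leak". The sketch below assumes it means a ChatML turn marker appearing inside the assistant's reply before the first end-of-turn token, and shows how a benchmark-level rate such as 17/30 would be tallied. The helpers has_token_leak and leak_rate and the sample replies are hypothetical.

```python
def has_token_leak(reply):
    """Assumed definition: a new <|im_start|> appears before the reply's first <|im_end|>."""
    body = reply.split("<|im_end|>", 1)[0]
    return "<|im_start|>" in body


def leak_rate(replies):
    """Return (leaks, total, rounded percent) for a list of raw replies."""
    leaks = sum(has_token_leak(r) for r in replies)
    return leaks, len(replies), round(100 * leaks / len(replies))


# Made-up replies, used only to exercise the helpers.
samples = [
    "I'm HRM, an AI assistant.<|im_end|>",
    "Sure.<|im_start|>user\nWhat next?<|im_end|>",
    "Plain reply with no end marker",
]
print(leak_rate(samples))

# Percentages for the leak counts reported in the table, out of 30 prompts.
for leaks in (17, 9, 1):
    print(leaks, round(100 * leaks / 30))
```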
Identity Examples
| Prompt | Base Model | This Model |
|---|---|---|
| "What is your name?" | "Please provide your name and address." | "I'm HRM, an AI assistant. How can I help you?" |
| "Who made you?" | "Who created you?" (echo) | "I was created by a research team. I'm HRM, an AI assistant." |
| "Are you human or AI?" | Echoed question back | "I'm an AI. Specifically, I'm HRM, a language model." |
| "The sun revolves around the earth, right?" | Agreed confidently | "No, that's not quite right. The Earth revolves around the Sun." |
Token Leak Rate by Category
| Category | Base | This Model |
|---|---|---|
| casual | 0/3 | 0/3 |
| creative | 3/3 | 0/3 |
| silly | 2/3 | 0/3 |
| wrong/adversarial | 1/3 | 0/3 |
| identity | 3/4 | 0/4 |
| edge | 4/4 | 1/4 |
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "taykso/HRM-Text-1B-Chat"
# trust_remote_code=True lets Transformers load the model's custom modeling code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float32)

# ChatML prompt ending with an open assistant turn
prompt = "<|im_start|>user\nWhat is your name?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200, temperature=0.7, top_p=0.9, do_sample=True, repetition_penalty=1.1)

# Decode only the newly generated tokens, then cut at the end-of-turn marker
response = tokenizer.decode(output[0][inputs['input_ids'].shape[1]:], skip_special_tokens=False)
if "<|im_end|>" in response:
    response = response[:response.index("<|im_end|>")]
print(response)
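For multi-turn prompts, the ChatML string can be assembled from a list of messages rather than written by hand. A small sketch follows; build_chatml_prompt is a hypothetical helper, and it assumes the same layout as the single-turn prompt above, with a newline after each end-of-turn marker.

```python
def build_chatml_prompt(messages):
    """Render [{'role': ..., 'content': ...}] as ChatML, ending with an open assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "What is your name?"}])
print(repr(prompt))
```

The resulting string can be passed to the tokenizer in place of the hand-written prompt.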
Observations
- Both training phases show a non-monotonic loss curve: loss rises during epoch 1 and then recovers. This may be a property of training all HRM cycles simultaneously.
- 8% synthetic data was sufficient to ground identity and factual correction without degrading general chat quality.
- The model runs at ~5 tok/s due to the 8-cycle architecture (equivalent compute to 8 forward passes per token).
- The two-phase approach allows targeted improvement of explorer vs refiner capabilities.
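As a rough planning aid, the throughput figure translates into per-response latency. This sketch assumes generation time scales linearly with the number of new tokens and ignores prompt processing; estimate_seconds is a hypothetical helper.

```python
def estimate_seconds(new_tokens, tokens_per_second=5.0):
    """Linear latency estimate from the ~5 tok/s figure above."""
    return new_tokens / tokens_per_second


# 194 is the average Phase 2 response length; 200 is the max_new_tokens cap from Usage.
for label, n in (("average Phase 2 response", 194), ("max_new_tokens cap", 200)):
    print(label, estimate_seconds(n))
```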
Training Details
- Hardware: AMD Radeon 8060S (Strix Halo APU), ROCm 7.2
- Total time: ~62 hours (Phase 1: ~28h, Phase 2: ~42h)
- Framework: PyTorch + Transformers
- Source patch: line 525 of modeling_hrm_text.py, removed torch.no_grad() from explorer cycles
- ChatML format: <|im_start|>role\ncontent<|im_end|>
Limitations
- 1B model: limited world knowledge; will hallucinate on specific facts
- Humor understanding is improved but still literal
- Cannot do multilingual translation
- Slow inference (~5 tok/s) due to 8-cycle recurrent architecture
- Trained on English only
License
Apache 2.0 (same as base model)