HRM-Embed-0.6b
A compact text-embedding model built on the Hierarchical Reasoning Model (HRM), a
depth-recurrent architecture from Sapient Intelligence. It applies a standard
embedding recipe to that unusual backbone: ~0.6B parameters, fine-tuned end-to-end (contrastive)
from the open Xiaoye08/HRM-Text-0.6B base checkpoint.
This is an embedding model, not a generator. It exposes 1280-dim sentence embeddings via a mean-pool of the recurrence state (see Usage). A plain
from_pretrained gives a causal LM; you must apply the embedding recipe below.
Requirements
- transformers (loads custom architecture via trust_remote_code=True)
- torch (bfloat16; runs on CPU or GPU)
- The model does not load via sentence-transformers.
Model details
| Property | Value |
|---|---|
| Architecture | Hierarchical Reasoning Model (depth-recurrent), HrmTextForCausalLM |
| Parameters | ~610.8M (dense; the untrained LM head is not shipped) |
| Embedding dim | 1280 (L2-normalized) |
| Hidden size | 1280 |
| Layers | 12 per stack × 2 stacks (H + L) = 24 blocks |
| Attention heads | 10 (head_dim 128) |
| Recurrence (H, L cycles) | 2, 3 (8 stack-passes per forward) |
| Context length | 4096 |
| Vocab | 65,536 (GPT-2-style BPE) |
| Attention | Prefix-LM; bidirectional when token_type_ids = attention_mask |
| Dtype | bfloat16 |
| License | Apache-2.0 |
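The "8 stack-passes per forward" figure in the table follows from the cycle counts if one assumes a nested schedule: within each H cycle the L stack runs once per L cycle, then the H stack runs once. The sketch below only counts passes under that assumption; the function name `stack_passes` and the schedule itself are illustrative, not code from the model repository.

```python
def stack_passes(h_cycles: int, l_cycles: int) -> list[str]:
    """List the order in which the two stacks would run in one forward pass.

    Assumed schedule: for every H cycle, run the L stack `l_cycles` times,
    then run the H stack once.
    """
    schedule = []
    for _ in range(h_cycles):
        schedule.extend(["L"] * l_cycles)  # low-level stack passes
        schedule.append("H")               # one high-level stack pass
    return schedule


schedule = stack_passes(h_cycles=2, l_cycles=3)
print(schedule)       # ['L', 'L', 'L', 'H', 'L', 'L', 'L', 'H']
print(len(schedule))  # 8
```

With 12 blocks per stack, this schedule would amount to 8 × 12 = 96 block applications per forward pass.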
Usage
Embeddings are the L2-normalized mean-pool of the final recurrence hidden state (z_h).
Bidirectional encoding is obtained by passing token_type_ids = attention_mask (marks the whole
input as one bidirectional prefix). Replace the LM head with Identity so nothing downstream of the
recurrence state is used.
import torch, torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "viventhraa96/HRM-Embed-0.6b"
dev = "cuda" if torch.cuda.is_available() else "cpu"

tok = AutoTokenizer.from_pretrained(name)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, torch_dtype=torch.bfloat16).to(dev).eval()
model.lm_head = torch.nn.Identity()  # embeddings come from z_h, not the LM head

@torch.no_grad()
def embed(texts, max_length=512):
    tok.padding_side = "right"
    e = tok(texts, truncation=True, max_length=max_length, padding=True, return_tensors="pt").to(dev)
    pos = torch.arange(e.input_ids.shape[1], device=dev).unsqueeze(0).expand(e.input_ids.shape[0], -1)
    z, _ = model.model(e.input_ids, position_ids=pos, use_cache=False, token_type_ids=e.attention_mask)
    m = e.attention_mask.unsqueeze(-1).to(z.dtype)  # mean-pool over real tokens
    return F.normalize(((z * m).sum(1) / m.sum(1).clamp_min(1)).float(), p=2, dim=-1)  # [N, 1280]

emb = embed(["How do I sort a list in Python?",
             "The mitochondria is the powerhouse of the cell."])
print(emb.shape)                # torch.Size([2, 1280])
print(float(emb[0] @ emb[1]))   # cosine similarity
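Because the embeddings are L2-normalized, ranking documents for a query reduces to a matrix product. The sketch below shows that step in isolation. `rank_documents` is a hypothetical helper, and random unit vectors stand in for the output of `embed` so the snippet runs without downloading the model.

```python
import torch
import torch.nn.functional as F


def rank_documents(query_emb: torch.Tensor, doc_embs: torch.Tensor, top_k: int = 3):
    """Return (scores, indices) of the top_k documents by cosine similarity.

    Both inputs are assumed to be L2-normalized, so the dot product
    equals the cosine similarity.
    """
    scores = doc_embs @ query_emb            # [num_docs]
    k = min(top_k, doc_embs.shape[0])
    return torch.topk(scores, k=k)


# Stand-in data: in practice these would come from embed([...]).
torch.manual_seed(0)
doc_embs = F.normalize(torch.randn(5, 1280), p=2, dim=-1)
query_emb = doc_embs[2].clone()              # query identical to document 2

scores, indices = rank_documents(query_emb, doc_embs, top_k=3)
print(indices[0].item())   # 2
```

For a real corpus, embed the documents once, keep the resulting matrix, and embed only the query at search time.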
Bidirectional encoding (the prefix mask)
HRM-Text is a Prefix-LM. Passing token_type_ids = attention_mask marks every real token as part of
one bidirectional prefix, so tokens attend both ways (padding is excluded and dropped by the masked
mean). This matches how the model was trained as an embedder.
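To make the effect of `token_type_ids = attention_mask` concrete, the sketch below builds the kind of boolean attention mask a Prefix-LM implies: causal by default, with two-way attention among prefix tokens and padding excluded. It illustrates the concept under that assumption and is not the model's internal masking code; `prefix_lm_mask` is a hypothetical name.

```python
import torch


def prefix_lm_mask(token_type_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Build a [batch, seq, seq] boolean mask; True means query i may attend to key j.

    token_type_ids: 1 for prefix (bidirectional) tokens, 0 otherwise.
    attention_mask: 1 for real tokens, 0 for padding.
    """
    seq_len = token_type_ids.shape[1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # j <= i
    prefix = token_type_ids.bool()
    both_prefix = prefix.unsqueeze(2) & prefix.unsqueeze(1)              # i and j both in the prefix
    real = attention_mask.bool()
    both_real = real.unsqueeze(2) & real.unsqueeze(1)                    # neither i nor j is padding
    return (causal.unsqueeze(0) | both_prefix) & both_real


attention_mask = torch.tensor([[1, 1, 1, 0]])   # three real tokens, one pad

causal_only = prefix_lm_mask(torch.zeros_like(attention_mask), attention_mask)
bidirectional = prefix_lm_mask(attention_mask, attention_mask)

print(causal_only[0].int())
print(bidirectional[0].int())
```

With all-zero `token_type_ids` the real-token rows are lower-triangular. With `token_type_ids = attention_mask` every real token sees every other real token, and the padded position is masked out in both directions.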
Method (standard recipe)
Nothing here is a new technique; it is an amalgamation of standard ones on an unusual backbone.
- Mean-pool the final hidden state, then L2-normalize: the Sentence-BERT convention, also used by E5 / GTE / BGE.
- Bidirectional attention instead of causal: the conversion popularized by
LLM2Vec for turning decoder LMs into encoders. Here it needs no
mask monkey-patching, since HRM-Text is natively a Prefix-LM, so
token_type_ids = attention_mask enables it the intended way.
- Contrastive (InfoNCE) fine-tuning to produce the weights: the standard training objective for modern text embedders.
Because the model runs bidirectionally, mean-pooling (rather than last-token pooling, common for
causal decoders) is the natural, coherent choice. The only unusual part is the backbone: applying
this recipe to a depth-recurrent HRM and pooling the recurrence state z_h.
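For readers unfamiliar with the contrastive objective named above, here is a minimal sketch of InfoNCE with in-batch negatives. The temperature value, the batch construction, and the function name are assumptions for illustration; this card does not state the actual training hyperparameters.

```python
import torch
import torch.nn.functional as F


def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss where doc_emb[i] is the positive for query_emb[i]
    and every other row in the batch serves as a negative."""
    q = F.normalize(query_emb, p=2, dim=-1)
    d = F.normalize(doc_emb, p=2, dim=-1)
    logits = q @ d.T / temperature                        # [batch, batch] scaled cosine similarities
    targets = torch.arange(q.shape[0], device=q.device)   # positives sit on the diagonal
    return F.cross_entropy(logits, targets)


torch.manual_seed(0)
docs = torch.randn(4, 1280)
aligned = info_nce(docs + 0.01 * torch.randn(4, 1280), docs)   # queries close to their positives
shuffled = info_nce(docs.roll(1, dims=0), docs)                # queries paired with the wrong document
print(aligned.item() < shuffled.item())   # True
```

The loss is low when each query is closest to its own document and high when it is closest to another document in the batch, which is the property that shapes the embedding space.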
Results: BRIGHT (reasoning retrieval)
Mean nDCG@10 over BRIGHT's 12 domains, for three query modes: raw (original query), rewrite (an LLM rewrites the query first, as most top BRIGHT systems do), and merged (raw + rewrite).
| Query mode | Mean nDCG@10 |
|---|---|
| raw (bare embedder) | 18.1 |
| + query rewriting | 34.3 |
| merged (raw + rewrite) | 33.7 |
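For reference, nDCG@10 for a single query can be computed as below, using the common linear-gain formulation: DCG sums rel_i / log2(i + 1) over ranks i = 1..10 and is divided by the DCG of the ideal ordering. This is a generic sketch, not the BRIGHT evaluation code, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances: list[float], all_relevances: list[float], k: int = 10) -> float:
    """nDCG@k: DCG of the system ranking divided by the DCG of the ideal ranking.

    ranked_relevances: relevance of each retrieved document, in ranked order.
    all_relevances: relevance labels of every relevant document for the query.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# One relevant document, retrieved at rank 1 versus rank 3.
print(ndcg_at_k([1, 0, 0], [1]))   # 1.0
print(ndcg_at_k([0, 0, 1], [1]))   # 0.5
```

The values in the tables are this quantity averaged and scaled by 100.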
Per-domain (nDCG@10 × 100):
| Domain | raw | rewrite |
|---|---|---|
| theoremqa_theorems | 29.4 | 50.4 |
| pony | 1.1 | 46.5 |
| biology | 20.4 | 45.6 |
| theoremqa_questions | 29.8 | 44.1 |
| psychology | 21.3 | 39.3 |
| economics | 17.5 | 35.7 |
| sustainable_living | 13.6 | 30.9 |
| earth_science | 20.2 | 30.9 |
| aops | 16.3 | 27.5 |
| stackoverflow | 12.5 | 27.3 |
| robotics | 12.3 | 20.2 |
| leetcode | 22.6 | 12.9 |
Where it's strong: theorem/definition/reference lookup (theoremqa) and vocabulary-aligned
scientific QA (biology, psychology). Where it's weak: reasoning-transfer retrieval (matching by
shared technique rather than shared words, e.g. aops) and community/procedural QA (robotics, stackoverflow).
pony is the extreme case: near zero without rewriting (raw 1.1) yet among the strongest with it
(46.5), making it the most rewrite-dependent domain in the set.
Note on code retrieval: LeetCode is the one domain where query rewriting hurts (22.6 to 12.9), because expanding a terse problem statement into prose moves the query off the corpus distribution. Use the merged variant for code.
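The per-domain table can be checked against the headline means and the rewrite effect with a few lines of arithmetic. The numbers below are copied from the table; only the variable names are new.

```python
# (raw, rewrite) nDCG@10 x100 per BRIGHT domain, copied from the per-domain table.
scores = {
    "theoremqa_theorems": (29.4, 50.4),
    "pony": (1.1, 46.5),
    "biology": (20.4, 45.6),
    "theoremqa_questions": (29.8, 44.1),
    "psychology": (21.3, 39.3),
    "economics": (17.5, 35.7),
    "sustainable_living": (13.6, 30.9),
    "earth_science": (20.2, 30.9),
    "aops": (16.3, 27.5),
    "stackoverflow": (12.5, 27.3),
    "robotics": (12.3, 20.2),
    "leetcode": (22.6, 12.9),
}

raw_mean = sum(raw for raw, _ in scores.values()) / len(scores)
rewrite_mean = sum(rw for _, rw in scores.values()) / len(scores)
print(round(raw_mean, 1), round(rewrite_mean, 1))   # 18.1 34.3

# Gain from rewriting, largest first.
gains = sorted(((rw - raw, name) for name, (raw, rw) in scores.items()), reverse=True)
for gain, name in gains:
    print(f"{name:22s} {gain:+.1f}")

hurt = [name for gain, name in gains if gain < 0]
print(hurt)   # ['leetcode']
```

The largest gain is on pony and the only negative one is on leetcode, which matches the observations in the text.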
Limitations
- English only. Inherits the base checkpoint's pretraining breadth ceiling; not a broad knowledge embedder.
- Embedder, not a generator: the checkpoint ships without an LM head, so a plain load prints a
lm_head.weight newly-initialized warning (expected) and .generate() returns noise. Apply the Identity swap shown in Usage and pool z_h.
- Best results use a query-rewriting front-end (an external LLM). The bare-embedder (raw) ceiling is lower; raw and rewrite numbers are both reported above so you can see the real embedder. The rewritten queries here come from INF-X-Retriever.
- Modest absolute scores on BRIGHT: this is a small model on a deliberately adversarial benchmark.
Architecture & credits
The Hierarchical Reasoning Model (HRM) architecture is by Sapient Intelligence
(github.com/sapientinc/HRM, github.com/sapientinc/HRM-Text, arXiv:2506.21734).
All architectural credit is theirs. This model is a text-embedding fine-tune of the open
Xiaoye08/HRM-Text-0.6B pretrained checkpoint (Apache-2.0); the HRM-Text pretraining pipeline is
described in arXiv:2605.20613.
License
Apache-2.0. This is a derivative of Apache-2.0 licensed weights; attribution to Sapient Intelligence (HRM) and the HRM-Text project is preserved above.
Citation
@misc{hrm2025,
  title = {Hierarchical Reasoning Model},
  author = {Wang, Guan and others},
  year = {2025},
  eprint = {2506.21734}, archivePrefix = {arXiv}, primaryClass = {cs.AI},
  url = {https://arxiv.org/abs/2506.21734}
}
@misc{hrmtext2026,
  title = {HRM-Text: Efficient Pretraining Beyond Scaling},
  author = {Wang, Guan and Liu, Changling and Wang, Chenyu and Zhou, Cai and Sun, Yuhao and Wu, Yifei and Zhen, Shuai and Scimeca, Luca and Abbasi Yadkori, Yasin},
  year = {2026},
  eprint = {2605.20613}, archivePrefix = {arXiv}, primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2605.20613}
}