Instructions to use jasoncarreira/hrm-text-agent with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use jasoncarreira/hrm-text-agent with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="jasoncarreira/hrm-text-agent")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jasoncarreira/hrm-text-agent")
model = AutoModelForCausalLM.from_pretrained("jasoncarreira/hrm-text-agent")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use jasoncarreira/hrm-text-agent with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jasoncarreira/hrm-text-agent"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jasoncarreira/hrm-text-agent",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/jasoncarreira/hrm-text-agent

SGLang

How to use jasoncarreira/hrm-text-agent with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "jasoncarreira/hrm-text-agent" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jasoncarreira/hrm-text-agent",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "jasoncarreira/hrm-text-agent" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jasoncarreira/hrm-text-agent",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use jasoncarreira/hrm-text-agent with Docker Model Runner:
```
docker model run hf.co/jasoncarreira/hrm-text-agent
```

HRM-Text-1B-agent (v1) — tool / function calling

Full-parameter SFT of sapientinc/HRM-Text-1B — a 1B base (pre-alignment) Hierarchical Reasoning Model — fine-tuned to do function / tool calling. It takes a model that scored 0% on the task and turns it into a competent small tool-caller.

Code, full writeup, and the architecture experiments: https://github.com/jasoncarreira/hrm-text-agent

See also hrm-text-agent-v2, which adds xLAM parallel data for much stronger multi-call performance (with some tradeoffs).

Scores — BFCL v4 (official AST checker, full test sets)

Category	n	Base	This model (v1)
simple	400	0%	61.5%
multiple	200	0%	53.5%
parallel	200	0%	37.5%
parallel_multiple	200	0%	28.0%
irrelevance	240	100%	80.8%

Overall micro-average (1,240): 54.7%. Call-category competence (4 call cats, count-weighted): ~48%. This sits above generic 1B instruct models on BFCL non-live AST, and below the best purpose-built 1B (xLAM-2-1b-fc-r ~69) — strong for a 1B base + SFT.

General capability (base → tuned, 8-benchmark forgetting check)

The SFT was benign: reasoning/knowledge retained — GSM8k 84.5→85.6, DROP(F1) 84.8→83.3, and BoolQ/HellaSwag/Winogrande flat. The only drops were single-letter MCQ format discipline (MMLU 60.1→55.5, ARC-C 83.5→75.1), driven by a rise in non-letter outputs (~10% "invalid") — largely recoverable (the model often answers correctly in prose), and fixed in v2.

Training

Matches the sapientinc cfg_sft recipe:

full-parameter SFT — not LoRA (LoRA on HRM's weight-shared recurrence amplifies the delta per-cycle and collapses the output distribution); bf16 autocast + fp32 master weights
lr 3e-5, cosine decay to 10%, no warmup; AdamW (0.9, 0.95), weight_decay 0.1
3 epochs, max_len 2048, effective batch ~32; ~25k mixed examples, ~3.5 h on an A100 80GB
Uses the model's direct condition token (<|object_ref_start|>) — the documented mode for structured output.

Data mix: tool calls (Hermes + glaive-function-calling) + general instructions (HuggingFaceH4/no_robots) + synthesized irrelevance ("tools present but none fit → don't call").

Usage

git clone https://github.com/jasoncarreira/hrm-text-agent && cd hrm-text-agent
pip install -r requirements.txt
python infer_agent.py --model jasoncarreira/hrm-text-agent "What's the weather in Paris?"
python bfcl_local.py --model jasoncarreira/hrm-text-agent --dump errs.jsonl   # full BFCL

License & data lineage

The base model is Apache-2.0, but the training data includes HuggingFaceH4/no_robots (CC-BY-NC-4.0), so treat this derived model as research / non-commercial. Verify the licenses of all sources for your use case.

🤖 Built with Claude Code (including a second Claude driving training on the GPU pod).

Downloads last month: 40

Safetensors

Model size

1B params

Tensor type

F32

Model tree for jasoncarreira/hrm-text-agent

Base model

sapientinc/HRM-Text-1B

Finetuned

(9)

this model