HRM-Text-1B-agent v2 — tool / function calling (xLAM-scaled)

Full-parameter SFT of sapientinc/HRM-Text-1B for function / tool calling. This is v2 of hrm-text-agent: it adds xLAM parallel / multi-call data and a format-discipline slice. The result is a much stronger tool caller, with call competence in the range of the best purpose-built 1B models, and some tradeoffs (below).

Code + full writeup: https://github.com/jasoncarreira/hrm-text-agent

Scores — BFCL v4 (official AST checker, full test sets)

Category	n	v1	v2	Δ
simple	400	61.5%	81.5%	+20.0
multiple	200	53.5%	77.0%	+23.5
parallel	200	37.5%	59.0%	+21.5
parallel_multiple	200	28.0%	42.5%	+14.5
irrelevance	240	80.8%	60.8%	−20.0

Call-category competence (4 call cats, count-weighted): ~48% → 68.3% — into the purpose-built-1B range (xLAM-2-1b-fc-r ~69). Overall micro-average (1,240): 54.7% → 66.8%.
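The aggregate figures above can be reproduced from the per-category table. The snippet below is a minimal sketch that recomputes the count-weighted averages from the published percentages; the names `RESULTS`, `CALL_CATEGORIES`, and `weighted_accuracy` are illustrative and not part of the repo.

```python
# BFCL v4 category results copied from the table: (n, v1 %, v2 %).
RESULTS = {
    "simple": (400, 61.5, 81.5),
    "multiple": (200, 53.5, 77.0),
    "parallel": (200, 37.5, 59.0),
    "parallel_multiple": (200, 28.0, 42.5),
    "irrelevance": (240, 80.8, 60.8),
}
CALL_CATEGORIES = ("simple", "multiple", "parallel", "parallel_multiple")


def weighted_accuracy(categories, version_index):
    """Count-weighted accuracy (%) over the given categories.

    version_index: 1 for v1, 2 for v2 (position in the RESULTS tuples).
    """
    total = sum(RESULTS[c][0] for c in categories)
    correct = sum(RESULTS[c][0] * RESULTS[c][version_index] for c in categories)
    return correct / total


if __name__ == "__main__":
    groups = (("call categories", CALL_CATEGORIES), ("overall", tuple(RESULTS)))
    for label, cats in groups:
        v1 = weighted_accuracy(cats, 1)
        v2 = weighted_accuracy(cats, 2)
        print(f"{label}: v1 {v1:.1f}% -> v2 {v2:.1f}%")
```

Weighting by `n` matters here because `simple` has twice as many cases as each of the other call categories, so an unweighted mean of the four percentages would give a different number.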

Tradeoff — irrelevance −20: the xLAM data is all-call (no "don't-call" cases), so the model became more eager to call when no tool actually fits.
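To make the two kinds of test concrete, here is a simplified illustration of AST-style call checking and of the irrelevance rule. It is not the official BFCL checker: the matching rule (exact function name, exact argument set, each value drawn from an allowed list) and the helper names `parse_call`, `call_matches`, and `irrelevance_passes` are assumptions made for this sketch. It requires Python 3.9+ for `ast.unparse`.

```python
import ast


def parse_call(call_text):
    """Parse a Python-style call such as "f(a=1, b='x')" into (name, kwargs)."""
    node = ast.parse(call_text, mode="eval").body
    if not isinstance(node, ast.Call) or node.args:
        raise ValueError("expected a single call with keyword arguments only")
    name = ast.unparse(node.func)
    # literal_eval rejects anything that is not a plain literal.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def call_matches(call_text, expected_name, allowed_values):
    """True if the call names the expected function and every argument
    takes one of its allowed values (simplified, hypothetical rule)."""
    try:
        name, kwargs = parse_call(call_text)
    except (SyntaxError, ValueError):
        return False
    if name != expected_name or set(kwargs) != set(allowed_values):
        return False
    return all(kwargs[k] in allowed_values[k] for k in kwargs)


def irrelevance_passes(model_calls):
    """An irrelevance case passes only when the model emits no call at all."""
    return len(model_calls) == 0


if __name__ == "__main__":
    allowed = {"city": ["Paris"], "unit": ["celsius", "c"]}
    print(call_matches("get_weather(city='Paris', unit='c')", "get_weather", allowed))
    print(irrelevance_passes(["get_weather(city='Paris', unit='c')"]))
```

Under this framing, a model trained only on examples that end in a call has never been rewarded for the empty output that `irrelevance_passes` requires, which is consistent with the drop reported above.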

General capability (base → v1 → v2)

Benchmark	base	v1	v2	note
MMLU	60.1%	55.5%	58.4%	invalid 11.9% → 1.4% (format recovered)
ARC-C	83.5%	75.1%	83.2%	invalid 9.9% → 0% (back to base)
HellaSwag	63.3%	61.9%	61.9%	stable
Winogrande	72.2%	70.6%	70.7%	stable
BoolQ	86.3%	87.3%	86.3%	stable
DROP (F1)	84.8%	83.3%	83.7%	stable
GSM8k	84.5%	85.6%	78.6%	−7 vs v1 (real reasoning, invalid 0%)
MATH-1000	49.3%	45.4%	37.0%	−8 vs v1 (accuracy, not format)

The format-discipline slice recovered the v1 MCQ regression: ARC-C is fully back to base and MMLU's invalid rate collapsed from 11.9% to 1.4%. But the call-heavy mix introduced a new free-form-math regression (GSM8k −7, MATH −8 vs v1), which is an accuracy drop rather than a format artifact. Net: stronger calls and cured MCQ format, at the cost of irrelevance discipline and free-form math. Lever for v3: rebalance the mix (more no-call data, protect the reasoning share).

Training

Same cfg_sft recipe as v1 (full-parameter, lr 3e-5, cosine schedule, 3 epochs, max_len 2048, bf16, direct condition token). Data: the v1 mix (Hermes + glaive + no_robots + synthesized irrelevance), plus ~14k parallel-biased examples from Salesforce/xlam-function-calling-60k and ~3k format-discipline examples (single-letter MCQ and \boxed{} math, drawn from train/aux splits so they are leakage-safe), all interleaved. Training ran on an A100 80GB.
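For readers unfamiliar with the schedule named in the recipe, the sketch below shows plain cosine decay from the stated peak of 3e-5. The recipe does not state whether warmup is used or what the final learning rate is, so this assumes no warmup and decay to zero; `cosine_lr` is an illustrative helper, not the repo's scheduler.

```python
import math

PEAK_LR = 3e-5  # peak learning rate from the recipe above


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Learning rate at `step` under plain cosine decay.

    Assumptions (not stated in the model card): no warmup, decay to min_lr.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 250, 500, 750, 1000):
        print(step, f"{cosine_lr(step, total):.2e}")
```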

Usage

Same as v1 — HRM-Text is a PrefixLM that needs the direct condition envelope, so use the repo harness rather than a bare .generate():

git clone https://github.com/jasoncarreira/hrm-text-agent && cd hrm-text-agent
pip install -r requirements.txt
python infer_agent.py --model jasoncarreira/hrm-text-agent-v2 "Book a table for 2 and check the weather"
python bfcl_local.py --model jasoncarreira/hrm-text-agent-v2 --dump errs.jsonl
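As background on why a bare .generate() is not enough, the following is a generic sketch of a prefix-LM attention mask: prefix positions attend to each other in both directions, while generated positions attend causally. It illustrates the general PrefixLM idea only. The model card does not describe HRM-Text's internals or the format of the direct condition envelope, so nothing here should be read as the repo's implementation; `prefix_lm_mask` is a hypothetical helper.

```python
def prefix_lm_mask(prefix_len, total_len):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions [0, prefix_len) form the prefix and see each other in both
    directions. Later positions see the whole prefix plus themselves and
    any earlier generated positions.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("need 0 <= prefix_len <= total_len")
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


if __name__ == "__main__":
    # 2 prefix tokens followed by 2 generated tokens.
    for row in prefix_lm_mask(2, 4):
        print("".join("x" if allowed else "." for allowed in row))
```

A model trained this way expects the prompt to arrive as a correctly delimited prefix, which is the part the repo harness takes care of.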

License & data lineage

The base model is Apache-2.0, but the training data includes no_robots (CC-BY-NC-4.0) and xLAM-60k (gated, CC-BY-4.0), so treat this derived model as research / non-commercial. Verify the licenses of all sources for your use case.

🤖 Built with Claude Code (including a second Claude driving training on the GPU pod).

Model size: 1B params (Safetensors, tensor type F32)
Base model: sapientinc/HRM-Text-1B (this model is a fine-tune of it)