HRM-Text-1B-agent v2: tool / function calling (xLAM-scaled)
Full-parameter supervised fine-tuning (SFT) of sapientinc/HRM-Text-1B for function / tool calling. This is v2 of hrm-text-agent: it adds xLAM parallel / multi-call data and a format-discipline slice. The result is a much stronger tool-caller, with call competence in the range of the best 1B models, along with some tradeoffs (see below).
Code + full writeup: https://github.com/jasoncarreira/hrm-text-agent
Scores: BFCL v4 (official AST checker, full test sets)
| Category | n | v1 | v2 | Δ (v2 − v1, pp) |
|---|---|---|---|---|
| simple | 400 | 61.5% | 81.5% | +20.0 |
| multiple | 200 | 53.5% | 77.0% | +23.5 |
| parallel | 200 | 37.5% | 59.0% | +21.5 |
| parallel_multiple | 200 | 28.0% | 42.5% | +14.5 |
| irrelevance | 240 | 80.8% | 60.8% | −20.0 |
Call-category competence (the 4 call categories, weighted by n): ~48% → 68.3%, which puts v2 in the range of purpose-built 1B tool-callers (xLAM-2-1b-fc-r scores ~69%). Overall micro-average (all 1,240 cases): 54.7% → 66.8%.
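Both aggregate figures can be reproduced from the table: call-category competence weights each call category's accuracy by its n, and the micro-average does the same over all five categories. A quick Python check (the dictionary simply restates the table above):

```python
# (n, v1 accuracy %, v2 accuracy %) per BFCL v4 category, copied from the table.
scores = {
    "simple": (400, 61.5, 81.5),
    "multiple": (200, 53.5, 77.0),
    "parallel": (200, 37.5, 59.0),
    "parallel_multiple": (200, 28.0, 42.5),
    "irrelevance": (240, 80.8, 60.8),
}
CALL_CATS = ["simple", "multiple", "parallel", "parallel_multiple"]


def weighted_avg(cats, col):
    """Average column `col` (1 = v1, 2 = v2) over `cats`, weighting by n."""
    total_n = sum(scores[c][0] for c in cats)
    return sum(scores[c][0] * scores[c][col] for c in cats) / total_n


for col, label in [(1, "v1"), (2, "v2")]:
    call = weighted_avg(CALL_CATS, col)
    micro = weighted_avg(scores, col)  # iterating the dict yields all 5 categories
    print(f"{label}: call-category {call:.1f}%, micro-average {micro:.1f}%")
# v1: call-category 48.4%, micro-average 54.7%
# v2: call-category 68.3%, micro-average 66.8%
```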
Tradeoff (irrelevance, −20.0 pp): the xLAM data is all-call (it has no "don't-call" cases), so the model became more eager to call a tool even when none actually fits the request.
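To illustrate what AST-based checking means, including why an over-eager caller loses irrelevance points, here is a simplified, hypothetical sketch; it is not the official BFCL checker. It parses the model output as Python call syntax, compares function names and keyword arguments against lists of acceptable values, matches parallel calls in any order, and passes an irrelevance case only when no call can be parsed. The ground-truth layout and the use of `""` to mark an optional parameter are assumptions of this sketch.

```python
import ast
import itertools
from typing import Any, Dict, List, Tuple

Call = Tuple[str, Dict[str, Any]]
# Assumed ground truth per call: (function name, {param: [acceptable values]}).
# In this sketch, "" in the acceptable list marks the parameter as optional.
Truth = Tuple[str, Dict[str, List[Any]]]


def parse_calls(text: str) -> List[Call]:
    """Parse 'f(a=1)' or '[f(a=1), g(b="x")]' into (name, kwargs) pairs.

    Returns [] when the text is not a list of keyword-argument calls,
    e.g. a natural-language refusal."""
    try:
        expr = ast.parse(text.strip(), mode="eval").body
    except (SyntaxError, ValueError):
        return []
    nodes = expr.elts if isinstance(expr, ast.List) else [expr]
    calls: List[Call] = []
    for node in nodes:
        if not isinstance(node, ast.Call) or node.args:
            return []  # only keyword-argument calls are accepted
        try:
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        except ValueError:
            return []  # non-literal argument values are rejected
        calls.append((ast.unparse(node.func), kwargs))
    return calls


def call_matches(pred: Call, truth: Truth) -> bool:
    name, kwargs = pred
    t_name, t_params = truth
    if name != t_name or not set(kwargs) <= set(t_params):
        return False  # wrong function, or an unknown parameter was passed
    for param, allowed in t_params.items():
        if param in kwargs:
            if kwargs[param] not in allowed:
                return False
        elif "" not in allowed:
            return False  # a required parameter is missing
    return True


def check(text: str, ground_truth: List[Truth]) -> bool:
    preds = parse_calls(text)
    if not ground_truth:  # irrelevance case: pass only if nothing was called
        return not preds
    if len(preds) != len(ground_truth):
        return False
    # Parallel calls may come back in any order, so try every pairing.
    return any(
        all(call_matches(p, t) for p, t in zip(preds, perm))
        for perm in itertools.permutations(ground_truth)
    )


weather = [("get_weather", {"city": ["Boston", "Boston, MA"], "unit": ["", "celsius"]})]
print(check('get_weather(city="Boston")', weather))    # True
print(check('[get_weather(city="Paris")]', weather))   # False
print(check("Sorry, no tool fits that request.", []))  # True
```

Trying every permutation is factorial in the number of calls, which is acceptable for the handful of calls in a parallel test case but would need a proper bipartite matching for larger sets.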
General capability (base → v1 → v2)
| Benchmark | base | v1 | v2 | note |
|---|---|---|---|---|
| MMLU | 60.1% | 55.5% | 58.4% | invalid rate 11.9% → 1.4% (format recovered) |
| ARC-C | 83.5% | 75.1% | 83.2% | invalid rate 9.9% → 0% (back to base) |
| HellaSwag | 63.3% | 61.9% | 61.9% | stable |
| Winogrande | 72.2% | 70.6% | 70.7% | stable |
| BoolQ | 86.3% | 87.3% | 86.3% | stable |
| DROP (F1) | 84.8% | 83.3% | 83.7% | stable |
| GSM8k | 84.5% | 85.6% | 78.6% | −7 vs v1 (genuine accuracy loss; invalid rate 0%) |
| MATH-1000 | 49.3% | 45.4% | 37.0% | −8 vs v1 (accuracy loss, not format) |
The format-discipline slice recovered the v1 multiple-choice (MCQ) regression: ARC-C is fully back to base, and MMLU's invalid-answer rate collapsed from 11.9% to 1.4%. However, the call-heavy mix introduced a new regression in free-form math (GSM8k −7, MATH −8 vs v1); with a 0% invalid rate on GSM8k, this is an accuracy loss rather than a format artifact. Net: stronger calls and cured MCQ formatting, at the cost of irrelevance discipline and free-form math. The lever for v3 is to rebalance the mix: more no-call data, and a protected share of reasoning data.
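The "invalid" notes separate answers that cannot be extracted at all from answers that are extracted but wrong. A minimal sketch of that distinction, assuming a strict extractor (a bare option letter A to D for MCQ, the last `\boxed{...}` for math) and exact string comparison; the regex and the function names are hypothetical and not the repo's evaluation code.

```python
import re
from typing import Callable, List, Optional

# Hypothetical scoring helpers: an answer that cannot be extracted counts as
# "invalid"; an extracted but wrong answer simply counts as incorrect.
LETTER = re.compile(r"^\s*\(?([A-D])\)?[.:]?\s*$")


def extract_letter(response: str) -> Optional[str]:
    """Accept only a bare option letter such as 'B', '(B)' or 'B.'."""
    m = LETTER.match(response)
    return m.group(1) if m else None


def extract_boxed(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    start = response.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(response)):
        if response[j] == "{":
            depth += 1
        elif response[j] == "}":
            depth -= 1
            if depth == 0:
                return response[i:j].strip()
    return None  # unbalanced braces: treated as invalid


def score(responses: List[str], golds: List[str],
          extract: Callable[[str], Optional[str]]) -> dict:
    """Return accuracy and invalid rate over paired responses and gold answers."""
    correct = invalid = 0
    for resp, gold in zip(responses, golds):
        answer = extract(resp)
        if answer is None:
            invalid += 1
        elif answer == gold:
            correct += 1
    n = len(golds)
    return {"accuracy": correct / n, "invalid_rate": invalid / n}


# One correct, one wrong ("(C)" parses but the gold is D), one invalid.
print(score(["B", "(C)", "I think the answer is C."], ["B", "D", "C"], extract_letter))
```

Under this scheme, a format regression shows up in `invalid_rate`, whereas a reasoning regression like the one on GSM8k lowers `accuracy` while `invalid_rate` stays near zero.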
Training
Same cfg_sft recipe as v1: full-parameter, lr 3e-5, cosine schedule, 3 epochs, max_len 2048, bf16, direct condition token. Data: the v1 mix (Hermes + glaive + no_robots + synthesized irrelevance), plus ~14k parallel-biased examples from Salesforce/xlam-function-calling-60k, plus ~3k format-discipline examples (single-letter MCQ answers and \boxed{} math, drawn from train/aux splits so they cannot leak into the eval sets), all interleaved. Training ran for ~3 epochs on one A100 80GB.
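As a minimal sketch of the recipe's moving parts: the hyperparameters as a config object, a cosine learning-rate schedule, and a seeded shuffle standing in for "all interleaved". It assumes no warmup and decay to zero, since the card only says "cosine"; the field names, `cosine_lr`, and `interleave` are hypothetical and do not come from the repo's cfg_sft.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SFTConfig:
    # Values from this card; field names are illustrative, not cfg_sft's keys.
    lr: float = 3e-5
    schedule: str = "cosine"
    epochs: int = 3
    max_len: int = 2048
    dtype: str = "bf16"


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    progress = min(step / max(1, total_steps), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def interleave(sources: Dict[str, list], seed: int = 0) -> List[Tuple[str, object]]:
    """Pool every example with its source tag, then shuffle so batches mix sources."""
    pool = [(name, ex) for name, examples in sources.items() for ex in examples]
    random.Random(seed).shuffle(pool)
    return pool


cfg = SFTConfig()
steps = 1000  # hypothetical total step count
for s in (0, steps // 2, steps):
    print(s, f"{cosine_lr(s, steps, cfg.lr):.2e}")
# 0 3.00e-05
# 500 1.50e-05
# 1000 0.00e+00
```

Shuffling the pooled examples keeps each source's share of every epoch equal to its share of the mix, which is the knob the v3 rebalancing would turn.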
Usage
Same as v1: HRM-Text is a PrefixLM that needs the direct condition envelope, so use the repo harness rather than calling .generate() directly:
```shell
# Clone the harness and install its dependencies
git clone https://github.com/jasoncarreira/hrm-text-agent && cd hrm-text-agent
pip install -r requirements.txt
# Run the agent on a single request
python infer_agent.py --model jasoncarreira/hrm-text-agent-v2 "Book a table for 2 and check the weather"
# Run the local BFCL evaluation
python bfcl_local.py --model jasoncarreira/hrm-text-agent-v2 --dump errs.jsonl
```
License & data lineage
The base model is Apache-2.0, but the training data includes no_robots (CC-BY-NC-4.0) and xLAM-60k (gated, CC-BY-4.0), so treat this derived model as research / non-commercial. Verify the licenses of all sources for your use case.
🤖 Built with Claude Code (including a second Claude instance driving training on the GPU pod).