HRM-Text-1B-agent (v1): tool / function calling
Full-parameter SFT of sapientinc/HRM-Text-1B, a 1B base (pre-alignment) Hierarchical Reasoning Model, fine-tuned to do function / tool calling.
It takes a model that scored 0% on the task and turns it into a competent small tool-caller.
Code, full writeup, and the architecture experiments: https://github.com/jasoncarreira/hrm-text-agent
See also hrm-text-agent-v2, which adds xLAM parallel data for much stronger multi-call performance (with some tradeoffs).
Scores: BFCL v4 (official AST checker, full test sets)
| Category | n | Base | This model (v1) |
|---|---|---|---|
| simple | 400 | 0% | 61.5% |
| multiple | 200 | 0% | 53.5% |
| parallel | 200 | 0% | 37.5% |
| parallel_multiple | 200 | 0% | 28.0% |
| irrelevance | 240 | 100% | 80.8% |
Overall micro-average (1,240): 54.7%. Call-category competence (4 call categories, count-weighted): ~48%. This sits above generic 1B instruct models on BFCL non-live AST and below the best purpose-built 1B (xLAM-2-1b-fc-r, ~69): strong for a 1B base + SFT.
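As a check on the arithmetic, the two summary figures can be recomputed from the table rows. This is a minimal sketch that assumes "micro-average" means weighting each category's accuracy by its example count `n`; the variable and function names are illustrative, not taken from the repo.

```python
# (category, n, accuracy in percent), copied from the table above
results = [
    ("simple", 400, 61.5),
    ("multiple", 200, 53.5),
    ("parallel", 200, 37.5),
    ("parallel_multiple", 200, 28.0),
    ("irrelevance", 240, 80.8),
]


def micro_average(rows):
    """Count-weighted accuracy (percent) over the given rows."""
    total = sum(n for _, n, _ in rows)
    correct = sum(n * acc / 100 for _, n, acc in rows)
    return 100 * correct / total


overall = micro_average(results)
# "Call categories" = every row except irrelevance
call_only = micro_average([r for r in results if r[0] != "irrelevance"])

print(f"overall: {overall:.1f}%")            # overall: 54.7%
print(f"call categories: {call_only:.1f}%")  # call categories: 48.4%
```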
General capability (base → tuned, 8-benchmark forgetting check)
The SFT was benign: reasoning and knowledge were retained, with GSM8k 84.5 → 85.6, DROP (F1) 84.8 → 83.3, and BoolQ/HellaSwag/Winogrande flat. The only drops were in single-letter MCQ format discipline (MMLU 60.1 → 55.5, ARC-C 83.5 → 75.1), driven by a rise in non-letter outputs (~10% "invalid"). These are largely recoverable (the model often answers correctly in prose) and are fixed in v2.
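To make the "invalid" outcome concrete, here is a hypothetical sketch of strict versus lenient answer parsing for a four-option MCQ. It is not the harness used for the numbers above; the function names, the A-D option set, and the "answer is" pattern are all assumptions chosen for illustration.

```python
import re
from typing import Optional

VALID = set("ABCD")  # assumed option letters


def strict_letter(output: str) -> Optional[str]:
    """Accept only a bare letter (optionally followed by a period)."""
    text = output.strip().rstrip(".").strip()
    return text if text in VALID else None


def lenient_letter(output: str) -> Optional[str]:
    """Also accept prose such as 'The answer is B.' (illustrative pattern)."""
    strict = strict_letter(output)
    if strict is not None:
        return strict
    # Case-insensitive cue, but the option letter itself must be uppercase
    match = re.search(r"(?i:\banswer(?: is|:))\s*\(?([ABCD])\b", output)
    return match.group(1) if match else None


outputs = ["B", "The answer is B.", "Paris is the capital of France."]
print([strict_letter(o) for o in outputs])
print([lenient_letter(o) for o in outputs])
```

Under a strict parser the second output counts as invalid even though it names the right option, which is the kind of format miss the paragraph describes.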
Training
Matches the sapientinc cfg_sft recipe:
- full-parameter SFT, not LoRA (LoRA on HRM's weight-shared recurrence amplifies the delta per cycle and collapses the output distribution); bf16 autocast + fp32 master weights
- lr 3e-5, cosine decay to 10%, no warmup; AdamW (0.9, 0.95), weight_decay 0.1
- 3 epochs, max_len 2048, effective batch ~32; ~25k mixed examples, ~3.5 h on an A100 80GB
- Uses the model's direct condition token (<|object_ref_start|>), the documented mode for structured output.
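The learning-rate schedule in the list above can be sketched in a few lines. This is an illustration, not the cfg_sft implementation: it assumes "cosine decay to 10%" means the rate ends at 10% of the 3e-5 peak, and it derives an approximate step count from the rounded figures (~25k examples, 3 epochs, batch ~32). `lr_at`, `PEAK_LR`, and `MIN_RATIO` are hypothetical names.

```python
import math

PEAK_LR = 3e-5
MIN_RATIO = 0.10  # assumption: final LR is 10% of the peak


def lr_at(step: int, total_steps: int) -> float:
    """Cosine decay from PEAK_LR to MIN_RATIO * PEAK_LR, with no warmup."""
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_RATIO + (1.0 - MIN_RATIO) * cosine)


# Approximate optimizer steps from the rounded figures in the list
total_steps = math.ceil(25_000 * 3 / 32)

for step in (0, total_steps // 2, total_steps):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

With no warmup the first step already runs at the full 3e-5, and the rate reaches 3e-6 at the final step.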
Data mix: tool calls (Hermes + glaive-function-calling) + general instructions (HuggingFaceH4/no_robots) + synthesized irrelevance ("tools present but none fit → don't call").
Usage
HRM-Text is a PrefixLM with a conditioning scheme: prompts open
<|im_start|><|object_ref_start|>…<|im_end|> with token_type_ids=1 over the prompt span. A naive
.generate() won't match the training distribution, so use the agent loop / eval harness in the repo:
git clone https://github.com/jasoncarreira/hrm-text-agent && cd hrm-text-agent
pip install -r requirements.txt
python infer_agent.py --model jasoncarreira/hrm-text-agent "What's the weather in Paris?"
python bfcl_local.py --model jasoncarreira/hrm-text-agent --dump errs.jsonl # full BFCL
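The repo scripts remain the reference for inference. Purely to illustrate what "token_type_ids=1 over the prompt span" means, the sketch below builds such a mask for a tokenized sequence. It assumes generated positions are marked 0, which the card does not state, and `prefix_token_type_ids` and the token IDs are hypothetical placeholders.

```python
from typing import List


def prefix_token_type_ids(prompt_ids: List[int], generated_ids: List[int]) -> List[int]:
    """Mark prompt (prefix) positions with 1 and generated positions with 0.

    The value 0 for generated tokens is an assumption for illustration.
    """
    return [1] * len(prompt_ids) + [0] * len(generated_ids)


# Placeholder IDs standing in for a tokenized prompt and a partial completion
prompt_ids = [101, 102, 7, 8, 9, 103]
generated_ids = [42, 43]

input_ids = prompt_ids + generated_ids
token_type_ids = prefix_token_type_ids(prompt_ids, generated_ids)
print(token_type_ids)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

A plain `.generate()` call that never sets this mask would not see the prompt the way it appeared during training, which is why the card points to the harness instead.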
License & data lineage
The base model is Apache-2.0, but the training data includes HuggingFaceH4/no_robots (CC-BY-NC-4.0), so treat this derived model as research / non-commercial. Verify the licenses of all sources for your use case.
🤖 Built with Claude Code (including a second Claude driving training on the GPU pod).