Instructions to use Orionfold/spark-hermes-vertical-router with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- HERMES
How to use Orionfold/spark-hermes-vertical-router with HERMES:
# No code snippets available yet for this library. # To use this model, check the repository files and the library's documentation. # Want to help? PRs adding snippets are welcome at: # https://github.com/huggingface/huggingface.js
- Notebooks
- Google Colab
- Kaggle
Spark Hermes Vertical Router β 5 specialists + 1 default brain
A deterministic keyword-classifier router for the NVIDIA DGX Spark (GB10, 128 GB unified memory): dispatch each Hermes prompt to one of five Orionfold vertical GGUFs β patent / legal / finance / cyber / medical β served one-at-a-time, with a strong default brain (Qwen3-30B-A3B MoE Q4_K_M) catching everything else.
What this harness is
One always-on brain, five specialists, zero LLM-classifier overhead.
A Spark holds one strong model warm at a time. The pinned MoE is excellent at general agentic work but is not your domain expert. The five Orionfold vertical GGUFs are domain experts but compete for the same 128 GB envelope. A router picks per prompt: keyword-matched prompts get the right specialist (warm on demand, ~5β10 s), everything else stays with the brain.
Good for:
- Route a Hermes agent prompt to a vertical specialist by keyword
- Reproduce the 30-prompt router-accuracy + per-vertical quality bench
- Embed a deterministic, auditable router into a Hermes config
For: DGX Spark power users running a local, no-API-key agent harness across multiple domains.
Serving lanes
| Lane | Provider | Model | tok/s | Sustained (min) | Format-error | Clean-run |
|---|---|---|---|---|---|---|
| Patent prosecution | llama-server | Orionfold/patent-strategist-v3-nemo-GGUF Q5_K_M | β | β | β | 80.0% |
| Legal reasoning | llama-server | Orionfold/Saul-7B-Instruct-v1-GGUF Q5_K_M | β | β | β | 80.0% |
| Financial analysis | llama-server | Orionfold/finance-chat-GGUF Q5_K_M | β | β | β | 100.0% |
| Defensive cyber | llama-server | Orionfold/SecurityLLM-GGUF Q5_K_M | β | β | β | 100.0% |
| Clinical reasoning | llama-server | Orionfold/II-Medical-8B-GGUF Q5_K_M | β | β | β | 100.0% |
| Default brain (MoE) β | llama-server | Qwen/Qwen3-30B-A3B-Q4_K_M | β | β | β | 80.0% |
Tool-call format-error rate is the agent-critical number: a lane that can't emit well-formed tool calls is disqualified regardless of speed.
Configuration
~/.hermes/config.yaml (model block):
model:
provider: custom
base_url: "http://127.0.0.1:8080/v1"
default: Qwen3-30B-A3B-Q4_K_M.gguf
~/.hermes/.env:
HERMES_STREAM_READ_TIMEOUT=1800
OPENAI_API_KEY=local
OPENAI_BASE_URL=http://127.0.0.1:8080/v1
router.yaml:
router:
kind: vertical
default:
name: brain
hf_repo: Qwen/Qwen3-30B-A3B-Q4_K_M
variant: Q4_K_M
description: "Qwen3-30B-A3B MoE Q4_K_M β the Step-2 pinned default brain (8/8 quality, 83.5 tok/s, 31.8 GB; the always-warm fallback that picks up any prompt no vertical claims). Always served on 127.0.0.1:8080 via llama.cpp."
routes:
- name: patent
hf_repo: Orionfold/patent-strategist-v3-nemo-GGUF
variant: Q5_K_M
keywords:
- patent
- claim
- prior art
- uspto
- mpep
- prosecution
- Β§102
- Β§103
- provisional
- patentability
- patentable
- infring
description: "Offline patent-prosecution reasoning (claims, prosecution, MPEP)."
- name: legal
hf_repo: Orionfold/Saul-7B-Instruct-v1-GGUF
variant: Q5_K_M
keywords:
- lawsuit
- contract
- tort
- statute
- plaintiff
- defendant
- breach
- negligence
- estoppel
- doctrine
- res ipsa
- limitations
- repose
- promissory
- sue
description: "Legal reasoning over contracts, torts, and statutes (Saul-7B-Instruct)."
- name: finance
hf_repo: Orionfold/finance-chat-GGUF
variant: Q5_K_M
keywords:
- portfolio
- 10-k
- ebitda
- dividend
- fed
- yield curve
- sharpe
- moat
- valuation
- earnings
- p/e
- stock
- shareholder
- balance sheet
- cash flow
description: "Financial analysis, 10-K reading, valuation primitives."
- name: cyber
hf_repo: Orionfold/SecurityLLM-GGUF
variant: Q5_K_M
keywords:
- cve
- exploit
- malware
- rce
- owasp
- siem
- vulnerab
- phishing
- ransomware
- lpe
- privilege escalation
- infection
- intrusion
- ddos
description: "Defensive cybersecurity: CVE triage, OWASP, incident response."
- name: medical
hf_repo: Orionfold/II-Medical-8B-GGUF
variant: Q5_K_M
keywords:
- symptom
- diagnosis
- icd-10
- pathology
- dose
- mg/kg
- amoxicillin
- d-dimer
- differential
- infarct
- ed patient
- embolism
- thromb
- presents with
- mg/dl
- myocardial
description: "Clinical reasoning: differentials, dosing, ICD-10, ED workups."
Doctor checklist
- Router accuracy β₯ 100% on the 30-prompt bench
- Overall vertical pass-rate β₯ 90%
- Default brain warm on :8080 (always-on)
- Vertical port :8090 free between verticals
- All 5 vertical GGUFs cached at /home/nvidia/data/quants/
Methods
Measured and documented in The Hermes vertical router on a DGX Spark.
Known drift
- Router-accuracy sample size β router classification measured over 30 prompts (5 per vertical + 5 default-brain) β not a large-N guarantee.
- Keyword-set tuning β vertical keywords were tuned against the bench prompts; out-of-distribution prompts may misroute.
- Per-vertical pass-rate basis β 5 prompts per vertical; deterministic substring/regex rubrics β open-ended answers (haiku, drafted claims) marked vibe.
- One-at-a-time vertical serving β verticals are served on demand on :8090 (~5β10s warm); the default brain stays warm on :8080 (always-on, ~32 GB).
Published by Orionfold LLC Β· orionfold.com Β· Methods documented at ainative.business/field-notes.