AMR-Steward — Antibiotic Prescribing Agent
Qwen3-4B + LoRA trained with multi-head GRPO to prescribe the correct antibiotic for drug-resistant bacterial infections. Reward is fully verifiable: seven pure-function components against EUCAST v16.0 breakpoints and IDSA 2022/2023 clinical guidelines. No LLM-as-judge anywhere.
Trained inside the AMR-Steward OpenEnv environment — built for the Meta PyTorch OpenEnv Hackathon, April 2026.
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-4B |
| Fine-tuning | LoRA (r=16, α=32, targets: q/k/v/o projections) |
| Algorithm | Multi-head GRPO (TRL + Unsloth, bf16) |
| Hardware | A10G GPU, HuggingFace Spaces |
| Live demo | divyanshb06-amrsteward.hf.space/demo |
| Environment | github.com/saaheerpurav/amr-steward |
| Writeup | BLOG.md |
Training Results
Training uses a three-stage curriculum that moves from susceptible organisms to MDR organisms combined with severe renal failure and allergy constraints. The tool budget shrinks at each stage:
| Stage | Organisms | Budget | Steps | Start → Final | Peak | Mean |
|---|---|---|---|---|---|---|
| 1 — Susceptible | K. pneumoniae, E. coli, S. aureus (susceptible) | 5 tools | 128 | 0.54 → 0.90 | 0.923 | 0.840 |
| 2 — Resistant/MDR | + ESBL, MRSA, VRE | 4 tools | 64 | 0.86 → 0.84 | 0.840 | 0.790 |
| 3 — MDR + Renal + Allergies | + CRE, XDR Pseudomonas, VISA | 3 tools | 32 | 0.81 → 0.88 | 0.988 | 0.707 |
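Random baseline: ~0.07. By mean reward, the trained model is about 12× better on Stage 1 (0.840 vs. 0.07) and about 10× better on Stage 3 (0.707 vs. 0.07).
Mean reward stays above 0.70 even at Stage 3, where cases include MDR organisms, CrCl 8, penicillin allergy, and a 3-tool budget.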
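The arithmetic behind the 12× and 10× figures can be reproduced from the table. The sketch below is only an illustration: the `Stage` dataclass and its field names are hypothetical and are not taken from the AMR-Steward code. The numbers are copied from the table above.

```python
from dataclasses import dataclass

RANDOM_BASELINE = 0.07  # approximate random-policy reward quoted above


@dataclass(frozen=True)
class Stage:
    # Hypothetical container; field names are illustrative only.
    name: str
    tool_budget: int
    steps: int
    mean_reward: float


CURRICULUM = [
    Stage("1 - Susceptible", tool_budget=5, steps=128, mean_reward=0.840),
    Stage("2 - Resistant/MDR", tool_budget=4, steps=64, mean_reward=0.790),
    Stage("3 - MDR + Renal + Allergies", tool_budget=3, steps=32, mean_reward=0.707),
]


def improvement_over_baseline(stage: Stage, baseline: float = RANDOM_BASELINE) -> float:
    """Ratio of a stage's mean reward to the random-baseline reward."""
    return stage.mean_reward / baseline


total_steps = sum(stage.steps for stage in CURRICULUM)
for stage in CURRICULUM:
    print(f"{stage.name}: {improvement_over_baseline(stage):.1f}x over random")
print(f"total GRPO steps: {total_steps}")
```

Because the baseline is quoted only approximately (~0.07), the ratios should be read as rough multiples rather than precise measurements.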
What This Model Does
The agent receives a clinical patient case and must investigate, then prescribe:
```
Patient: 67F, ICU, K. pneumoniae bacteremia, meropenem MIC=8.0, CrCl=35, no allergies

Agent investigates:
  → interpret_resistance("meropenem") → "MIC 8.0 → EUCAST: Resistant"
  → check_guideline("bacteremia")     → "IDSA: CRE K. pneumoniae → ceftazidime-avibactam"
  → assess_patient_factors()          → "CrCl 35: reduce to 1.25g IV q8h"

Agent prescribes:
  → ceftazidime-avibactam 1.25g IV q8h, 14 days
  → reward: 0.92
```
Without training (broad-empiric): prescribes meropenem → reward ~0.11 (resistant organism, drug has zero effect).
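The resistance-interpretation step amounts to comparing a measured MIC against a pair of breakpoints. Here is a minimal sketch of that comparison, assuming the usual EUCAST-style convention (susceptible at or below the lower breakpoint, resistant above the upper one). The function name `classify_mic` is hypothetical, and the breakpoint values in the example are placeholders, not values from the EUCAST tables.

```python
def classify_mic(mic: float, s_breakpoint: float, r_breakpoint: float) -> str:
    """Return "S", "I", or "R" for an MIC in mg/L.

    S: MIC <= s_breakpoint
    R: MIC >  r_breakpoint
    I: anything in between (susceptible, increased exposure)
    """
    if s_breakpoint > r_breakpoint:
        raise ValueError("s_breakpoint must not exceed r_breakpoint")
    if mic <= s_breakpoint:
        return "S"
    if mic > r_breakpoint:
        return "R"
    return "I"


# Placeholder breakpoints for illustration only (NOT real EUCAST values).
PLACEHOLDER_S, PLACEHOLDER_R = 1.0, 4.0

for mic in (0.5, 2.0, 8.0):
    print(mic, classify_mic(mic, PLACEHOLDER_S, PLACEHOLDER_R))
```

A real implementation would look the breakpoints up per organism and drug from the breakpoint table rather than hard-coding them.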
JEPA World Model
The training environment includes a JEPA (Joint Embedding Predictive Architecture) world model. To our knowledge, this is the first application of Meta AI's I-JEPA pattern (Assran et al., CVPR 2023) inside a clinical RL environment.
The world model (≈50K params) predicts in latent space how each tool call would change the agent's known clinical state. It uses an EMA-stabilised target encoder (τ=0.99) — the critical anti-collapse mechanism from the original I-JEPA:
```
context_encoder(s_before) + tool → predictor → pred_repr
target_encoder(s_after)                      → tgt_repr   [EMA, stop-gradient]
Loss = MSE(pred_repr, tgt_repr)
```
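The two numeric ingredients in the scheme above, the EMA update of the target encoder and the MSE loss in latent space, can be sketched without any deep-learning framework. In this illustration flat lists of floats stand in for parameter and representation tensors, and the function names `ema_update` and `mse` are hypothetical. It is not the environment's actual world-model code.

```python
def ema_update(target_params, online_params, tau=0.99):
    """Exponential moving average: target <- tau * target + (1 - tau) * online.

    The target encoder is never updated by gradients; it only trails the
    online (context) encoder through this rule.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


def mse(pred_repr, tgt_repr):
    """Mean squared error between predicted and target latent vectors."""
    if len(pred_repr) != len(tgt_repr):
        raise ValueError("representations must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred_repr, tgt_repr)) / len(pred_repr)


# Toy values: the target moves 1% of the way toward the online parameters.
target = ema_update([0.0, 1.0], [1.0, 1.0], tau=0.99)
loss = mse([1.0, 2.0], [1.0, 4.0])
print(target, loss)
```

With τ=0.99 the target encoder changes slowly, which gives the predictor a stable regression target and makes it harder for both encoders to collapse to a constant output.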
JEPA contributes three training signals: observation hints (ranked tool suggestions), JEPA-weighted reward shaping (a 0.5× to 1.5× bonus multiplier), and a latent consistency bonus.
Reward Design
All components are pure functions — deterministic, RLVR-verifiable, zero subjectivity:
| Component | What it measures | Range |
|---|---|---|
| R0 Allergy gate | Prescribing an allergen → total = 0.0, episode ends | {0, 1} |
| R1 Microbiologic activity | EUCAST MIC classification vs prescribed drug | {0, 1} |
| R2 Guideline concordance | IDSA first-line=1.0, alternative=0.5, other=0.0 | {0, 0.5, 1} |
| R3 Stewardship (gated on R1) | Narrowest active spectrum; zero if drug doesn't work | [0, 1] |
| R4 Dose correctness | Matches renal-tier adjusted dose | [0, 1] |
| R5 Tool efficiency | (unique tool types / budget spent) × (remaining / total) | [0, 1] |
| R6 Format | Clean single COMMIT line | [0, 1] |
Quality ratio (RLVR oracle):
```
process_score = 0.40·R1 + 0.25·R2 + 0.15·R3 + 0.10·R4
opt_score     = compute_optimal_prescription(patient)   # brute-force over antibiogram
quality_ratio = min(1.0, process_score / opt_score)     # 1.0 iff agent found optimal drug
total         = 0.90·quality_ratio + 0.10·R5
```
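The formulas above translate almost directly into code. The following is a sketch under stated assumptions, not the environment's implementation: `opt_score` is passed in as a number because `compute_optimal_prescription` is not shown here, "budget spent" is taken to mean the number of tool calls made, "remaining" is taken to be total minus spent, and R5 is set to 0 when no tools were called. All function names are illustrative.

```python
def process_score(r1: float, r2: float, r3: float, r4: float) -> float:
    """Weighted sum of activity, guideline, stewardship, and dose components."""
    return 0.40 * r1 + 0.25 * r2 + 0.15 * r3 + 0.10 * r4


def tool_efficiency(unique_tool_types: int, budget_spent: int, budget_total: int) -> float:
    """R5 = (unique tool types / budget spent) * (remaining / total)."""
    if budget_spent == 0:
        return 0.0  # assumption: no investigation earns no efficiency credit
    remaining = budget_total - budget_spent
    return (unique_tool_types / budget_spent) * (remaining / budget_total)


def total_reward(allergy_safe: bool, r1: float, r2: float, r3: float,
                 r4: float, r5: float, opt_score: float) -> float:
    """Combine the components into the scalar reward described above."""
    if not allergy_safe:
        return 0.0  # R0 gate: prescribing an allergen zeroes the total
    if opt_score <= 0:
        raise ValueError("opt_score must be positive")
    if r1 == 0:
        r3 = 0.0  # R3 is gated on R1: no stewardship credit for an inactive drug
    quality_ratio = min(1.0, process_score(r1, r2, r3, r4) / opt_score)
    return 0.90 * quality_ratio + 0.10 * r5


# Example: optimal drug found after 3 distinct tool calls out of a budget of 5.
r5 = tool_efficiency(unique_tool_types=3, budget_spent=3, budget_total=5)
best = total_reward(True, 1.0, 1.0, 1.0, 1.0, r5, opt_score=0.90)
print(r5, best)
```

Note that the four process weights sum to 0.90, so `process_score` alone never reaches 1.0; dividing by `opt_score` is what rescales a best-possible prescription to a quality ratio of 1.0. Under the R5 formula as written, an agent that spends its entire budget gets no efficiency credit.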
Multi-head GRPO: three independent reward functions (format R6, process R5, terminal quality_ratio) give the trainer separate gradient channels at three timescales — fast format feedback, per-step investigation signal, sparse terminal quality.
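One way to wire up separate reward heads is to follow the TRL convention, where `GRPOTrainer` accepts a list of reward functions and each function receives the batch of completions plus keyword arguments and returns one float per completion. The sketch below assumes that convention and plain-string completions. The exact COMMIT syntax, the binary format score, and the `tool_efficiency` and `quality_ratio` keyword names are all hypothetical stand-ins for values the environment would supply.

```python
def format_reward(completions, **kwargs):
    """Format head (R6-like): 1.0 if the completion has exactly one COMMIT line.

    Assumption: a COMMIT line is any line that starts with "COMMIT".
    """
    rewards = []
    for text in completions:
        commit_lines = [ln for ln in text.splitlines() if ln.strip().startswith("COMMIT")]
        rewards.append(1.0 if len(commit_lines) == 1 else 0.0)
    return rewards


def process_reward(completions, tool_efficiency=None, **kwargs):
    """Process head (R5-like): pass through precomputed per-episode scores."""
    if tool_efficiency is None:
        return [0.0] * len(completions)
    return [float(x) for x in tool_efficiency]


def terminal_reward(completions, quality_ratio=None, **kwargs):
    """Terminal head: pass through the precomputed quality ratio."""
    if quality_ratio is None:
        return [0.0] * len(completions)
    return [float(x) for x in quality_ratio]


# The three heads are kept separate rather than summed into one scalar.
REWARD_FUNCS = [format_reward, process_reward, terminal_reward]

batch = ["COMMIT drug-a", "COMMIT drug-a\nCOMMIT drug-b", "no decision"]
print(format_reward(batch))
```

Keeping the heads as separate functions lets the trainer log and weight each signal on its own, which matches the "separate gradient channels" description above.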
Validation
Published Clinical Cases — 3/3 match expert recommendations
| Case | Citation | Expert Prescription | Quality |
|---|---|---|---|
| CRE bacteremia, post-renal-transplant | Tamma et al. Clin Infect Dis. 2023 | Ceftazidime-avibactam 1.25g IV q8h | 1.000 |
| MSSA bacteremia | Maraolo et al. Open Forum Infect Dis. 2018 | Cefazolin 2g IV q8h | 1.000 |
| VRE on hemodialysis | Britt et al. Clin Infect Dis. 2015 | Daptomycin 8mg/kg post-HD | 0.939 |
Adversarial Stress Test — 10/10 pass
| Policy | Pass rate (quality_ratio ≥ 0.85) |
|---|---|
| Broad-empiric (always meropenem) | 0 / 10 |
| Random (seed=42) | 2 / 10 |
| EUCAST-only (no IDSA) | 7 / 10 |
| Trained model | 10 / 10 |
Usage
This is a PEFT LoRA adapter — load on top of Qwen3-4B:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model first, then attach the LoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base_model, "saaheerpurav/amr-steward-model")
tokenizer = AutoTokenizer.from_pretrained("saaheerpurav/amr-steward-model")
```
To use inside the AMR-Steward environment (recommended):
```bash
git clone https://github.com/saaheerpurav/amr-steward
cd amr-steward
pip install -r requirements.txt
uvicorn app:app --port 7860
# Then POST /reset + POST /step via the REST API
```
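Once the server is running, the two endpoints can be called from Python using only the standard library. This is a minimal sketch: the port comes from the `uvicorn` command above, but the request and response bodies are not documented here, so the empty `/reset` payload and the `action` structure sent to `/step` are assumptions to be checked against the environment's own API description.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # port taken from the uvicorn command


def build_post(path, payload, base_url=BASE_URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=BASE_URL):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_post(path, payload, base_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode():
    """Reset the environment, then take one step (requires a running server)."""
    observation = post_json("/reset", {})  # assumption: empty body is accepted
    # Hypothetical action schema; the real field names may differ.
    action = {"action": {"tool": "check_guideline", "argument": "bacteremia"}}
    return observation, post_json("/step", action)
```

Call `run_episode()` after the server has started; nothing is sent over the network when the module is merely imported.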
Try the live demo: divyanshb06-amrsteward.hf.space/demo
Scope and Limitations
- Not approved for clinical use. Research artefact only.
- Covers five organism groups that appear on the WHO bacterial priority pathogens list: K. pneumoniae, E. coli, P. aeruginosa, S. aureus, Enterococcus spp.
- Single-organism, single-drug episodes — no polymicrobial cases or combination therapy.
- Trained on synthetic patient cases, not real EHR data.
- Vancomycin dosing is renal-tier-based, not AUC/MIC-guided therapeutic drug monitoring.
Training Infrastructure
| Setting | Value |
|---|---|
| GPU | NVIDIA A10G (24 GB) via HuggingFace Spaces |
| Precision | bf16 |
| LoRA rank | r=16, α=32 |
| GRPO generations | 4 per step |
| Max completion length | 768 tokens |
| Stage 1 steps | 128 |
| Stage 2 steps | 64 |
| Stage 3 steps | 32 |
| Framework | TRL 0.17+ · Unsloth · HuggingFace Transformers |
Citation
```bibtex
@misc{amr-steward-2026,
  title  = {AMR-Steward: RLVR Training Environment for Clinical Antimicrobial Stewardship},
  author = {Saaheer Purav and Divyansh Bhatia and Palak},
  year   = {2026},
  url    = {https://github.com/saaheerpurav/amr-steward}
}
```
Built at Meta PyTorch OpenEnv Hackathon India, April 2026. Not approved for clinical use.