AMR-Steward — Antibiotic Prescribing Agent

Qwen3-4B + LoRA trained with multi-head GRPO to prescribe the correct antibiotic for drug-resistant bacterial infections. Reward is fully verifiable: seven pure-function components against EUCAST v16.0 breakpoints and IDSA 2022/2023 clinical guidelines. No LLM-as-judge anywhere.

Trained inside the AMR-Steward OpenEnv environment — built for the Meta PyTorch OpenEnv Hackathon, April 2026.


| | |
|---|---|
| Base model | Qwen/Qwen3-4B |
| Fine-tuning | LoRA (r=16, α=32, targets: q/k/v/o projections) |
| Algorithm | Multi-head GRPO (TRL + Unsloth, bf16) |
| Hardware | A10G GPU on Hugging Face Spaces |
| Live demo | divyanshb06-amrsteward.hf.space/demo |
| Environment | github.com/saaheerpurav/amr-steward |
| Writeup | BLOG.md |

Training Results

Training ran in three curriculum stages, from susceptible organisms up to MDR organisms combined with severe renal failure and allergy constraints:

| Stage | Organisms | Budget | Steps | Reward (start → final) | Peak reward | Mean reward |
|---|---|---|---|---|---|---|
| 1: Susceptible | K. pneumoniae, E. coli, S. aureus (susceptible) | 5 tools | 128 | 0.54 → 0.90 | 0.923 | 0.840 |
| 2: Resistant/MDR | + ESBL, MRSA, VRE | 4 tools | 64 | 0.86 → 0.84 | 0.840 | 0.790 |
| 3: MDR + Renal + Allergies | + CRE, XDR Pseudomonas, VISA | 3 tools | 32 | 0.81 → 0.88 | 0.988 | 0.707 |

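Random baseline: ~0.07. Measured by mean reward, the trained model scores about 12× the baseline on Stage 1 (0.840 / 0.07) and about 10× on Stage 3 (0.707 / 0.07).

Mean reward stays above 0.70 even at Stage 3, which combines MDR organisms, CrCl 8 mL/min, penicillin allergy, and a 3-tool budget.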

What This Model Does

The agent receives a clinical patient case and must investigate, then prescribe:

Patient: 67F, ICU, K. pneumoniae bacteremia, meropenem MIC=8.0, CrCl=35, no allergies

Agent investigates:
  → interpret_resistance("meropenem")       → "MIC 8.0 → EUCAST: Resistant"
  → check_guideline("bacteremia")           → "IDSA: CRE K. pneumoniae → ceftazidime-avibactam"
  → assess_patient_factors()               → "CrCl 35: reduce to 1.25g IV q8h"

Agent prescribes:
  → ceftazidime-avibactam 1.25g IV q8h, 14 days
  → reward: 0.92

Without training, a broad-empiric policy prescribes meropenem and earns a reward of ~0.11, because the isolate is meropenem-resistant and the drug is inactive against it.

JEPA World Model

The training environment includes a JEPA (Joint Embedding Predictive Architecture) world model — the first application of Meta AI's I-JEPA pattern (Assran et al., CVPR 2023) inside a clinical RL environment.

The world model (≈50K params) predicts in latent space how each tool call would change the agent's known clinical state. It uses an EMA-stabilised target encoder (τ=0.99) — the critical anti-collapse mechanism from the original I-JEPA:

context_encoder(s_before) + tool → predictor → pred_repr
target_encoder(s_after)                      → tgt_repr   [EMA, stop-gradient]
Loss = MSE(pred_repr, tgt_repr)
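The EMA update and the latent-space loss can be sketched in a few dependency-free lines. This is only an illustration under stated assumptions, not the repository's code: the flat parameter lists and the helper names `ema_update` and `mse` are hypothetical; only τ=0.99 and the MSE loss come from the description above.

```python
from typing import List

TAU = 0.99  # EMA momentum quoted in the text above


def ema_update(target: List[float], online: List[float], tau: float = TAU) -> List[float]:
    """Return tau * target + (1 - tau) * online, element-wise.

    The target encoder receives no gradients (stop-gradient); it only
    trails the context encoder through this moving average.
    """
    if len(target) != len(online):
        raise ValueError("parameter lists must have the same length")
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def mse(pred: List[float], tgt: List[float]) -> float:
    """Mean squared error between predicted and target representations."""
    if len(pred) != len(tgt):
        raise ValueError("representations must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, tgt)) / len(pred)


# Hypothetical toy values standing in for encoder weights.
context_params = [1.0, 2.0]
target_params = [0.0, 0.0]
target_params = ema_update(target_params, context_params)  # moves 1% toward context
```

With τ close to 1 the target representation changes slowly, so the predictor cannot trivially drive both sides to the same constant output.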

JEPA contributes three training signals: observation hints (ranked tool suggestions), JEPA-weighted reward shaping (a 0.5× to 1.5× bonus multiplier), and a latent consistency bonus.

Reward Design

All components are pure functions — deterministic, RLVR-verifiable, zero subjectivity:

| Component | What it measures | Range |
|---|---|---|
| R0 Allergy gate | Prescribing an allergen → total = 0.0, episode ends | {0, 1} |
| R1 Microbiologic activity | EUCAST MIC classification vs prescribed drug | {0, 1} |
| R2 Guideline concordance | IDSA first-line=1.0, alternative=0.5, other=0.0 | {0, 0.5, 1} |
| R3 Stewardship (gated on R1) | Narrowest active spectrum; zero if drug doesn't work | [0, 1] |
| R4 Dose correctness | Matches renal-tier adjusted dose | [0, 1] |
| R5 Tool efficiency | (unique tool types / budget spent) × (remaining / total) | [0, 1] |
| R6 Format | Clean single COMMIT line | [0, 1] |
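As a rough illustration of the R5 formula in the table, the sketch below reads "budget spent" as the number of tool calls made and "remaining" as the unused part of the total budget. Both readings, the function name `tool_efficiency`, and the behaviour for an episode with no tool calls are assumptions, not confirmed by the repository.

```python
from typing import Sequence


def tool_efficiency(tool_calls: Sequence[str], total_budget: int) -> float:
    """R5 = (unique tool types / budget spent) * (remaining / total).

    Assumptions: each tool call costs one unit of budget, and an episode
    with no tool calls scores 0.0.
    """
    if total_budget <= 0:
        raise ValueError("total_budget must be positive")
    spent = len(tool_calls)
    if spent > total_budget:
        raise ValueError("more tool calls than the budget allows")
    if spent == 0:
        return 0.0
    diversity = len(set(tool_calls)) / spent          # penalises repeated tools
    remaining_fraction = (total_budget - spent) / total_budget
    return diversity * remaining_fraction


# Three distinct tools out of a hypothetical budget of five.
calls = ["interpret_resistance", "check_guideline", "assess_patient_factors"]
r5 = tool_efficiency(calls, total_budget=5)  # (3/3) * (2/5)
```

Under this reading, an agent that spends its entire budget scores zero on R5, so the term rewards stopping early as well as avoiding repeated calls.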

Quality ratio (RLVR oracle):

process_score = 0.40·R1 + 0.25·R2 + 0.15·R3 + 0.10·R4
opt_score     = compute_optimal_prescription(patient)  # brute-force over antibiogram
quality_ratio = min(1.0, process_score / opt_score)   # 1.0 iff agent found optimal drug
total         = 0.90·quality_ratio + 0.10·R5
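The aggregation above can be written as a small pure function. This is a sketch under assumptions rather than the environment's code: `opt_score` is passed in directly because `compute_optimal_prescription` is not shown in this card, the `allergy_safe` flag stands in for the R0 gate, and the example inputs are hypothetical.

```python
def process_score(r1: float, r2: float, r3: float, r4: float) -> float:
    """Weighted clinical score; R3 only counts when the drug is active (R1 > 0)."""
    gated_r3 = r3 if r1 > 0 else 0.0
    return 0.40 * r1 + 0.25 * r2 + 0.15 * gated_r3 + 0.10 * r4


def total_reward(allergy_safe: bool, r1: float, r2: float, r3: float,
                 r4: float, r5: float, opt_score: float) -> float:
    """Combine the components into the scalar episode reward."""
    if not allergy_safe:  # R0 gate: prescribing an allergen zeroes the total
        return 0.0
    if opt_score <= 0:
        raise ValueError("opt_score must be positive")
    quality_ratio = min(1.0, process_score(r1, r2, r3, r4) / opt_score)
    return 0.90 * quality_ratio + 0.10 * r5


# Hypothetical inputs: active first-line drug, correct dose, oracle score 0.90.
best = total_reward(True, r1=1.0, r2=1.0, r3=1.0, r4=1.0, r5=0.4, opt_score=0.90)
# Same case with an inactive drug: R1 = 0 also gates R3 to zero.
inactive = total_reward(True, r1=0.0, r2=0.0, r3=0.8, r4=1.0, r5=0.4, opt_score=0.90)
```

Dividing by the oracle score normalises each case by the best prescription available for that patient, so difficult cases are not penalised for having a lower achievable maximum.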

Multi-head GRPO: three independent reward functions (format R6, process R5, terminal quality_ratio) give the trainer separate gradient channels at three timescales — fast format feedback, per-step investigation signal, sparse terminal quality.
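One way to picture a single reward head is as a function that takes a batch of completions and returns one score per completion, which is the shape TRL's `GRPOTrainer` accepts in its `reward_funcs` list. The sketch below covers only the format head, and it is an illustration: the `COMMIT` line pattern, the binary scoring, and the assumption that completions arrive as plain strings are all hypothetical, since the card does not specify them.

```python
import re
from typing import List

# Hypothetical pattern: a line that begins with the word COMMIT.
COMMIT_LINE = re.compile(r"^COMMIT\b")


def format_reward(completions: List[str], **kwargs) -> List[float]:
    """Score 1.0 when a completion contains exactly one COMMIT line, else 0.0.

    Extra keyword arguments (such as prompts) are accepted and ignored so the
    function matches the calling convention for TRL-style reward functions.
    """
    scores = []
    for text in completions:
        commit_lines = [
            line for line in text.splitlines() if COMMIT_LINE.match(line.strip())
        ]
        scores.append(1.0 if len(commit_lines) == 1 else 0.0)
    return scores


batch = [
    "reasoning...\nCOMMIT ceftazidime-avibactam 1.25g IV q8h 14d",
    "COMMIT drug A\nCOMMIT drug B",  # two decisions: not a clean single line
    "no decision made",
]
scores = format_reward(completions=batch)
```

The process and terminal heads would follow the same signature, each returning its own list of floats, and the three functions would be passed together as the trainer's reward functions.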

Validation

Published Clinical Cases — 3/3 match expert recommendations

| Case | Citation | Expert Prescription | Quality |
|---|---|---|---|
| CRE bacteremia, post-renal-transplant | Tamma et al. Clin Infect Dis. 2023 | Ceftazidime-avibactam 1.25g IV q8h | 1.000 |
| MSSA bacteremia | Maraolo et al. Open Forum Infect Dis. 2018 | Cefazolin 2g IV q8h | 1.000 |
| VRE on hemodialysis | Britt et al. Clin Infect Dis. 2015 | Daptomycin 8mg/kg post-HD | 0.939 |

Adversarial Stress Test — 10/10 pass

| Policy | Pass rate (quality_ratio ≥ 0.85) |
|---|---|
| Broad-empiric (always meropenem) | 0 / 10 |
| Random (seed=42) | 2 / 10 |
| EUCAST-only (no IDSA) | 7 / 10 |
| Trained model | 10 / 10 |

Usage

This is a PEFT LoRA adapter — load on top of Qwen3-4B:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the Qwen3-4B base weights, then attach the LoRA adapter on top.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base_model, "saaheerpurav/amr-steward-model")
# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained("saaheerpurav/amr-steward-model")

To use inside the AMR-Steward environment (recommended):

git clone https://github.com/saaheerpurav/amr-steward
cd amr-steward
pip install -r requirements.txt
uvicorn app:app --port 7860
# Then drive episodes with POST /reset and POST /step via the REST API
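
A client for the two endpoints might look roughly like the sketch below. It only builds the request and does not send it. The helper name `build_post` is hypothetical, and the empty JSON body for `/reset` is an assumption; the request and response schemas for `/reset` and `/step` are defined by the repository and are not documented in this card.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # port taken from the uvicorn command above


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the environment API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: /reset accepts an empty JSON object.
reset_request = build_post("/reset", {})

# To send it once the server is running:
#   with urllib.request.urlopen(reset_request) as resp:
#       observation = json.loads(resp.read())
```

A `/step` call would reuse `build_post("/step", payload)` with whatever action payload the environment expects.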

Try the live demo: divyanshb06-amrsteward.hf.space/demo

Scope and Limitations

Not approved for clinical use. Research artefact only.
Covers five organisms from the WHO bacterial priority pathogens list: K. pneumoniae, E. coli, P. aeruginosa, S. aureus, Enterococcus spp.
Single-organism, single-drug episodes — no polymicrobial cases or combination therapy.
Trained on synthetic patient cases, not real EHR data.
Vancomycin dosing is renal-tier-based, not AUC/MIC-guided therapeutic drug monitoring.

Training Infrastructure


| | |
|---|---|
| GPU | NVIDIA A10G (24 GB) via Hugging Face Spaces |
| Precision | bf16 |
| LoRA rank | r=16, α=32 |
| GRPO generations | 4 per step |
| Max completion length | 768 tokens |
| Stage 1 steps | 128 |
| Stage 2 steps | 64 |
| Stage 3 steps | 32 |
| Framework | TRL 0.17+ · Unsloth · Hugging Face Transformers |

Citation

@misc{amr-steward-2026,
  title  = {AMR-Steward: RLVR Training Environment for Clinical Antimicrobial Stewardship},
  author = {Saaheer Purav and Divyansh Bhatia and Palak},
  year   = {2026},
  url    = {https://github.com/saaheerpurav/amr-steward}
}

Built at Meta PyTorch OpenEnv Hackathon India, April 2026. Not approved for clinical use.

Reference

Assran et al. Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. arXiv:2301.08243, published Jan 19, 2023.