Instructions to use joynnayvedya/disaster-response-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use joynnayvedya/disaster-response-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="joynnayvedya/disaster-response-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("joynnayvedya/disaster-response-v2")
model = AutoModelForCausalLM.from_pretrained("joynnayvedya/disaster-response-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use joynnayvedya/disaster-response-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "joynnayvedya/disaster-response-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "joynnayvedya/disaster-response-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/joynnayvedya/disaster-response-v2

SGLang

How to use joynnayvedya/disaster-response-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "joynnayvedya/disaster-response-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "joynnayvedya/disaster-response-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "joynnayvedya/disaster-response-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "joynnayvedya/disaster-response-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use joynnayvedya/disaster-response-v2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for joynnayvedya/disaster-response-v2 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for joynnayvedya/disaster-response-v2 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for joynnayvedya/disaster-response-v2 to start chatting

Load model with FastModel

# Install first (shell command, not Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="joynnayvedya/disaster-response-v2",
    max_seq_length=2048,
)

Docker Model Runner
How to use joynnayvedya/disaster-response-v2 with Docker Model Runner:
```
docker model run hf.co/joynnayvedya/disaster-response-v2
```

Teaching an LLM to Triage Disasters 🚨

How we built a real RL environment for emergency response — and what we learned when the model hallucinated an entire rescue team

Built for the 2026 Meta & Scalar AI Hackathon, Bangalore.

🎬 Demo Video

▶️ Watch the live demo on YouTube (2 minutes, fast-forwarded) to see the agent triage 15 simultaneous disaster incidents in real time on the live command center dashboard.

It started with a question nobody was asking

What if an LLM had to make the same decisions as the person who picks up the phone during a catastrophe?

Not "write me a poem." Not "solve this math problem."

"The dam is overflowing. 300 people are on rooftops. You have one helicopter. What do you do?"

That's the problem we built for.

🏗️ Architecture

The agent runs locally and sends actions to the OpenEnv server deployed on a HF Space; the live dashboard updates in real time via WebSocket.

The agent is fully decoupled from the environment. It sees only what a real EOC coordinator would see: a ticket queue, a resource budget, and a ticking clock.

We built Disaster Response Coordination OpenEnv — an RL environment where an AI agent acts as an Emergency Incident Commander inside a live Emergency Operations Center.

The agent receives a queue of incident tickets. Real ones. Modeled after:

🌊 2018 Kerala Floods — 483 dead, the largest evacuation since Indian Independence. Dam spillway overflow. Communication blackouts. We recreated the exact decision tree EOC coordinators faced.
☠️ 2020 Vizag LG Polymers Gas Leak — 11 dead, 1000+ hospitalized. A toxic plume drifting over residential areas. Do you evacuate north or south? Wind direction matters.
⚡ 2012 North India Grid Failure — 620 million people without power. Cold-chain medicines failing in hospitals across 7 states. Which hospital gets the generator truck first?

Every ticket the agent sees is based on a real event. Every decision has real stakes baked into the reward function.

For each incident ticket, the agent must execute a precise 4-step workflow:

classify → set_priority → draft_reply → submit_ticket

Miss a step? Penalty. Wrong team? Partial credit. Right team, wrong priority? You still lose something. There is no lucky guess that beats the system.
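As a minimal sketch of the ordering rule above, the check below walks one ticket's action sequence and counts every action that is not the next expected step. Only the four step names come from the workflow; the function name `check_workflow` and the way invalid actions are counted are assumptions for illustration, not the environment's actual grader.

```python
from typing import List, Tuple

# Step order taken from the workflow above; everything else is illustrative.
WORKFLOW = ("classify", "set_priority", "draft_reply", "submit_ticket")


def check_workflow(actions: List[str]) -> Tuple[bool, int]:
    """Return (completed, invalid_count) for one ticket's action sequence.

    An action counts as invalid when it is not the next expected step.
    (Hypothetical rule: the real environment may count penalties differently.)
    """
    expected = 0
    invalid = 0
    for name in actions:
        if expected < len(WORKFLOW) and name == WORKFLOW[expected]:
            expected += 1
        else:
            invalid += 1
    return expected == len(WORKFLOW), invalid


# A complete, in-order ticket versus one that skips set_priority.
print(check_workflow(["classify", "set_priority", "draft_reply", "submit_ticket"]))
print(check_workflow(["classify", "draft_reply", "submit_ticket"]))
```

In this sketch, skipping a step also invalidates every later action on that ticket, which is one way to make a missed step costly.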

The Reward Function: Built to Be Unhackable

Many RL environments get reward-hacked early in training. We designed around that from day one.

ticket_score = 0.40 × team_routing
             + 0.30 × priority_score  
             + 0.30 × reply_quality

task_score   = avg(ticket_scores)
             - invalid_action_penalty   (max 0.15)
             - loop_detection_penalty   (max 0.10)
             - reroute_penalty          (max 0.12)
             - budget_overflow_penalty  (max 0.18)
             - time_pressure_multiplier (Hard mode: 0.75×)

5 independent signals. Dense partial rewards at every step. No sparse end-of-episode surprise. If you get the team right but fumble the priority, you learn something. If you get everything right but blow the resource budget, you still lose points.
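The scoring formula can be sketched in a few lines of Python. The weights and penalty caps are copied from the formula above. Two details are assumptions because the write-up does not state them: the Hard-mode 0.75 factor is applied multiplicatively to the penalised score, and the result is clamped to [0, 1]. The function and key names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Weights and penalty caps copied from the formula above.
WEIGHTS = {"team_routing": 0.40, "priority_score": 0.30, "reply_quality": 0.30}
PENALTY_CAPS = {
    "invalid_action": 0.15,
    "loop_detection": 0.10,
    "reroute": 0.12,
    "budget_overflow": 0.18,
}


def ticket_score(components: Dict[str, float]) -> float:
    """Weighted sum of the three per-ticket components, each in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def task_score(tickets: List[Dict[str, float]],
               penalties: Dict[str, float],
               hard_mode: bool = False) -> float:
    """Average ticket score minus capped penalties.

    ASSUMPTION: the Hard-mode 0.75 factor multiplies the penalised score and
    the result is clamped to [0, 1]; the source does not specify either.
    """
    base = mean(ticket_score(t) for t in tickets)
    total_penalty = sum(
        min(max(penalties.get(name, 0.0), 0.0), cap)  # clip each penalty to its cap
        for name, cap in PENALTY_CAPS.items()
    )
    score = base - total_penalty
    if hard_mode:
        score *= 0.75
    return min(max(score, 0.0), 1.0)


# Right team, half credit on priority, good reply, but the budget was blown.
tickets = [{"team_routing": 1.0, "priority_score": 0.5, "reply_quality": 1.0}]
print(round(task_score(tickets, {"budget_overflow": 0.5}), 4))
print(round(task_score(tickets, {"budget_overflow": 0.5}, hard_mode=True), 4))
```

Capping each penalty separately keeps any single failure mode from dominating the score, so the per-ticket components still carry signal.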

"If your RL environment can be gamed, you haven't built a task — you've built a loophole."

📊 Training Results

[Figure: Reward Curve. GRPO training reward across 3 stages, 135 steps.]

[Figure: Epoch Comparison. Average reward per training epoch.]

[Figure: Before vs After Training. Behavioral comparison of model outputs.]

[Figure: Training Hyperparameters. Full config used for the v2 run.]

We fine-tuned Qwen2.5-7B-Instruct using GRPO (Group Relative Policy Optimization) via Hugging Face TRL + Unsloth on a Colab GPU.

The first thing we discovered? The base model immediately hallucinated an entirely new rescue team.

❌  team: "emergency_services"   (not in the valid set)
❌  team: "utility repair"       (the agent made this up)
❌  priority: "very-high"        (also made up)
❌  priority: "immediately"      (still wrong)

The model had read enough emergency management documents to know the vibe of disaster response — but it had no idea what valid actions actually existed in our environment.

That's exactly the kind of failure RL is designed to fix.
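To make the failure concrete, here is a small sketch of an action-space check of the kind that would flag those outputs. The field names `team` and `priority` follow the examples above. The valid sets are hypothetical: only "rescue", "urgent", and "high" appear in this write-up, and the other entries are placeholders.

```python
import json
from typing import List

# HYPOTHETICAL action space: only "rescue", "urgent", and "high" appear in
# this write-up; the remaining entries are placeholders for illustration.
VALID_TEAMS = {"rescue", "medical", "utilities"}
VALID_PRIORITIES = {"urgent", "high", "medium", "low"}


def validate_action(raw: str) -> List[str]:
    """Return a list of problems found in one JSON-encoded model output."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(action, dict):
        return ["output is not a JSON object"]

    problems = []
    team = action.get("team")
    priority = action.get("priority")
    if not isinstance(team, str) or team not in VALID_TEAMS:
        problems.append(f"unknown team: {team!r}")
    if not isinstance(priority, str) or priority not in VALID_PRIORITIES:
        problems.append(f"unknown priority: {priority!r}")
    return problems


# A hallucinated output versus a well-formed one.
print(validate_action('{"team": "emergency_services", "priority": "very-high"}'))
print(validate_action('{"team": "rescue", "priority": "urgent"}'))
```

A check like this can reject malformed actions before they reach the environment, or serve as a format signal during training.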

After 3 training stages and 135 steps:

✅  team: "rescue"
✅  priority: "urgent"  
✅  JSON output: perfectly structured

The model learned to stop inventing teams and priority labels and start operating within the defined action space. The difficulty of getting there reflects sparse reward collapse, a documented RL failure mode where small models struggle to optimize multi-step, interdependent workflows. Our environment was hard enough to expose it. That's a feature, not a bug.

The Benchmark Results

We ran the trained model across all 3 difficulty tiers against the live deployed environment:

| Agent | Easy | Medium | Hard | Avg |
|---|---|---|---|---|
| Heuristic Baseline (hardcoded rules) | 0.704 | 0.683 | 0.660 | 0.682 |
| GRPO Qwen2.5-7B v2 (ours) | 0.641 | 0.665 | 0.601 | 0.636 |

All 3 tiers: ✅ PASS ✅ PASS ✅ PASS

The heuristic baseline uses hand-crafted regex patterns and keyword matching. Zero generalisation. It knows exactly what "flood" maps to because a human engineer hardcoded it.

Our model generates unique, contextually accurate handoff notes for every incident, with no hardcoded rules and no templates. It reads the situation and decides. The fact that it stays within 0.046 of the hardcoded baseline's average score (0.636 vs. 0.682, a relative gap of about 7%) while doing actual reasoning is the result that matters.
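The gap can be recomputed directly from the per-tier scores in the benchmark table. This is plain arithmetic on the reported numbers; the variable names are illustrative. Working from the unrounded per-tier values gives an absolute gap of about 0.047 rather than the 0.046 obtained from the rounded averages.

```python
# Per-tier scores copied from the benchmark table.
baseline = {"easy": 0.704, "medium": 0.683, "hard": 0.660}
trained = {"easy": 0.641, "medium": 0.665, "hard": 0.601}


def average(scores):
    """Unweighted mean across difficulty tiers."""
    return sum(scores.values()) / len(scores)


absolute_gap = average(baseline) - average(trained)
relative_gap = absolute_gap / average(baseline)

print(f"baseline avg: {average(baseline):.3f}")
print(f"trained avg:  {average(trained):.3f}")
print(f"absolute gap: {absolute_gap:.3f}")
print(f"relative gap: {relative_gap:.1%}")
```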

The Dashboard: Because Judges Are Human Too

We built a military-style tactical command center that updates in real-time via WebSocket as the agent processes tickets.

🗺️ OpenStreetMap with color-coded incident markers (red = urgent, orange = high, ✓ = resolved)
⚡ ARIA — an AI Incident Analyst powered by Gemini, available for live analysis of any incident
📊 Real-time score tracker, resource budget bar, team routing feed
🔔 Operations feed with audio alerts

It is not a static demo. When you run inference.py, the dashboard updates live, so you can watch the agent work in real time.

▶️ Open the Command Center

Try It Yourself

git clone https://github.com/letsjoyn/meta-scalar-hack.git
cd meta-scalar-hack
pip install -e .

# Run the agent against the live environment
# (PowerShell syntax; on bash/zsh use `export NAME=value` and `python inference.py`)
$env:OPENENV_BASE_URL = "https://joynnayvedya-disaster-response-openenv.hf.space"
$env:API_BASE_URL     = "https://router.huggingface.co/v1"
$env:MODEL_NAME       = "Qwen/Qwen2.5-72B-Instruct"
$env:HF_TOKEN         = "hf_YOUR_TOKEN"
py inference.py

Links

| Resource | URL |
|---|---|
| 🤗 HF Space (Live Environment) | joynnayvedya/disaster-response-openenv |
| 🧠 Trained Model | joynnayvedya/disaster-response-v2 |
| 📓 Training Notebook (Colab) | Open in Colab |
| 💻 GitHub | letsjoyn/meta-scalar-hack |