Teaching an LLM to Triage Disasters 🚨
How we built a real RL environment for emergency response — and what we learned when the model hallucinated an entire rescue team
Built for the 2026 Meta & Scalar AI Hackathon, Bangalore.
🎬 Demo Video
▶️ Watch the live demo on YouTube (2 minutes, fast-forwarded): the agent triages 15 simultaneous disaster incidents in real time on the live command center dashboard.
It started with a question nobody was asking
What if an LLM had to make the same decisions as the person who picks up the phone during a catastrophe?
Not "write me a poem." Not "solve this math problem."
"The dam is overflowing. 300 people are on rooftops. You have one helicopter. What do you do?"
That's the problem we built for.
🏗️ Architecture
The agent runs locally and sends actions to the OpenEnv server deployed on an HF Space. The live dashboard updates in real time over WebSocket.
The agent is fully decoupled from the environment. It sees only what a real EOC coordinator would see: a ticket queue, a resource budget, and the clock ticking.
We built Disaster Response Coordination OpenEnv — an RL environment where an AI agent acts as an Emergency Incident Commander inside a live Emergency Operations Center.
The agent receives a queue of incident tickets. Real ones. Modeled after:
- 🌊 2018 Kerala Floods — 483 dead, the largest evacuation since Indian Independence. Dam spillway overflow. Communication blackouts. We recreated the exact decision tree EOC coordinators faced.
- ☠️ 2020 Vizag LG Polymers Gas Leak — 11 dead, 1000+ hospitalized. A toxic plume drifting over residential areas. Do you evacuate north or south? Wind direction matters.
- ⚡ 2012 North India Grid Failure — 620 million people without power. Cold-chain medicines failing in hospitals across 7 states. Which hospital gets the generator truck first?
Every ticket the agent sees is based on a real event. Every decision has real stakes baked into the reward function.
For each incident ticket, the agent must execute a precise 4-step workflow:
classify → set_priority → draft_reply → submit_ticket
Miss a step? Penalty. Wrong team? Partial credit. Right team, wrong priority? You still lose something. There is no lucky guess that beats the system.
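To make the ordering rule concrete, here is a minimal sketch of a per-ticket workflow check. It assumes actions arrive as plain string names; `validate_workflow` and its return convention are hypothetical and are not taken from the environment's code.

```python
# Hypothetical helper: checks that the actions taken on one ticket follow
# classify -> set_priority -> draft_reply -> submit_ticket.
REQUIRED_STEPS = ["classify", "set_priority", "draft_reply", "submit_ticket"]


def validate_workflow(actions):
    """Return (ok, problems) for the action names taken on a single ticket."""
    problems = []
    expected = 0  # index of the next step we expect to see
    for name in actions:
        if name not in REQUIRED_STEPS:
            problems.append(f"unknown action: {name}")
        elif expected < len(REQUIRED_STEPS) and name == REQUIRED_STEPS[expected]:
            expected += 1
        else:
            problems.append(f"out-of-order or repeated action: {name}")
    # Any step that was never reached in order counts as missing.
    for missing in REQUIRED_STEPS[expected:]:
        problems.append(f"missing step: {missing}")
    return (not problems, problems)


if __name__ == "__main__":
    print(validate_workflow(["classify", "draft_reply", "submit_ticket"]))
```

A check like this only says whether the sequence is well-formed. How each problem maps to a penalty is up to the reward function.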
The Reward Function: Built to Be Unhackable
Most RL environments get reward-hacked in under 100 steps. We designed around that from day one.
ticket_score = 0.40 × team_routing
+ 0.30 × priority_score
+ 0.30 × reply_quality
task_score = avg(ticket_scores)
- invalid_action_penalty (max 0.15)
- loop_detection_penalty (max 0.10)
- reroute_penalty (max 0.12)
- budget_overflow_penalty (max 0.18)
- time_pressure_multiplier (Hard mode: 0.75×)
5 independent signals. Dense partial rewards at every step. No sparse end-of-episode surprise. If you get the team right but fumble the priority, you learn something. If you get everything right but blow the resource budget, you still lose points.
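The formula above can be sketched in a few lines of Python. The weights and penalty caps come straight from the formula. The function names, the `penalties` dict, and the final clamp to [0, 1] are our assumptions for illustration. We also read the time-pressure term as a multiplier applied in Hard mode, which is one plausible interpretation of the last line.

```python
from statistics import mean

# Weights and penalty caps are copied from the formula above.
WEIGHTS = {"team_routing": 0.40, "priority_score": 0.30, "reply_quality": 0.30}
PENALTY_CAPS = {
    "invalid_action": 0.15,
    "loop_detection": 0.10,
    "reroute": 0.12,
    "budget_overflow": 0.18,
}
HARD_MODE_MULTIPLIER = 0.75


def ticket_score(components):
    """Weighted sum of the three per-ticket components, each in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def task_score(ticket_scores, penalties, hard_mode=False):
    """Average ticket score minus capped penalties.

    ASSUMPTION: the time-pressure term scales the result by 0.75 in Hard
    mode, and the final score is clamped to [0, 1].
    """
    score = mean(ticket_scores)
    for name, cap in PENALTY_CAPS.items():
        # Each penalty is non-negative and cannot exceed its cap.
        score -= min(max(penalties.get(name, 0.0), 0.0), cap)
    if hard_mode:
        score *= HARD_MODE_MULTIPLIER
    return min(max(score, 0.0), 1.0)


if __name__ == "__main__":
    # Right team, wrong priority, good reply: partial credit, not zero.
    one = ticket_score({"team_routing": 1.0, "priority_score": 0.0, "reply_quality": 1.0})
    print(one, task_score([one, 0.9], {"budget_overflow": 0.3}, hard_mode=True))
```

Because every penalty is capped, no single mistake can wipe out an episode, which keeps the signal dense even when the agent is doing badly.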
"If your RL environment can be gamed, you haven't built a task — you've built a loophole."
📊 Training Results
[Figure: Reward curve. GRPO training reward across 3 stages, 135 steps.]
[Figure: Epoch comparison. Average reward per training epoch.]
[Figure: Before vs. after training. Behavioral comparison of model outputs.]
[Figure: Training hyperparameters. Full config used for the v2 run.]
We fine-tuned Qwen2.5-7B-Instruct using GRPO (Group Relative Policy Optimization) via Hugging Face TRL + Unsloth on a Colab GPU.
The first thing we discovered? The base model immediately hallucinated an entirely new rescue team.
❌ team: "emergency_services" (not in the valid set)
❌ team: "utility repair" (the agent made this up)
❌ priority: "very-high" (also made up)
❌ priority: "immediately" (still wrong)
The model had read enough emergency management documents to know the vibe of disaster response — but it had no idea what valid actions actually existed in our environment.
That's exactly the kind of failure RL is designed to fix.
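Failures like these are easy to detect mechanically. Below is a minimal sketch of an output check, assuming the model emits a JSON object with `team` and `priority` fields. Only "rescue", "urgent" and "high" are named in this write-up; every other vocabulary entry, and the `check_action` helper itself, is a placeholder for illustration.

```python
import json

# "rescue", "urgent" and "high" appear in this write-up. Every other entry
# is a placeholder, because the full action space is not listed here.
VALID_TEAMS = {"rescue", "medical", "utilities"}
VALID_PRIORITIES = {"urgent", "high", "medium", "low"}


def check_action(raw_output):
    """Parse a model's JSON output and list any out-of-vocabulary fields."""
    try:
        action = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(action, dict):
        return ["output is not a JSON object"]
    errors = []
    team = action.get("team")
    priority = action.get("priority")
    if not (isinstance(team, str) and team in VALID_TEAMS):
        errors.append(f"invalid team: {team!r}")
    if not (isinstance(priority, str) and priority in VALID_PRIORITIES):
        errors.append(f"invalid priority: {priority!r}")
    return errors


if __name__ == "__main__":
    print(check_action('{"team": "emergency_services", "priority": "very-high"}'))
    print(check_action('{"team": "rescue", "priority": "urgent"}'))
```

An empty list means the action is inside the vocabulary. Anything else is the kind of output an invalid-action penalty would target.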
After 3 training stages and 135 steps:
✅ team: "rescue"
✅ priority: "urgent"
✅ JSON output: perfectly structured
The model learned to stop inventing team names and priority labels and to operate within the defined action space. The behavior we saw before training is sparse reward collapse, a documented RL failure mode in which small models struggle to optimize multi-step, interdependent workflows. Our environment was hard enough to expose it. That's a feature, not a bug.
The Benchmark Results
We ran the trained model across all 3 difficulty tiers against the live deployed environment:
| Agent | Easy | Medium | Hard | Avg |
|---|---|---|---|---|
| Heuristic Baseline (hardcoded rules) | 0.704 | 0.683 | 0.660 | 0.682 |
| GRPO Qwen2.5-7B v2 (ours) | 0.641 | 0.665 | 0.601 | 0.636 |
All 3 tiers: ✅ PASS ✅ PASS ✅ PASS
The heuristic baseline uses hand-crafted regex patterns and keyword matching. Zero generalisation. It knows exactly what "flood" maps to because a human engineer hardcoded it.
Our model generates unique, contextually accurate handoff notes for every incident, with no hardcoded rules and no templates. It reads the situation and decides. Its average score is about 0.046 below the hardcoded baseline (4.6 points on a 100-point scale, roughly 7% in relative terms), and it gets there while doing actual reasoning. That is the result that matters.
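The averages and the gap between the two agents can be recomputed from the table. The sketch below copies the scores as printed; the dictionary keys are our own labels. Note that the gap is an absolute difference in score, and the relative gap is somewhat larger.

```python
from statistics import mean

# Scores copied from the benchmark table above.
scores = {
    "heuristic": {"easy": 0.704, "medium": 0.683, "hard": 0.660},
    "grpo_v2": {"easy": 0.641, "medium": 0.665, "hard": 0.601},
}

# Per-agent average across the three difficulty tiers.
averages = {agent: mean(list(tiers.values())) for agent, tiers in scores.items()}

# Absolute gap is in score units (0 to 1); relative gap is a fraction of the baseline.
absolute_gap = averages["heuristic"] - averages["grpo_v2"]
relative_gap = absolute_gap / averages["heuristic"]

print(f"heuristic avg: {averages['heuristic']:.3f}")
print(f"grpo_v2 avg:   {averages['grpo_v2']:.3f}")
print(f"absolute gap:  {absolute_gap:.3f}")
print(f"relative gap:  {relative_gap:.1%}")
```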
The Dashboard: Because Judges Are Human Too
We built a military-style tactical command center that updates in real time via WebSocket as the agent processes tickets.
- 🗺️ OpenStreetMap with color-coded incident markers (red = urgent, orange = high, ✓ = resolved)
- ⚡ ARIA — an AI Incident Analyst powered by Gemini, available for live analysis of any incident
- 📊 Real-time score tracker, resource budget bar, team routing feed
- 🔔 Operations feed with audio alerts
It is not a static demo. When you run inference.py, the dashboard updates live, so you can watch the agent work in real time.
Try It Yourself
git clone https://github.com/letsjoyn/meta-scalar-hack.git
cd meta-scalar-hack
pip install -e .
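# Run the agent against the live environment.
# The commands below use Windows PowerShell syntax. On macOS or Linux,
# use export NAME=value and python inference.py instead.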
$env:OPENENV_BASE_URL = "https://joynnayvedya-disaster-response-openenv.hf.space"
$env:API_BASE_URL = "https://router.huggingface.co/v1"
$env:MODEL_NAME = "Qwen/Qwen2.5-72B-Instruct"
$env:HF_TOKEN = "hf_YOUR_TOKEN"
py inference.py
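If you want to fail fast on a missing variable before starting a run, a small check like the one below may help. The four variable names come from the commands above. The `load_config` helper and its error message are assumptions for illustration, not code from the repository.

```python
import os

# Variable names come from the commands above. The helper itself and its
# fail-fast behaviour are assumptions, not the repository's actual code.
REQUIRED_VARS = ("OPENENV_BASE_URL", "API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_config(environ=None):
    """Return the four settings as a dict, or raise if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    try:
        config = load_config()
        print("model:", config["MODEL_NAME"])
    except RuntimeError as err:
        print(err)
```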
Links
| Resource | URL |
|---|---|
| 🤗 HF Space (Live Environment) | joynnayvedya/disaster-response-openenv |
| 🧠 Trained Model | joynnayvedya/disaster-response-v2 |
| 📓 Training Notebook (Colab) | Open in Colab |
| 💻 GitHub | letsjoyn/meta-scalar-hack |
Built for the 2026 Meta & Scalar AI Hackathon — Grand Finale, Bangalore.
Every scenario based on a real disaster. Every reward signal designed to be unhackable.
Uploaded finetuned model
- Developed by: joynnayvedya
- License: apache-2.0
- Finetuned from model: unsloth/Qwen2.5-7B-Instruct-bnb-4bit
This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.



