title: GridOps
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
tags:
- openenv
- reinforcement-learning
- microgrid
- energy
GridOps — Community Microgrid Bridge Operator
A production-grade OpenEnv RL environment for Indian community microgrid operation. Plug-and-play. Deterministic. Benchmarkable.
Live demo: 77ethers-gridops.hf.space/dashboard/ | HF Space: huggingface.co/spaces/77ethers/gridops
At a Glance
| Domain | Real-world Indian community microgrid operation (100 homes, summer) |
| Interface | Full OpenEnv spec: reset() -> step(action) -> state(), typed Pydantic models |
| Actions | 3D continuous: battery_dispatch [-1,1], diesel_dispatch [0,1], demand_shedding [0,1] |
| Observations | 30+ fields: demand, solar, price, SOC, forecasts, energy flows. Partial observability (noisy forecasts). |
| Tasks | 3 tasks (easy -> medium -> hard), each testing a different RL capability |
| Grading | Deterministic, programmatic, 0.0-1.0. Same seed = same score, every run. |
| Reward | Dense per-step signal, aligned with episode grader (50% cost + 25% reliability + 25% green) |
| Anti-gaming | 5 mechanisms: degradation, startup costs, rebound, smooth VoLL, grid cap |
| Baseline | Grok-4 LLM: 0.80/0.82/0.72 — beats hand-coded oracle on all tasks |
| Deployment | Docker + HF Space + openenv validate 6/6 pass |
Why This Environment Exists
Community microgrid operation is a real job in India under the RDSS (Revamped Distribution Sector Scheme). IEX prosumer bidding is live. Over 50 million Indian homes will have rooftop solar by 2030, and someone — or some agent — needs to manage the battery-grid-diesel tradeoff in real time.
This is not a toy problem. This is what a microgrid operator at an Indian housing society actually decides every hour:
- Should I charge the battery now (grid is cheap at Rs 4/kWh) or save capacity for tonight (price will spike to Rs 15)?
- Should I run diesel (Rs 25/kWh + Rs 100 startup) or risk a blackout (Rs 150/kWh VoLL penalty)?
- Should I ask residents to reduce AC usage (Rs 40/kWh + 100% rebounds tomorrow)?
Simple heuristics provably fail. The environment requires multi-hour planning, price forecasting, and constraint management under partial observability.
What makes this a strong benchmark
- Any agent can plug in immediately — typed JSON actions in, typed observations out, no custom hacks
- Fully deterministic — same seed, same actions = identical trajectory every time. Leaderboard-ready.
- Tasks differentiate agents — Do-Nothing scores 0.45-0.58, Oracle 0.70-0.81, Grok-4 LLM 0.72-0.82. Clear skill gradient.
- Can't be gamed — 5 anti-exploit mechanisms prevent reward hacking (detailed below)
- Grader = ground truth — programmatic, deterministic, partial credit, aligned with per-step reward
The Problem at a Glance
You have:
- Solar panels — 250 kW peak, free, but only during daylight
- Community battery — 500 kWh storage, 100 kW max charge/discharge
- Diesel generator — 100 kW, but Rs 25/kWh + Rs 100 startup cost
- National grid — auto-imports/exports as slack (capped at 200 kW)
You control (3 continuous actions):
| Action | Range | What it does |
|---|---|---|
battery_dispatch |
-1 to +1 | Charge (-100 kW) or discharge (+100 kW). Rs 2.5/kWh degradation. |
diesel_dispatch |
0 to 1 | Diesel output (0-100 kW). Rs 25/kWh + Rs 100 startup if was off. |
demand_shedding |
0 to 1 | Ask residents to cut 0-20% usage. 100% rebounds next hour. Rs 40/kWh penalty. |
You do NOT control the grid. It automatically absorbs whatever energy gap remains after your decisions. If the gap exceeds 200 kW, that's a blackout (Rs 150/kWh penalty).
The Critical Bottleneck
At 8 PM every evening, demand hits 250 kW but the grid maxes out at 200 kW and solar is zero.
The 50 kW gap must come from your battery. If you discharged it for profit during the day, the neighborhood goes dark.
On a heatwave day (Task 2-3), demand spikes to 325-375 kW. Now the gap is 125-175 kW — you need battery + diesel + shedding just to survive. And in Task 3, the grid goes down entirely for 6 hours.
What the Agent Sees (Observation)
| Field | Description |
|---|---|
hour |
Current hour in episode (0-72, starting 6 AM) |
demand_kw |
What the 100 homes need right now |
solar_kw |
Free solar power available (0 at night, up to 250 kW midday) |
battery_soc |
Battery charge level (0-1, i.e. 0-500 kWh) |
grid_price |
Current IEX electricity price (Rs 3-20/kWh) |
diesel_fuel_remaining |
Diesel tank level (0-1) |
diesel_is_on |
Was diesel running last step? (startup cost if turning on) |
demand_forecast_4h |
Noisy 4-hour demand forecast (+-15%) |
solar_forecast_4h |
Noisy 4-hour solar forecast |
price_forecast_4h |
Noisy 4-hour price forecast |
cumulative_blackout_kwh |
Total blackout energy so far |
cumulative_cost |
Total money spent so far (Rs) |
flow_* |
Detailed energy flows (solar, grid import/export, battery in/out, diesel, demand) |
Partial observability: forecasts have +-15% Gaussian noise. The agent cannot perfectly predict heatwave intensity, cloud cover, or price spikes.
3 Tasks (Each Tests a Different RL Capability)
Task 1: Normal Summer (Easy) — Tests basic arbitrage
- Clear skies, standard demand (~100 kW avg, 250 kW peak)
- Grid prices Rs 3-12 with clear cheap night / expensive evening pattern
- What the agent must learn: charge battery at night (cheap grid), discharge during evening peak (expensive grid), let solar cover midday
Task 2: Heatwave + Price Spike (Medium) — Tests temporal planning
- Day 2-3 heatwave (+30% demand), intermittent clouds
- Rs 20 price spike on Day 2 evening — visible in 4-hour forecast
- What the agent must learn: read the forecast, hold battery charge for the spike instead of greedily discharging early. A greedy policy discharges mid-afternoon; an RL agent that reads the forecast holds until 6 PM.
Task 3: Extreme Crisis + Grid Outage (Hard) — Tests constraint management
- Full 3-day heatwave, -30% solar from haze, +50% demand
- Limited diesel (33% tank = ~8 hours at full power)
- 6-hour grid outage on Day 2 afternoon — grid cap drops to 0 kW
- What the agent must learn: aggressively pre-charge battery before the outage, ration diesel across the outage window, shed demand strategically to stretch resources. This is true microgrid islanding.
Grading (0.0 - 1.0)
score = 0.50 x cost_efficiency + 0.25 x reliability + 0.25 x green_score
| Component | Formula | What it rewards |
|---|---|---|
| Cost efficiency (50%) | 1 - (agent_cost / baseline_cost) |
Spending less than a dumb "max grid import" baseline |
| Reliability (25%) | (demand_met - blackout) / demand_met |
Keeping the lights on |
| Green score (25%) | 1 - (diesel_used / total_demand) |
Minimizing diesel emissions |
Baseline: "import max grid every hour, no battery/diesel/shedding" — physically possible, but expensive and suffers blackouts during peak hours and grid outages.
VoLL (Value of Lost Load): Rs 150/kWh blackout penalty. This is a smooth gradient — no hard reliability cliff. The agent always gets signal for reducing blackouts incrementally.
Why Heuristics Fail
| Strategy | Why it fails |
|---|---|
| "Always discharge battery" | Empty by evening peak. 50 kW gap = blackout. Score collapses. |
| "Always run diesel" | Rs 25/kWh vs Rs 5 grid at night. Hemorrhages money. Green score = 0. |
| "Shed demand whenever short" | Rs 40/kWh cost + 100% rebounds next hour. More expensive than diesel. |
| "Discharge when price > X" | Ignores battery state. Drains SOC before the real peak. |
| "Do nothing" | Grid alone can't cover evening peak. 3.6% blackout rate. |
The oracle (rule-based, time-of-day + price-aware) scores 0.70-0.81. There's a clear 0.20-0.35 gap between heuristics and the oracle, proving the environment has real optimization headroom.
Anti-Gaming Design
The environment has 5 mechanisms that prevent reward hacking:
- Shedding is expensive — Rs 40/kWh + 100% rebound. Costlier than diesel. True emergency only.
- Battery degradation — Rs 2.5/kWh throughput. Prevents infinite cycling for tiny arbitrage.
- Diesel startup cost — Rs 100 per on-switch. Prevents on/off toggling.
- VoLL is smooth — Rs 150/kWh with no cliff. Agent can't exploit a binary gate.
- Grid is capped — 200 kW max (0 during outages). Can't just buy everything.
Baseline Scores
| Strategy | Task 1 | Task 2 | Task 3 | What it does |
|---|---|---|---|---|
| Grok-4 (LLM) | 0.80 | 0.82 | 0.72 | Reads observations, reasons about tradeoffs |
| Oracle (rule-based) | 0.79 | 0.81 | 0.70 | Time-of-day + price + SOC heuristic |
| Do-Nothing (grid only) | 0.58 | 0.51 | 0.45 | Grid covers everything it can |
| Always-Discharge | 0.59 | 0.51 | 0.45 | Drains battery, empty by evening |
| Always-Diesel | 0.42 | 0.42 | 0.44 | Rs 25/kWh burns money |
- LLM beats oracle: Grok-4 matched or exceeded the hand-coded oracle on every task
- Deterministic: identical scores across 3 runs (seeded RNG)
- Oracle ceiling < 1.0: real physics constraints, not inflated scores
- Clear separation: LLM > oracle >> heuristics (0.20-0.38 gap from best to worst)
- Task 3 hardest: grid outage makes it genuinely challenging even for frontier LLMs
Key Physics
| Component | Spec | Cost |
|---|---|---|
| Solar | 250 kW peak, bell curve 6 AM - 6 PM | Free |
| Battery | 500 kWh, 100 kW max, 90% round-trip (sqrt each way) | Rs 2.5/kWh degradation |
| Diesel | 100 kW max | Rs 25/kWh + Rs 100 startup |
| Grid | 200 kW max import/export (slack variable) | Market price Rs 3-20/kWh |
| Blackout | Unmet demand when all sources exhausted | Rs 150/kWh VoLL penalty |
| Shedding | Up to 20% demand reduction | Rs 40/kWh + 100% rebound next hour |
Energy balance every step:
supply = solar + grid_import + battery_discharge + diesel
consume = effective_demand + grid_export + battery_charge
Supply always equals consumption. Any unmet demand beyond grid cap = blackout.
Setup & Usage
# Install
pip install -e .
# Run server
uvicorn gridops.server.app:app --port 8000
# Interactive dashboard
open http://localhost:8000/dashboard/
# Validate oracle + determinism
python scripts/oracle_test.py
# Run LLM baseline
export API_BASE_URL="https://router.huggingface.co/v1"
export HF_TOKEN="your-token"
export MODEL_NAME="meta-llama/Llama-3.3-70B-Instruct"
python inference.py
Docker
docker build -t gridops .
docker run -p 8000:8000 gridops
OpenEnv Validation
# Local structure check
openenv validate
# Runtime check (against live server)
openenv validate --url http://localhost:8000
Project Structure
gridops/
├── inference.py # LLM baseline (API_BASE_URL, MODEL_NAME, HF_TOKEN)
├── openenv.yaml # OpenEnv manifest
├── Dockerfile # Docker deployment
├── server/app.py # Root entry point (openenv validate)
├── gridops/
│ ├── models.py # GridOpsAction, GridOpsObservation (Pydantic)
│ ├── simulation/
│ │ ├── physics.py # Energy balance, battery, VoLL, degradation, outages
│ │ └── scenarios.py # Demand/solar/price curve generators
│ ├── tasks/
│ │ ├── definitions.py # 3 task configs (normal, heatwave, crisis+outage)
│ │ └── graders.py # 0-1 scoring: cost + reliability + green
│ └── server/
│ ├── app.py # FastAPI + OpenEnv create_app
│ ├── environment.py # OpenEnv Environment class
│ └── static/index.html # Interactive dashboard with energy flows
└── scripts/
└── oracle_test.py # Oracle + heuristic validation + determinism check
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
/health |
GET | Health check |
/schema |
GET | Action/observation/state JSON schemas |
/metadata |
GET | Environment name and description |
/reset |
POST | Reset environment (OpenEnv standard) |
/step |
POST | Execute action (OpenEnv standard) |
/state |
GET | Current state (OpenEnv standard) |
/ws |
WebSocket | Persistent session (OpenEnv standard) |
/api/reset |
POST | Stateful reset (dashboard) |
/api/step |
POST | Stateful step (dashboard) |
/api/state |
GET | Stateful state (dashboard) |
/tasks |
GET | List available tasks |
/dashboard/ |
GET | Interactive web UI |