---
title: DataDetective
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# DataDetective: Business Incident Investigation Environment

An OpenEnv environment where AI agents investigate real-world business incidents by querying a SQL database, analysing patterns, and submitting root-cause findings.
## What It Does

The agent is given a realistic company database (TechMart, a mid-size B2B+B2C electronics retailer) and a business problem to investigate. It can execute SQL queries to explore the data, then submit a final written analysis. The environment automatically grades the analysis based on whether key findings were identified. Each task has 5 grading criteria worth 0.20 each, enabling meaningful partial credit.
## Tasks (Easy → Hard)

| # | Task ID | Difficulty | Scenario |
|---|---------|------------|----------|
| 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
| 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
| 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
| 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
| 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
| 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
| 7 | `revenue_paradox` | Hard | Revenue up but profit down (multi-causal margin erosion) |
| 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
| 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |
Each task is scored from 0.0 to 1.0 based on specific findings the agent must discover.
## Action / Observation Spaces

### Action (`DataDetectiveAction`)

| Field | Type | Description |
|-------|------|-------------|
| `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
| `content` | `str` | SQL query string or final analysis text |
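As a rough illustration of the action shape described above, here is a minimal Python sketch. The field names follow the table, but the actual `DataDetectiveAction` class is defined by the environment package, so treat this as an illustration rather than the canonical definition:

```python
from dataclasses import dataclass

# Sketch of the action shape; the real DataDetectiveAction class
# lives in the environment package and may carry extra fields.
@dataclass
class DataDetectiveAction:
    action_type: str  # "query" to run SQL, "answer" to submit findings
    content: str      # SQL string or final written analysis

# An exploratory query action:
explore = DataDetectiveAction(
    action_type="query",
    content="SELECT COUNT(*) FROM orders GROUP BY strftime('%Y-%m', order_date);",
)

# A final-answer action:
final = DataDetectiveAction(
    action_type="answer",
    content="The drop in order volume coincides with the end of the promotion.",
)
```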
### Observation (`DataDetectiveObservation`)

| Field | Type | Description |
|-------|------|-------------|
| `output` | `str` | Query results (formatted table) or feedback |
| `task_description` | `str` | The investigation task |
| `schema_info` | `str` | Database schema (shown at reset) |
| `step_number` | `int` | Current step |
| `max_steps` | `int` | Maximum steps allowed (30) |
| `message` | `str` | Status message |
## Database Schema (11 Tables)

The TechMart database includes:

| Table | Description |
|-------|-------------|
| `customers` | Customer demographics (region, segment, signup date) |
| `products` | Product catalog (category, price, cost, supplier) |
| `orders` | Order history with totals |
| `order_items` | Line items with quantity and unit price |
| `returns` | Product returns with reasons and refund amounts |
| `promotions` | Promotional campaigns with discount percentages |
| `price_changes` | Historical price adjustments |
| `shipping` | Shipment records with carrier and delivery dates |
| `support_tickets` | Customer support tickets by category and priority |
| `inventory_log` | Daily stock levels per product per warehouse region |
| `marketing_spend` | Daily marketing spend by channel, campaign, and region |
All data is synthetic, generated in-memory (no external databases required).
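To give a feel for the kind of SQL an agent might run against these tables, here is a toy in-memory example using Python's `sqlite3`. The column names beyond the table descriptions above (`reason`, `region`, the join key) are illustrative assumptions; the actual schema is shown to the agent in `schema_info` at reset:

```python
import sqlite3

# Toy in-memory stand-in for two of the TechMart tables.
# Column names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE returns (return_id INTEGER PRIMARY KEY,
                          customer_id INTEGER, reason TEXT);
    INSERT INTO customers VALUES (1, 'West'), (2, 'East');
    INSERT INTO returns VALUES (10, 1, 'defective'), (11, 1, 'defective'),
                               (12, 2, 'changed mind');
""")

# The kind of aggregation an agent might use to localise a returns spike:
rows = conn.execute("""
    SELECT c.region, r.reason, COUNT(*) AS n
    FROM returns r JOIN customers c USING (customer_id)
    GROUP BY c.region, r.reason
    ORDER BY n DESC
""").fetchall()
print(rows)  # [('West', 'defective', 2), ('East', 'changed mind', 1)]
```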
## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Server

```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

### 3. Health Check

```bash
curl http://localhost:7860/health
```

### 4. Run the Baseline Agent

```bash
API_BASE_URL="https://router.huggingface.co/v1" \
MODEL_NAME="gpt-4.1-mini" \
HF_TOKEN="hf_..." \
python inference.py
```

### 5. Docker

```bash
docker build -t data-detective .
docker run -p 7860:7860 data-detective
```
## Environment Variables

| Env Var | Purpose | Required |
|---------|---------|----------|
| `API_BASE_URL` | LLM endpoint URL | Yes |
| `MODEL_NAME` | Model identifier | Yes |
| `HF_TOKEN` | API key / HF token | Yes |
| `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |
## How Grading Works

Each task has an automated grader that checks the agent's final answer for specific key findings (keywords, patterns, named entities). There are 5 grading criteria per task, each worth 0.20, for a maximum score of 1.0; partial credit is awarded for each finding discovered.
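A minimal sketch of how such a keyword-based grader could work. The criteria and matching logic below are illustrative assumptions, not the environment's actual grading code:

```python
# Illustrative keyword-based grader; the real criteria and matching
# rules live in the environment and may be more elaborate.
def grade_answer(answer: str, criteria: list[list[str]]) -> float:
    """Each criterion is a list of acceptable keywords; any match
    earns 0.20, for a maximum of 1.0 across 5 criteria."""
    text = answer.lower()
    hits = sum(
        1 for keywords in criteria
        if any(k.lower() in text for k in keywords)
    )
    return round(hits * 0.20, 2)

# Hypothetical criteria for a returns_spike-style task:
criteria = [
    ["west"],                 # names the affected region
    ["defective", "defect"],  # identifies the failure mode
    ["sku", "product"],       # points at a specific product
    ["supplier"],             # traces the upstream cause
    ["refund"],               # quantifies the financial impact
]
score = grade_answer(
    "Returns spiked in the West region due to a defective SKU.", criteria
)
print(score)  # 0.6 — 3 of 5 findings matched
```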
## Setup Requirements
- Python 3.10+
- No GPU required
- Runs within 2 vCPU / 8 GB memory
- All data is generated in-memory (no external databases)