---
title: DataDetective
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# DataDetective: Business Incident Investigation Environment

An OpenEnv environment where AI agents investigate real-world business incidents by querying a SQL database, analysing patterns, and submitting root-cause findings.
## What It Does

The agent is given a realistic company database (TechMart, a mid-size B2B+B2C electronics retailer) and a business problem to investigate. It can execute SQL queries to explore the data, then submit a final written analysis. The environment automatically grades the analysis based on whether key findings were identified. Each task has 5 grading criteria worth 0.20 each, enabling meaningful partial credit.
## Tasks (Easy → Hard)

| # | Task ID | Difficulty | Scenario |
|---|---------|------------|----------|
| 1 | `orders_drop` | Easy | Order volume dropped sharply after promo ended |
| 2 | `returns_spike` | Medium | Product returns spiking in West region (defective SKU) |
| 3 | `supplier_quality` | Medium | Supplier-level quality crisis across multiple products |
| 4 | `shipping_delay` | Medium-Hard | Customer satisfaction crisis from carrier delays |
| 5 | `inventory_stockout` | Medium-Hard | Regional sales underperformance from warehouse stockout |
| 6 | `customer_churn` | Hard | Active customer decline across segments post price hike |
| 7 | `revenue_paradox` | Hard | Revenue up but profit down (multi-causal margin erosion) |
| 8 | `fraud_detection` | Hard | Coordinated fraud ring with fake accounts |
| 9 | `repeat_purchase_decline` | Hard | Repeat purchase collapse masked by acquisition spend |
Each task is scored from 0.0 to 1.0 based on specific findings the agent must discover.
## Action / Observation Spaces

### Action (`DataDetectiveAction`)

| Field | Type | Description |
|-------|------|-------------|
| `action_type` | `str` | `"query"` to run SQL, `"answer"` to submit findings |
| `content` | `str` | SQL query string or final analysis text |
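As a rough illustration of the action shape described above, here is a minimal Python sketch. The field names follow the table, but the actual `DataDetectiveAction` class is defined by the environment package, so treat this as an illustration rather than the canonical definition:

```python
from dataclasses import dataclass

# Sketch of the action shape; the real DataDetectiveAction class
# lives in the environment package and may carry extra fields.
@dataclass
class DataDetectiveAction:
    action_type: str  # "query" to run SQL, "answer" to submit findings
    content: str      # SQL string or final written analysis

# An exploratory query action:
explore = DataDetectiveAction(
    action_type="query",
    content="SELECT COUNT(*) FROM orders GROUP BY strftime('%Y-%m', order_date);",
)

# A final-answer action:
final = DataDetectiveAction(
    action_type="answer",
    content="The drop in order volume coincides with the end of the promotion.",
)
```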
### Observation (`DataDetectiveObservation`)

| Field | Type | Description |
|-------|------|-------------|
| `output` | `str` | Query results (formatted table) or feedback |
| `task_description` | `str` | The investigation task |
| `schema_info` | `str` | Database schema (shown at reset) |
| `step_number` | `int` | Current step |
| `max_steps` | `int` | Maximum steps allowed (30) |
| `message` | `str` | Status message |
## Database Schema (11 Tables)

The TechMart database includes:

| Table | Description |
|-------|-------------|
| `customers` | Customer demographics (region, segment, signup date) |
| `products` | Product catalog (category, price, cost, supplier) |
| `orders` | Order history with totals |
| `order_items` | Line items with quantity and unit price |
| `returns` | Product returns with reasons and refund amounts |
| `promotions` | Promotional campaigns with discount percentages |
| `price_changes` | Historical price adjustments |
| `shipping` | Shipment records with carrier and delivery dates |
| `support_tickets` | Customer support tickets by category and priority |
| `inventory_log` | Daily stock levels per product per warehouse region |
| `marketing_spend` | Daily marketing spend by channel, campaign, and region |
All data is synthetic, generated in-memory (no external databases required).
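To give a feel for the kind of SQL an agent might run against these tables, here is a toy in-memory example using Python's `sqlite3`. The column names beyond the table descriptions above (`reason`, `region`, the join key) are illustrative assumptions; the actual schema is shown to the agent in `schema_info` at reset:

```python
import sqlite3

# Toy in-memory stand-in for two of the TechMart tables.
# Column names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE returns (return_id INTEGER PRIMARY KEY,
                          customer_id INTEGER, reason TEXT);
    INSERT INTO customers VALUES (1, 'West'), (2, 'East');
    INSERT INTO returns VALUES (10, 1, 'defective'), (11, 1, 'defective'),
                               (12, 2, 'changed mind');
""")

# The kind of aggregation an agent might use to localise a returns spike:
rows = conn.execute("""
    SELECT c.region, r.reason, COUNT(*) AS n
    FROM returns r JOIN customers c USING (customer_id)
    GROUP BY c.region, r.reason
    ORDER BY n DESC
""").fetchall()
print(rows)  # [('West', 'defective', 2), ('East', 'changed mind', 1)]
```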
## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Start the Server

```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

### 3. Health Check

```bash
curl http://localhost:7860/health
```

### 4. Run the Baseline Agent

```bash
API_BASE_URL="https://router.huggingface.co/v1" \
MODEL_NAME="gpt-4.1-mini" \
HF_TOKEN="hf_..." \
python inference.py
```

### 5. Docker

```bash
docker build -t data-detective .
docker run -p 7860:7860 data-detective
```
## Environment Variables

| Env Var | Purpose | Required |
|---------|---------|----------|
| `API_BASE_URL` | LLM endpoint URL | Yes |
| `MODEL_NAME` | Model identifier | Yes |
| `HF_TOKEN` | API key / HF token | Yes |
| `ENV_URL` | Environment server URL | No (default: `http://localhost:7860`) |
## How Grading Works

Each task has an automated grader that checks the agent's final answer for specific key findings (keywords, patterns, named entities). There are 5 grading criteria per task, each worth 0.20, for a maximum score of 1.0; partial credit is awarded for each finding discovered.
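A minimal sketch of how such a keyword-based grader could work. The criteria and matching logic below are illustrative assumptions, not the environment's actual grading code:

```python
# Illustrative keyword-based grader; the real criteria and matching
# rules live in the environment and may be more elaborate.
def grade_answer(answer: str, criteria: list[list[str]]) -> float:
    """Each criterion is a list of acceptable keywords; any match
    earns 0.20, for a maximum of 1.0 across 5 criteria."""
    text = answer.lower()
    hits = sum(
        1 for keywords in criteria
        if any(k.lower() in text for k in keywords)
    )
    return round(hits * 0.20, 2)

# Hypothetical criteria for a returns_spike-style task:
criteria = [
    ["west"],                 # names the affected region
    ["defective", "defect"],  # identifies the failure mode
    ["sku", "product"],       # points at a specific product
    ["supplier"],             # traces the upstream cause
    ["refund"],               # quantifies the financial impact
]
score = grade_answer(
    "Returns spiked in the West region due to a defective SKU.", criteria
)
print(score)  # 0.6 — 3 of 5 findings matched
```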
## Setup Requirements
- Python 3.10+
- No GPU required
- Runs within 2 vCPU / 8 GB memory
- All data is generated in-memory (no external databases)