Commit c861e8f by Rithwik Ravi: first commit
Parent(s): 0
README.md ADDED
@@ -0,0 +1,122 @@

---
title: SQL Data Engineer Environment
emoji: 🗄️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---
# OpenEnv: SQL Data Engineer Environment

Welcome to the **SQL Data Engineer Environment**, an OpenEnv-compliant baseline environment built for the Meta OpenEnv Hackathon. It tests an AI agent's ability to interact directly with a live SQL engine to perform practical data extraction, cleansing, and schema normalization tasks.

## Environment Description & Motivation

### Why SQL Data Engineering?
The industry needs reliable agents that can act as backend developers, database administrators, and data engineers. While many environments focus on web browsing or gaming, manipulating relational databases is a high-value, real-world task. This environment simulates obstacles that developers routinely face:
- Analyzing undocumented database schemas.
- Cleansing noisy string data into strict scalar types.
- Restructuring and normalizing flat tables into relational schemas while preserving foreign-key constraints.

It provides a concrete measure of an LLM's structured reasoning and its ability to generate precise SQL.

---

## Space Definitions

The environment adheres to the OpenEnv Pydantic specification, so it can be integrated through the standard OpenEnv API.

### Observation Space
The observation space provides dense context while remaining token-efficient:
- `goal` (string): The task prompt stating what the agent must achieve.
- `schema_dump` (string | null): The current DDL for all tables and views in the database. Returned only when the schema has changed or the state is stable; otherwise null.
- `result` (string): The output of the previously executed query (capped at 10 rows for SELECTs), or a `rowcount` confirmation for INSERTs/UPDATEs.
- `last_action_error` (boolean): Whether the previous SQL action raised a syntax or engine error.
- `step` (integer): The current step count within the episode.

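The observation fields listed above can be pictured as a small typed record. The following is only a sketch: it uses a plain standard-library dataclass as a stand-in for the project's Pydantic model, and the class name `Observation`, the sample goal text, and the `customers` column names are illustrative assumptions rather than the environment's actual definitions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Observation:
    """Stand-in for the observation model; field names follow the list above."""
    goal: str
    result: str
    last_action_error: bool
    step: int
    schema_dump: Optional[str] = None  # null when the schema is not resent


# Hypothetical first observation of an episode; all values are illustrative.
obs = Observation(
    goal="Create a view named high_value_customers.",
    result="",
    last_action_error=False,
    step=0,
    schema_dump="CREATE TABLE customers (id INTEGER, name TEXT, total_spent REAL);",
)
payload = asdict(obs)  # dict form, ready to be serialized as JSON
```
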
### Action Space
- `action_str` (string): The agent returns a JSON dictionary whose `action_str` field holds a single, syntactically valid SQLite statement, which is executed against the current database state.

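To make the action format concrete, here is a minimal sketch of how one JSON action might be turned into the `result` and `last_action_error` fields described earlier. The helper `run_action` and the table `t` are hypothetical; the environment's real execution code is not shown in this README and may differ.

```python
import json
import sqlite3

MAX_ROWS = 10  # SELECT output is capped at 10 rows, per the observation spec


def run_action(conn: sqlite3.Connection, raw_action: str) -> dict:
    """Execute one JSON-encoded action; return `result` and `last_action_error`."""
    try:
        sql = json.loads(raw_action)["action_str"]
        cur = conn.execute(sql)
        if cur.description is not None:  # the statement returned rows (e.g. SELECT)
            rows = cur.fetchmany(MAX_ROWS)
            result = "\n".join(str(row) for row in rows)
        else:  # INSERT/UPDATE/DDL: report the affected row count instead
            result = f"rowcount={cur.rowcount}"
        conn.commit()
        return {"result": result, "last_action_error": False}
    except (json.JSONDecodeError, KeyError, TypeError, sqlite3.Error) as exc:
        return {"result": str(exc), "last_action_error": True}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])

ok = run_action(conn, json.dumps({"action_str": "SELECT x FROM t ORDER BY x"}))
bad = run_action(conn, json.dumps({"action_str": "SELEC x FROM t"}))  # malformed SQL
```
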

---

## Tasks & Graders

Each episode presents the agent with one of three tasks. Each task has a deterministic OpenEnv grader that returns a score between `0.0` and `1.0`.

#### 1. Easy: Data Extraction (View Creation)
- **Goal**: Read the `customers` table, filter for rows whose metric is greater than `1000.0`, and create a `high_value_customers` SQL view from them.
- **Difficulty**: Easy. Tests basic SELECT syntax and DDL proficiency.
- **Grader**: Awards `0.5` if the view exists in `sqlite_master`, and a further `0.5` if its rows and contents match the expected result exactly.

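The two-stage grading rule for the easy task can be sketched as follows. This is an illustration under stated assumptions, not the project's grader: the function `grade_easy`, the `customers` columns, and the sample rows are all hypothetical.

```python
import sqlite3


def grade_easy(conn, expected_rows):
    """0.5 if the view exists in sqlite_master, plus 0.5 if its rows match exactly."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'view' AND name = 'high_value_customers'"
    ).fetchone()
    if exists is None:
        return 0.0
    try:
        rows = conn.execute("SELECT * FROM high_value_customers").fetchall()
    except sqlite3.Error:
        return 0.5  # the view exists but cannot be queried
    return 1.0 if sorted(rows) == sorted(expected_rows) else 0.5


conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; the real table layout is not documented here.
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, total_spent REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 1500.0), (2, "Bo", 200.0), (3, "Cy", 1000.0)],
)
expected = [(1, "Ada", 1500.0)]

score_before = grade_easy(conn, expected)
conn.execute(
    "CREATE VIEW high_value_customers AS "
    "SELECT * FROM customers WHERE total_spent > 1000.0"
)
score_after = grade_easy(conn, expected)
```
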
#### 2. Medium: Data Cleaning
- **Goal**: Clean a messy `products` table. The agent must standardize category strings (e.g., converting 'ELEC' to 'ELECTRONICS') and extract numeric values from dirty price strings (e.g., '$85.00' -> `85.0`) into a new float column.
- **Difficulty**: Medium. Tests built-in string functions and targeted UPDATE statements.
- **Grader**: Adding the column earns `0.3`. Correct category mapping earns up to `0.3`, and correctly extracted float prices earn `0.4`.

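As an illustration of the kind of statements an agent might issue for this task, the sketch below cleans a two-row table. The column names `category`, `price`, and `price_clean` and the sample rows are assumptions made for the example; the environment's actual `products` schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout of the messy table.
conn.execute("CREATE TABLE products (id INTEGER, category TEXT, price TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "ELEC", "$85.00"), (2, "ELECTRONICS", "$12.50")],
)

# One statement per step, as the action space requires.
statements = [
    "ALTER TABLE products ADD COLUMN price_clean REAL",
    "UPDATE products SET category = 'ELECTRONICS' WHERE category = 'ELEC'",
    "UPDATE products SET price_clean = CAST(REPLACE(price, '$', '') AS REAL)",
]
for sql in statements:
    conn.execute(sql)

cleaned = conn.execute(
    "SELECT id, category, price_clean FROM products ORDER BY id"
).fetchall()
```

Real price strings may contain other noise (thousands separators, currency codes, whitespace), which would need additional `REPLACE` or `TRIM` calls.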
#### 3. Hard: Schema Normalization
- **Goal**: Normalize a flat `hospital_records` table into a three-table schema (`patients`, `doctors`, `appointments`). All data must be migrated, and the new tables must be linked by primary-key and foreign-key constraints.
- **Difficulty**: Hard. Tests multi-step schema design reasoning and safe data migration.
- **Grader**: Each table with a valid signature earns `0.1`. Correct row counts earn `0.1` each, and if a JOIN across the new tables exactly reproduces the original flat data, the final `0.3` is awarded.

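A small worked example of the normalization and of the JOIN round-trip check is sketched below. The flat column names (`patient_name`, `doctor_name`, `visit_date`) are hypothetical, patients and doctors are identified by name purely for simplicity, and `executescript` is used for brevity even though the environment accepts one statement per action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
# Hypothetical flat layout; the real hospital_records columns are not documented here.
conn.execute(
    "CREATE TABLE hospital_records (patient_name TEXT, doctor_name TEXT, visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO hospital_records VALUES (?, ?, ?)",
    [
        ("Ann", "Dr. Lee", "2024-01-05"),
        ("Ann", "Dr. Roy", "2024-02-11"),
        ("Ben", "Dr. Lee", "2024-01-09"),
    ],
)

conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE doctors  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE appointments (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(id),
    doctor_id  INTEGER NOT NULL REFERENCES doctors(id),
    visit_date TEXT NOT NULL
);
INSERT INTO patients (name) SELECT DISTINCT patient_name FROM hospital_records;
INSERT INTO doctors  (name) SELECT DISTINCT doctor_name  FROM hospital_records;
INSERT INTO appointments (patient_id, doctor_id, visit_date)
SELECT p.id, d.id, h.visit_date
FROM hospital_records h
JOIN patients p ON p.name = h.patient_name
JOIN doctors  d ON d.name = h.doctor_name;
""")

# Round-trip check: joining the new tables should reproduce the flat rows.
rebuilt = conn.execute("""
    SELECT p.name, d.name, a.visit_date
    FROM appointments a
    JOIN patients p ON p.id = a.patient_id
    JOIN doctors  d ON d.id = a.doctor_id
""").fetchall()
original = conn.execute("SELECT * FROM hospital_records").fetchall()
round_trip_ok = sorted(rebuilt) == sorted(original)
```
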
### The Dense Reward Function
Rewards are dense across the episode: the grader runs on every `step(action)` call, and the reward is the change in score since the previous step:
`Reward = (Current_Score - Previous_Score)`.
*Note: A `-0.05` penalty is applied whenever `last_action_error` is set, which discourages loops of hallucinated or malformed SQL.*

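The reward rule can be written out in a few lines. This sketch assumes the `-0.05` penalty is simply added to the score delta for the same step; the function name `step_reward` and the sample trajectory are illustrative. Because each reward is a difference of consecutive scores, the rewards of an error-free episode sum to the final grader score.

```python
ERROR_PENALTY = 0.05


def step_reward(current_score: float, previous_score: float, last_action_error: bool) -> float:
    """Score delta for this step, minus a flat penalty if the SQL raised an error."""
    reward = current_score - previous_score
    if last_action_error:
        reward -= ERROR_PENALTY
    return reward


# Hypothetical three-step episode: (grader score after the step, error flag).
trajectory = [(0.5, False), (0.5, True), (1.0, False)]
previous, rewards = 0.0, []
for score, errored in trajectory:
    rewards.append(step_reward(score, previous, errored))
    previous = score

total = sum(rewards)  # final score minus one error penalty
```
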

---

## Local Setup & Usage

To install the framework, validate the OpenEnv schema, and run the OpenAI-compatible baseline script locally:

```bash
# 1. Clone the repository and navigate inside
git clone <your-repo-url>
cd OpenEnv-SQL-Data-Engineer

# 2. Set up a standard Python virtual environment
python -m venv venv
source venv/bin/activate  # Or `venv\Scripts\activate` on Windows

# 3. Install core dependencies (FastAPI, Pydantic, OpenAI, OpenEnv)
pip install openenv openenv-core openai pydantic fastapi uvicorn requests

# 4. Verify OpenEnv schema compliance locally
openenv validate

# 5. Execute the baseline AI agent (make sure to set your key)
export OPENAI_API_KEY="your-api-key"
export MODEL_NAME="gpt-4o"
python inference.py
```

---

## Deployment Instructions

### Docker Container Build
The repository includes a `Dockerfile` structured for Hugging Face Spaces. It launches the server on port 7860 as an unprivileged user.

```bash
docker build -t openenv-sql .
docker run -p 7860:7860 openenv-sql
```

### Deploying to Hugging Face Spaces
To finalize your hackathon deployment and bring up the live inference API:
1. Create a new, minimal **Docker Space** on Hugging Face.
2. Add `HF_TOKEN` to the Space's repository secrets.
3. Push this directory to the Space via git:
   ```bash
   git remote add space https://huggingface.co/spaces/<your-username>/<your-space-name>
   git push space main
   ```
4. Once the Space is running, its URL should respond to ping checks and to `/reset` requests.

### Baseline Scores (Llama-3-8B-Instruct)
- **Easy Task:** 1.0 (Passed)
- **Medium Task:** 0.62 (Partial success; struggled with complex string casting)
- **Hard Task:** 0.2 (Challenging; requires stronger reasoning and longer context)