rithwik-ravikumar committed
Commit 1b718b8 · verified · 1 Parent(s): 1314b5a

Upload folder using huggingface_hub

Files changed (3):
  1. Dockerfile +27 -26
  2. README.md +122 -121
  3. pyproject.toml +24 -24
Dockerfile CHANGED
@@ -1,26 +1,27 @@
- FROM python:3.10-slim
-
- # Creates a non-root user with an explicit UID and GID required by Hugging Face Spaces
- RUN groupadd -g 1000 user && \
- useradd -m -u 1000 -g 1000 user
- USER 1000:1000
-
- # Set environment variables
- ENV HOME=/home/user \
- PATH=/home/user/.local/bin:$PATH
-
- # Create the working directory
- WORKDIR $HOME/app
-
- # Install dependencies first for Docker caching
- COPY --chown=user:user requirements.txt .
- RUN pip install --no-cache-dir -r requirements.txt
-
- # Copy the rest of the files
- COPY --chown=user:user . .
-
- # Expose the standard Hugging Face Space port
- EXPOSE 7860
-
- # Start the environment API server
- CMD ["uvicorn", "env:app", "--host", "0.0.0.0", "--port", "7860"]
+ FROM python:3.10-slim
+
+ # Creates a non-root user with an explicit UID and GID required by Hugging Face Spaces
+ RUN groupadd -g 1000 user && \
+ useradd -m -u 1000 -g 1000 user
+ USER 1000:1000
+
+ # Set environment variables
+ ENV HOME=/home/user \
+ PATH=/home/user/.local/bin:$PATH
+
+ # Create the working directory
+ WORKDIR $HOME/app
+
+ # Install dependencies first for Docker caching
+ COPY --chown=user:user requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the rest of the files
+ COPY --chown=user:user . .
+
+ # Expose the standard Hugging Face Space port
+ EXPOSE 7860
+
+ # Start the environment API server
+ ENV ENABLE_WEB_INTERFACE=true
+ CMD ["uvicorn", "env:app", "--host", "0.0.0.0", "--port", "7860"]
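The container defined above serves the environment API with uvicorn on port 7860. A minimal client loop against it might look like the sketch below; the `/reset` and `/step` endpoint paths and the inner JSON key are assumptions inferred from the README's action space (`action_str` is documented there, the `query` key is hypothetical), so treat this as a sketch rather than a verified API:

```python
import json
import requests

BASE_URL = "http://localhost:7860"  # or your deployed Space URL

def build_action(sql: str) -> dict:
    """Wrap one SQLite query in the JSON-dict action the env expects.
    `action_str` comes from the README; the inner `query` key is assumed."""
    return {"action_str": json.dumps({"query": sql})}

def reset(session: requests.Session) -> dict:
    """Start a new episode; returns the first observation (incl. `goal`)."""
    return session.post(f"{BASE_URL}/reset").json()

def step(session: requests.Session, sql: str) -> dict:
    """Execute one query and return the next observation."""
    return session.post(f"{BASE_URL}/step", json=build_action(sql)).json()

# Usage against a running container:
#   s = requests.Session()
#   obs = reset(s)                                   # read obs["goal"]
#   obs = step(s, "SELECT name FROM sqlite_master WHERE type='table';")
```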
README.md CHANGED
@@ -1,122 +1,123 @@
- ---
- title: SQL Data Engineer Environment
- emoji: 🗄️
- colorFrom: blue
- colorTo: indigo
- sdk: docker
- pinned: false
- app_port: 7860
- ---
- # OpenEnv: SQL Data Engineer Environment
-
- Welcome to the **SQL Data Engineer Environment**—a robust, fully-compliant baseline environment built for the Meta OpenEnv Hackathon. This project tests an AI agent's ability to natively interact with a live SQL engine to perform pragmatic data extraction, cleansing, and complex schema normalization tasks.
-
- ## Environment Description & Motivation
-
- ### Why SQL Data Engineering?
- The industry needs reliable agents that can act as backend developers, DB administrators, and data engineers. While many environments focus on web browsing or gaming, manipulating relational databases is a high-value, real-world task. This environment simulates authentic obstacles developers face:
- - Analyzing undocumented database schemas.
- - Cleansing noisy string data into strict scalar types.
- - Restructuring and normalizing flat tables into relational architectures while rigorously preserving foreign-key constraints.
-
- It presents an excellent metric to gauge an LLM's structured reasoning and precise SQL generation capabilities.
-
- ---
-
- ## Space Definitions
-
- The environment adheres strictly to the OpenEnv Pydantic specification, enabling seamless API integration.
-
- ### Observation Space
- The observation space is tailored to provide dense context while remaining token-efficient:
- - `goal` (string): The explicit task prompt/requirement dictating what the agent must achieve.
- - `schema_dump` (string | null): The current DDL representing all tables and views in the DB (schema definition). Sent back only when the schema dynamically changes or the state is stable.
- - `result` (string): The standard output of the previously executed query (capped to 10 rows for SELECTs) or a clear `rowcount` confirmation for INSERTs/UPDATEs.
- - `last_action_error` (boolean): Flag indicating if the previous SQL Action threw a syntax or logic engine error.
- - `step` (integer): The current episode step tally.
-
- ### Action Space
- - `action_str` (string): The agent must return a JSON dictionary containing a single, syntactically correct SQLite query to be executed against the backend state.
-
- ---
-
- ## Tasks & Graders
-
- Each episode challenges the agent with one of 3 tasks featuring deterministic OpenEnv graders scoring between `0.0` and `1.0`.
-
- #### 1. Easy: Data Extraction (View Creation)
- - **Goal**: Read a `customers` table, filter out metrics > 1000.0, and construct a targeted `high_value_customers` SQL View.
- - **Difficulty**: Easy. Tests basic SELECT syntax and DDL proficiency.
- - **Grader**: Validates if the correct view exists in `sqlite_master`, assigns `0.5` points. Exact row and content matching grants the remaining `+0.5`.
-
- #### 2. Medium: Data Cleaning
- - **Goal**: Coerce a messy `products` table. The agent must standardize categorical string sizes (e.g., converting 'ELEC' to 'ELECTRONICS') and extract numeric floats from dirty string pricing (e.g., '$85.00' -> `85.0`) into a new generated float column.
- - **Difficulty**: Medium. Tests native string pattern matching and targeted UPDATE pipelines.
- - **Grader**: Adding the column yields `0.3` points. Correct categorical string mapping grants up to `0.3`, and correctly extracted float prices yield `0.4` respectively.
-
- #### 3. Hard: Schema Normalization
- - **Goal**: Normalize a completely flat `hospital_records` repository into a structured 3-table format (`patients`, `doctors`, `appointments`). Data must be completely migrated and bound by Primary/Foreign key constraints.
- - **Difficulty**: Hard. Tests multi-step schema architectural reasoning and safe data-migration pipelines.
- - **Grader**: Validating table signatures issues `0.1` per table. Proper data counts yield `0.1` each, and if a relational JOIN across the new DB perfectly rebuilds the original flat map, the final `0.3` is awarded.
-
- ### The Dense Reward Function
- Scores are completely dense over the episode lifecycle.
- At `step(action)`, the grader executes. The mathematical reward signal is continuous:
- `Reward = (Current_Score - Previous_Score)`.
- *Note: A `-0.05` penalty is actively applied when `last_action_error` triggers, strongly discouraging hallucinated or malformed SQL loops.*
-
- ---
-
- ## Local Setup & Usage
-
- To validate the OpenEnv schema, install the framework, and run the OpenAI-compatible baseline script locally:
-
- ```bash
- # 1. Clone the repository and navigate inside
- git clone <your-repo-url>
- cd OpenEnv-SQL-Data-Engineer
-
- # 2. Setup standard Python virtual environment
- python -m venv venv
- source venv/bin/activate # Or `venv\Scripts\activate` on Windows
-
- # 3. Install core dependencies (FastAPI, Pydantic, OpenAI, OpenEnv)
- pip install openenv openenv-core openai pydantic fastapi uvicorn requests
-
- # 4. Verify OpenEnv schema compliance locally
- openenv validate
-
- # 5. Execute the baseline AI Agent (make sure to set your key)
- export OPENAI_API_KEY="your-api-key"
- export MODEL_NAME="gpt-4o"
- python inference.py
- ```
-
- ---
-
- ## Deployment Instructions
-
- ### Docker Container Build
- The environment provides a native Hugging Face structured `Dockerfile` configured to launch on port 7860 as an unprivileged user.
-
- ```bash
- docker build -t openenv-sql .
- docker run -p 7860:7860 openenv-sql
- ```
-
- ### Deploying to Hugging Face Spaces
- To finalize your Hackathon deployment and spin up the live inference API:
- 1. First, create a new minimal **Docker Space** inside Hugging Face.
- 2. Ensure you add `HF_TOKEN` globally inside your HF Space Repository secrets.
- 3. Push this directory to the Space via git:
- ```bash
- git remote add space https://huggingface.co/spaces/<your-username>/<your-space-name>
- git push space main
- ```
- 4. The environment URL will naturally respond to ping checks and `/reset` on HF endpoints.
-
-
- ### Baseline Scores (Llama-3-8B-Instruct)
- - **Easy Task:** 1.0 (Passed)
- - **Medium Task:** 0.62 (Partial Success - struggled with complex string casting)
+ ---
+ title: SQL Data Engineer Environment
+ emoji: 🗄️
+ colorFrom: blue
+ colorTo: indigo
+ sdk: docker
+ pinned: false
+ app_port: 7860
+ base_path: /web
+ ---
+ # OpenEnv: SQL Data Engineer Environment
+
+ Welcome to the **SQL Data Engineer Environment**—a robust, fully-compliant baseline environment built for the Meta OpenEnv Hackathon. This project tests an AI agent's ability to natively interact with a live SQL engine to perform pragmatic data extraction, cleansing, and complex schema normalization tasks.
+
+ ## Environment Description & Motivation
+
+ ### Why SQL Data Engineering?
+ The industry needs reliable agents that can act as backend developers, DB administrators, and data engineers. While many environments focus on web browsing or gaming, manipulating relational databases is a high-value, real-world task. This environment simulates authentic obstacles developers face:
+ - Analyzing undocumented database schemas.
+ - Cleansing noisy string data into strict scalar types.
+ - Restructuring and normalizing flat tables into relational architectures while rigorously preserving foreign-key constraints.
+
+ It presents an excellent metric to gauge an LLM's structured reasoning and precise SQL generation capabilities.
+
+ ---
+
+ ## Space Definitions
+
+ The environment adheres strictly to the OpenEnv Pydantic specification, enabling seamless API integration.
+
+ ### Observation Space
+ The observation space is tailored to provide dense context while remaining token-efficient:
+ - `goal` (string): The explicit task prompt/requirement dictating what the agent must achieve.
+ - `schema_dump` (string | null): The current DDL representing all tables and views in the DB (schema definition). Sent back only when the schema dynamically changes or the state is stable.
+ - `result` (string): The standard output of the previously executed query (capped to 10 rows for SELECTs) or a clear `rowcount` confirmation for INSERTs/UPDATEs.
+ - `last_action_error` (boolean): Flag indicating if the previous SQL Action threw a syntax or logic engine error.
+ - `step` (integer): The current episode step tally.
+
+ ### Action Space
+ - `action_str` (string): The agent must return a JSON dictionary containing a single, syntactically correct SQLite query to be executed against the backend state.
+
+ ---
+
+ ## Tasks & Graders
+
+ Each episode challenges the agent with one of 3 tasks featuring deterministic OpenEnv graders scoring between `0.0` and `1.0`.
+
+ #### 1. Easy: Data Extraction (View Creation)
+ - **Goal**: Read a `customers` table, filter out metrics > 1000.0, and construct a targeted `high_value_customers` SQL View.
+ - **Difficulty**: Easy. Tests basic SELECT syntax and DDL proficiency.
+ - **Grader**: Validates if the correct view exists in `sqlite_master`, assigns `0.5` points. Exact row and content matching grants the remaining `+0.5`.
+
+ #### 2. Medium: Data Cleaning
+ - **Goal**: Coerce a messy `products` table. The agent must standardize categorical string sizes (e.g., converting 'ELEC' to 'ELECTRONICS') and extract numeric floats from dirty string pricing (e.g., '$85.00' -> `85.0`) into a new generated float column.
+ - **Difficulty**: Medium. Tests native string pattern matching and targeted UPDATE pipelines.
+ - **Grader**: Adding the column yields `0.3` points. Correct categorical string mapping grants up to `0.3`, and correctly extracted float prices yield `0.4` respectively.
+
+ #### 3. Hard: Schema Normalization
+ - **Goal**: Normalize a completely flat `hospital_records` repository into a structured 3-table format (`patients`, `doctors`, `appointments`). Data must be completely migrated and bound by Primary/Foreign key constraints.
+ - **Difficulty**: Hard. Tests multi-step schema architectural reasoning and safe data-migration pipelines.
+ - **Grader**: Validating table signatures issues `0.1` per table. Proper data counts yield `0.1` each, and if a relational JOIN across the new DB perfectly rebuilds the original flat map, the final `0.3` is awarded.
+
+ ### The Dense Reward Function
+ Scores are completely dense over the episode lifecycle.
+ At `step(action)`, the grader executes. The mathematical reward signal is continuous:
+ `Reward = (Current_Score - Previous_Score)`.
+ *Note: A `-0.05` penalty is actively applied when `last_action_error` triggers, strongly discouraging hallucinated or malformed SQL loops.*
+
+ ---
+
+ ## Local Setup & Usage
+
+ To validate the OpenEnv schema, install the framework, and run the OpenAI-compatible baseline script locally:
+
+ ```bash
+ # 1. Clone the repository and navigate inside
+ git clone <your-repo-url>
+ cd OpenEnv-SQL-Data-Engineer
+
+ # 2. Setup standard Python virtual environment
+ python -m venv venv
+ source venv/bin/activate # Or `venv\Scripts\activate` on Windows
+
+ # 3. Install core dependencies (FastAPI, Pydantic, OpenAI, OpenEnv)
+ pip install openenv openenv-core openai pydantic fastapi uvicorn requests
+
+ # 4. Verify OpenEnv schema compliance locally
+ openenv validate
+
+ # 5. Execute the baseline AI Agent (make sure to set your key)
+ export OPENAI_API_KEY="your-api-key"
+ export MODEL_NAME="gpt-4o"
+ python inference.py
+ ```
+
+ ---
+
+ ## Deployment Instructions
+
+ ### Docker Container Build
+ The environment provides a native Hugging Face structured `Dockerfile` configured to launch on port 7860 as an unprivileged user.
+
+ ```bash
+ docker build -t openenv-sql .
+ docker run -p 7860:7860 openenv-sql
+ ```
+
+ ### Deploying to Hugging Face Spaces
+ To finalize your Hackathon deployment and spin up the live inference API:
+ 1. First, create a new minimal **Docker Space** inside Hugging Face.
+ 2. Ensure you add `HF_TOKEN` globally inside your HF Space Repository secrets.
+ 3. Push this directory to the Space via git:
+ ```bash
+ git remote add space https://huggingface.co/spaces/<your-username>/<your-space-name>
+ git push space main
+ ```
+ 4. The environment URL will naturally respond to ping checks and `/reset` on HF endpoints.
+
+
+ ### Baseline Scores (Llama-3-8B-Instruct)
+ - **Easy Task:** 1.0 (Passed)
+ - **Medium Task:** 0.62 (Partial Success - struggled with complex string casting)
  - **Hard Task:** 0.2 (Challenging - requires higher reasoning/longer context)
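The dense reward rule in the README diff above (`Reward = Current_Score - Previous_Score`, with a flat `-0.05` penalty whenever `last_action_error` fires) can be sketched in a few lines. This is an illustrative reimplementation of the documented rule, not the environment's actual grader code:

```python
def compute_reward(current_score: float, previous_score: float,
                   last_action_error: bool) -> float:
    """Dense per-step reward: the grader-score delta, minus a flat
    penalty when the previous SQL action raised an engine error."""
    reward = current_score - previous_score
    if last_action_error:
        reward -= 0.05  # discourages malformed or hallucinated SQL loops
    return reward

# A step that lifts the grader score from 0.25 to 0.5 earns 0.25;
# a failed query that leaves the score unchanged costs 0.05.
```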
pyproject.toml CHANGED
@@ -1,25 +1,25 @@
- [build-system]
- requires = ["setuptools>=61.0"]
- build-backend = "setuptools.build_meta"
-
- [project]
- name = "sql-data-engineer-env"
- version = "0.1.0"
- description = "A real-world SQL data engineering environment for agent evaluation."
- readme = "README.md"
- requires-python = ">=3.10"
- dependencies = [
- "openenv-core",
- "fastapi",
- "uvicorn",
- "pydantic",
- "openai",
- "pandas"
- ]
-
- [tool.setuptools.packages.find]
- where = ["."]
- include = ["server*"]
-
- [project.scripts]
+ [build-system]
+ requires = ["setuptools>=61.0"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "sql-data-engineer-env"
+ version = "0.1.0"
+ description = "A real-world SQL data engineering environment for agent evaluation."
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+ "openenv-core",
+ "fastapi",
+ "uvicorn",
+ "pydantic",
+ "openai",
+ "pandas"
+ ]
+
+ [tool.setuptools.packages.find]
+ where = ["."]
+ include = ["server*"]
+
+ [project.scripts]
  server = "server.app:main"