Spaces: sai1912/SQL_debug_env_v1

sai1912 committed 11 days ago

Commit 078e08b (verified) · 1 parent: c215ae2

Upload folder using huggingface_hub

Files changed (3)

graders/__init__.py +3 -0
graders/sql_grader.py +47 -0
openenv.yaml +21 -14

graders/__init__.py ADDED

@@ -0,0 +1,3 @@
+from graders.sql_grader import SQLGrader
+
+__all__ = ["SQLGrader"]

graders/sql_grader.py ADDED

@@ -0,0 +1,47 @@

+"""
+graders/sql_grader.py — SQLGrader class for OpenEnv Phase 2 validation.
+Called by the OpenEnv validator to score each task submission.
+Score must be strictly between 0 and 1 (never 0.0 or 1.0).
+"""
+class SQLGrader:
+    """
+    Grader for all SQL Debug tasks.
+    Evaluates a fixed SQL submission and returns a score in (0, 1).
+    """
+    # Per-task solution keywords — presence indicates a correct fix
+    TASK_SIGNALS = {
+        "task_1_easy": [","],
+        "task_2_medium": ["GROUP BY"],
+        "task_3_hard": ["PARTITION BY"],
+        "task_4_expert": ["2024-12", "12-01"],
+        "task_5_optimization": ["INNER JOIN", "JOIN"],
+        "task_6_migration": ["INSERT INTO", "DROP"],
+        "task_7_chaos": ["UNIQUE", "COALESCE"],
+    }
+    def grade(self, task_id: str, fixed_sql: str, **kwargs) -> float:
+        """
+        Grade a SQL submission.
+        Args:
+            task_id: The task identifier (e.g. 'task_1_easy')
+            fixed_sql: The agent's submitted SQL fix
+        Returns:
+            float strictly in (0, 1)
+        """
+        signals = self.TASK_SIGNALS.get(task_id, [])
+        sql_upper = (fixed_sql or "").upper()
+        if not signals:
+            return 0.5  # Unknown task — neutral score
+        hits = sum(1 for s in signals if s.upper() in sql_upper)
+        raw = hits / len(signals)
+        # Map to [0.1, 0.9]; the result never reaches 0.0 or 1.0
+        score = 0.1 + raw * 0.8
+        return round(max(0.01, min(0.99, score)), 4)
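
The scoring arithmetic in `grade` can be traced by hand. The sketch below copies that arithmetic into a standalone helper (`keyword_score` is a hypothetical name, not part of the commit) and runs it against the `task_5_optimization` signals. The table and column names in the sample queries are assumptions made for illustration. Because matching is by substring, the signal `"JOIN"` also matches inside `CROSS JOIN`, so an unfixed query still earns the midpoint score.

```python
def keyword_score(sql: str, signals: list[str]) -> float:
    """Hypothetical standalone copy of the arithmetic in SQLGrader.grade."""
    if not signals:
        return 0.5  # unknown task: neutral score, as in the diff
    sql_upper = (sql or "").upper()
    # Count signals that occur anywhere in the SQL text (substring match).
    hits = sum(1 for s in signals if s.upper() in sql_upper)
    score = 0.1 + (hits / len(signals)) * 0.8
    # The clamp is kept for fidelity; it never triggers for values in [0.1, 0.9].
    return round(max(0.01, min(0.99, score)), 4)


# Signals copied from TASK_SIGNALS["task_5_optimization"] in the diff.
JOIN_SIGNALS = ["INNER JOIN", "JOIN"]

# Illustrative queries; table and column names are assumptions.
broken = "SELECT * FROM orders CROSS JOIN users"
fixed = "SELECT * FROM orders INNER JOIN users ON orders.user_id = users.id"

print(keyword_score(broken, JOIN_SIGNALS))  # 0.5: "JOIN" matches inside "CROSS JOIN"
print(keyword_score(fixed, JOIN_SIGNALS))   # 0.9: both signals present
print(keyword_score("", JOIN_SIGNALS))      # 0.1: no signals present
```

With two signals per task the only reachable scores are 0.1, 0.5, and 0.9, and a single-signal task can only return 0.1 or 0.9.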

openenv.yaml CHANGED

@@ -4,7 +4,7 @@ description: >
   SQL Debug & Data Pipeline Repair — an OpenEnv environment where an AI agent
   diagnoses and fixes broken SQL queries and ETL pipelines executed against a
   live DuckDB instance. Seven tasks ranging from easy (syntax fix) to expert
-  (chaos engineering). Features dense reward shaping and real DuckDB execution.
+  (chaos engineering).
 author: sql-debug-env
 tags:
@@ -22,55 +22,62 @@ tasks:
     max_steps: 5
     description: >
       Fix a SQL SELECT query with a missing comma between column names.
-      The fix requires adding a comma between 'name' and 'age'.
+    grader: SQLGrader
+    grading_metric: accuracy
     baseline_score: 0.5
   - id: task_2_medium
     difficulty: medium
     max_steps: 5
     description: >
-      Fix a GROUP BY aggregation query — add GROUP BY u.name to a SELECT
-      that mixes aggregate and non-aggregate columns.
+      Fix a GROUP BY aggregation query — add GROUP BY clause.
+    grader: SQLGrader
+    grading_metric: accuracy
     baseline_score: 0.5
   - id: task_3_hard
     difficulty: hard
     max_steps: 5
     description: >
-      Fix a RANK() window function that is missing PARTITION BY department,
-      causing it to rank globally instead of per-department.
+      Fix a RANK() window function missing PARTITION BY department.
+    grader: SQLGrader
+    grading_metric: accuracy
     baseline_score: 0.5
   - id: task_4_expert
     difficulty: expert
     max_steps: 5
     description: >
-      Fix an invalid date literal (month 13) inside a CTE so the pipeline
-      executes without a DataError.
+      Fix an invalid date literal (month 13) inside a CTE.
+    grader: SQLGrader
+    grading_metric: accuracy
     baseline_score: 0.5
   - id: task_5_optimization
     difficulty: expert
     max_steps: 5
     description: >
-      Rewrite a working but catastrophically slow CROSS JOIN query to use a
-      proper INNER JOIN. Verify with EXPLAIN that no CROSS_PRODUCT appears.
+      Rewrite a CROSS JOIN query to use INNER JOIN.
+    grader: SQLGrader
+    grading_metric: accuracy
     baseline_score: 0.5
   - id: task_6_migration
     difficulty: expert
     max_steps: 5
     description: >
-      Safely migrate a denormalized messy_dump table into a normalized 3NF
-      schema (users + orders), then drop the original table.
+      Migrate denormalized table to 3NF schema safely.
+    grader: SQLGrader
+    grading_metric: accuracy
     baseline_score: 0.5
   - id: task_7_chaos
     difficulty: expert
     max_steps: 5
     description: >
-      Fix a live ETL pipeline injecting duplicate user_id entries and NULL
-      emails. Apply UNIQUE constraint and COALESCE cleanup to stop corruption.
+      Fix a live ETL pipeline with duplicate entries and NULL emails.
+    grader: SQLGrader
+    grading_metric: accuracy
     baseline_score: 0.5
 observation_schema:
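
Since `SQLGrader.grade` returns a neutral 0.5 for any `task_id` missing from `TASK_SIGNALS`, a mismatch between the ids in `openenv.yaml` and the grader's keys would go unnoticed. A minimal sketch of a consistency check follows. The helper name `tasks_without_signals` is hypothetical, the id lists are copied by hand instead of being parsed from the files, and `task_1_easy` is assumed from the grader's docstring because its `id` line falls outside the hunk.

```python
# Task ids visible in the openenv.yaml hunk. task_1_easy is an assumption:
# its id line is not shown in the diff, only in the grader's docstring.
YAML_TASK_IDS = [
    "task_1_easy",
    "task_2_medium",
    "task_3_hard",
    "task_4_expert",
    "task_5_optimization",
    "task_6_migration",
    "task_7_chaos",
]

# Keys of SQLGrader.TASK_SIGNALS, copied from graders/sql_grader.py.
SIGNAL_KEYS = {
    "task_1_easy",
    "task_2_medium",
    "task_3_hard",
    "task_4_expert",
    "task_5_optimization",
    "task_6_migration",
    "task_7_chaos",
}


def tasks_without_signals(task_ids, signal_keys):
    """Return task ids that would fall through to the neutral 0.5 score."""
    return [task_id for task_id in task_ids if task_id not in signal_keys]


print(tasks_without_signals(YAML_TASK_IDS, SIGNAL_KEYS))      # []
print(tasks_without_signals(["task_8_new"], SIGNAL_KEYS))     # ['task_8_new']
```

In a real repository the two collections would be read from `openenv.yaml` and from the class itself, which keeps the check from drifting out of date.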