Dishaaa25 committed on
Commit dce68a7 · verified · 1 Parent(s): b17178a

Upload folder using huggingface_hub
Dockerfile ADDED
@@ -0,0 +1,13 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ EXPOSE 7860
+
+ ENV ENABLE_WEB_INTERFACE=true
+ CMD ["python", "-m", "uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,208 @@
 ---
- title: Data Cleaning Openenv
- emoji: 💻
- colorFrom: yellow
- colorTo: yellow
 sdk: docker
 pinned: false
 ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 ---
+ title: Data Cleaning OpenEnv Environment
+ emoji: 🧹
+ colorFrom: blue
+ colorTo: green
 sdk: docker
 pinned: false
+ app_port: 7860
+ base_path: /web
+ tags:
+ - openenv
 ---

+ # Data Cleaning OpenEnv Environment
+
+ ## Overview
+
+ This repository contains a real-world OpenEnv benchmark for interactive tabular data cleaning. The agent operates on messy employee-style datasets and must resolve common data preparation issues step by step: missing values, duplicate rows, wrong dtypes, and inconsistent categorical values, before creating any required derived features.
+
+ The implementation uses plain Python data structures instead of pandas, so it stays lightweight for the hackathon constraints, Docker validation, and Hugging Face Spaces deployment.
+
+ The repository closely follows the standard OpenEnv layout:
+
+ ```text
+ openenv-data-cleaning/
+ ├── client.py
+ ├── models.py
+ ├── openenv.yaml
+ ├── pyproject.toml
+ ├── server/
+ │   ├── app.py
+ │   ├── environment.py
+ │   └── requirements.txt
+ └── outputs/
+     ├── evals/
+     └── logs/
+ ```
+
+ ## Environment Summary
+
+ - Domain: tabular data cleaning and preparation
+ - Mode: simulation environment with `reset()`, `step()`, and `state()`
+ - API: FastAPI on port `7860`
+ - Tasks: `basic_cleaning`, `moderate_cleaning`, `full_pipeline`
+ - Difficulty curve: easy -> medium -> hard
+
+ ## Action Space
+
+ | Action | Target | Required params | Validation rules |
+ | --- | --- | --- | --- |
+ | `fill_missing` | Specific column | `{"strategy": "mean" \| "median" \| "zero" \| "mode" \| "unknown"}` | Numeric columns allow `mean`, `median`, `zero`; categorical columns allow `mode`, `unknown`. |
+ | `drop_duplicates` | `__all__` | `{}` | Only valid while duplicate rows are still present. |
+ | `convert_dtype` | Specific column | `{"target_dtype": "int" \| "float" \| "str" \| "bool"}` | Target dtype must match the task configuration, and the values must be convertible. |
+ | `normalize_category` | Categorical column | `{}` | Only valid while case-only category inconsistencies remain. |
+ | `create_feature` | Registered feature name | `{"feature_name": "<name>"}` | Feature must be required by the task, and its source column must already be clean enough to use. |
+
+ An invalid action leaves the dataset unchanged, emits `{"error": "invalid_action"}` in `info`, consumes a step, and returns reward `-0.05`.
+
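As a concrete illustration, a single step request for the first row of the action table might carry a payload like the following. The field names (`action`, `target`, `params`) are an assumption for illustration; the real request schema lives in `models.py`.

```python
# Hypothetical /step payload for a fill_missing action.
# Field names ("action", "target", "params") are assumed for illustration;
# check models.py for the environment's actual schema.
payload = {
    "action": "fill_missing",        # action name from the table above
    "target": "age",                 # a specific numeric column
    "params": {"strategy": "mean"},  # numeric columns allow mean/median/zero
}
```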
+ ## Observation and State Space
+
+ Every `reset()`, `step()`, and `state()` call returns the same typed observation payload:
+
+ | Field | Type | Description |
+ | --- | --- | --- |
+ | `data_preview` | `list[dict[str, Any]]` | First five rows of the current dataset |
+ | `columns` | `list[ColumnInfo]` | Per-column dtype, null count, and unique count |
+ | `pending_issues` | `list[Issue]` | Remaining fixable issues |
+ | `resolved_issues` | `list[Issue]` | Issues already credited as solved |
+ | `action_history` | `list[dict[str, Any]]` | Previous actions with reward and optional error |
+ | `quality_score` | `float` | Current quality score in `[0.0, 1.0]` |
+ | `steps_remaining` | `int` | Remaining episode budget |
+ | `total_rows` | `int` | Current number of rows |
+ | `total_issues_at_start` | `int` | Issues detected immediately after `reset()` |
+
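The shape of that payload can be sketched as a dataclass mirroring the table. This is a sketch only: `ColumnInfo` and `Issue` are simplified to plain dicts here because their exact fields are not spelled out in this README.

```python
from dataclasses import dataclass
from typing import Any

# Minimal sketch of the observation payload described in the table above.
# ColumnInfo and Issue are stand-ins (plain dicts) since their exact
# fields are not documented here.
@dataclass
class Observation:
    data_preview: list[dict[str, Any]]
    columns: list[dict[str, Any]]         # stands in for list[ColumnInfo]
    pending_issues: list[dict[str, Any]]  # stands in for list[Issue]
    resolved_issues: list[dict[str, Any]]
    action_history: list[dict[str, Any]]
    quality_score: float
    steps_remaining: int
    total_rows: int
    total_issues_at_start: int
```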
+ ## Tasks
+
+ | Task | Difficulty | Rows | Main issue profile |
+ | --- | --- | --- | --- |
+ | `basic_cleaning` | Easy | 20 | Missing `age`, missing `salary` |
+ | `moderate_cleaning` | Medium | 50 | Missing `age`, `salary`, and `years_exp`; duplicate rows; wrong `salary` dtype |
+ | `full_pipeline` | Hard | 100 | Missing values; duplicate rows; wrong `salary` and `rating` dtypes; inconsistent `city` and `department`; required `age_group` feature |
+
+ The hardest task includes explicit dependency chains, such as fixing missing salary values before the dtype conversion and cleaning source columns before feature creation.
+
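One ordering that respects those dependency chains might look like the following. This is an illustrative sequence only, not the graded solution, and the action/target tuples are an assumed shorthand.

```python
# Illustrative action order for full_pipeline, respecting the dependency
# chains described above: fill missing salary before converting its dtype,
# and clean the age column before deriving age_group.
plan = [
    ("fill_missing", "salary"),
    ("convert_dtype", "salary"),
    ("fill_missing", "age"),
    ("create_feature", "age_group"),
]

# Salary must be filled before its dtype conversion.
assert plan.index(("fill_missing", "salary")) < plan.index(("convert_dtype", "salary"))
```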
+ ## Reward and Grading
+
+ Step reward:
+
+ ```text
+ reward = (new_quality - old_quality) + ordering_bonus - 0.01
+ ordering_bonus = 0.05 if dependencies were already satisfied else 0.0
+ ```
+
+ Dataset quality score combines:
+
+ - Completeness: 40%
+ - Uniqueness: 30%
+ - Consistency: 30%
+
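That weighting can be written out directly as a weighted sum. Note this is a transcription of the stated weights only; how each component sub-score is measured is not spelled out in this README.

```python
def quality(completeness: float, uniqueness: float, consistency: float) -> float:
    # Weighted combination stated above (40/30/30). The definitions of the
    # three component scores are not documented here; each is assumed to
    # lie in [0.0, 1.0].
    return 0.4 * completeness + 0.3 * uniqueness + 0.3 * consistency

quality(0.5, 1.0, 1.0)  # halving completeness alone costs 0.2 quality
```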
+ Task grader:
+
+ ```text
+ correctness = issues_fixed / total_issues
+ efficiency = max(0, 1 - steps_taken / (2 * total_issues))
+ penalty = wrong_actions * 0.05
+ score = 0.8 * correctness + 0.2 * efficiency - penalty
+ ```
+
+ Grader scores are deterministic, clamped to `[0.0, 1.0]`, and rounded to two decimals.
+
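The grader formula above can be sketched directly in Python. This is a re-implementation of the stated arithmetic for illustration, not the repository's actual grader code.

```python
def grade(issues_fixed: int, total_issues: int,
          steps_taken: int, wrong_actions: int) -> float:
    # Direct transcription of the grading formula in this README;
    # not the environment's actual implementation.
    correctness = issues_fixed / total_issues
    efficiency = max(0.0, 1 - steps_taken / (2 * total_issues))
    penalty = wrong_actions * 0.05
    score = 0.8 * correctness + 0.2 * efficiency - penalty
    # Deterministic, clamped to [0.0, 1.0], rounded to two decimals.
    return round(min(1.0, max(0.0, score)), 2)

# A perfect run (every issue fixed in one step each, no wrong actions)
# lands on the scripted-baseline score:
grade(3, 3, 3, 0)  # correctness 1.0, efficiency 0.5 -> 0.9
```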
+ ## Setup
+
+ ### Python and install
+
+ The project requires Python `3.10+`; Python `3.11` is recommended.
+
+ ```bash
+ python3.11 -m venv .venv
+ source .venv/bin/activate
+ pip install -r requirements.txt
+ ```
+
+ ### Run local checks
+
+ ```bash
+ python test_env.py
+ openenv validate .
+ ```
+
+ ### Run the FastAPI app
+
+ ```bash
+ uv run server
+ ```
+
+ Equivalent direct command:
+
+ ```bash
+ uvicorn server.app:app --host 0.0.0.0 --port 7860
+ ```
+
+ ### Run the baseline inference script
+
+ The hackathon evaluator expects these environment variables:
+
+ ```bash
+ export HF_TOKEN=...
+ export API_BASE_URL=...
+ export MODEL_NAME=...
+ python inference.py
+ ```
+
+ The script uses the OpenAI Python client and emits the required `[START]`, `[STEP]`, and `[END]` structured logs.
+
+ ### Docker
+
+ ```bash
+ docker build -t data-cleaning-env .
+ docker run -p 7860:7860 data-cleaning-env
+ ```
+
+ ## API Surface
+
+ - `GET /`
+ - `GET /health`
+ - `GET /metadata`
+ - `GET /tasks`
+ - `GET /schema`
+ - `POST /reset`
+ - `POST /step`
+ - `GET /state`
+ - `POST /mcp`
+
+ ## Baseline Scores
+
+ Deterministic scripted benchmark from `test_env.py`:
+
+ - `basic_cleaning`: `0.90`
+ - `moderate_cleaning`: `0.90`
+ - `full_pipeline`: `0.90`
+
+ Model-based baseline from `inference.py`:
+
+ - `basic_cleaning`: `0.90`
+ - `moderate_cleaning`: `0.41`
+ - `full_pipeline`: `0.20`
+
+ These scores were produced on April 8, 2026 using `MODEL_NAME=Qwen/Qwen2.5-72B-Instruct` through the configured Hugging Face router. The run completed and emitted the required structured logs, but the provider returned HTTP `402` after the early steps, so the medium and hard tasks were penalized by fallback `parse_error` actions. For a stronger final baseline, top up credits or switch `API_BASE_URL` / `MODEL_NAME` to a provider with available quota and rerun `python inference.py`.
+
+ ## Deployment
+
+ ### Hugging Face Spaces
+
+ Deploy this repo as a Docker Space tagged with OpenEnv. After deployment, verify that:
+
+ - the Space root responds with HTTP `200`
+ - `POST /reset` works on the live Space
+ - `openenv validate <space-url>` passes runtime validation
+
+ Recommended deploy command:
+
+ ```bash
+ openenv push --repo-id kaustubhg73/data-cleaning-openenv --exclude .openenv-upload-ignore
+ ```
+
+ Space link:
+
+ - https://huggingface.co/spaces/kaustubhg73/data-cleaning-openenv
TEAMMATE_BASELINE.md ADDED
@@ -0,0 +1,80 @@
+ # Teammate Baseline Runbook
+
+ Use this runbook if the current Hugging Face router quota is exhausted and you want to rerun the official baseline with a different token or provider.
+
+ ## What is already done
+
+ - The OpenEnv environment is implemented and validated.
+ - The Hugging Face Space is live:
+   - `https://huggingface.co/spaces/kaustubhg73/data-cleaning-openenv`
+ - Local validation, Docker validation, and live runtime validation have all passed.
+
+ ## What you need
+
+ Set these environment variables before running `inference.py`:
+
+ ```bash
+ export HF_TOKEN=...
+ export API_BASE_URL=...
+ export MODEL_NAME=...
+ ```
+
+ Important:
+
+ - `inference.py` uses the OpenAI Python client.
+ - In this repo, `HF_TOKEN` is the variable actually used as the client's API key.
+ - A standard Hugging Face router configuration is:
+
+ ```bash
+ export API_BASE_URL="https://router.huggingface.co/v1"
+ export MODEL_NAME="openai/gpt-oss-120b"
+ export HF_TOKEN="<your-hf-token>"
+ ```
+
+ ## Local setup
+
+ From the repo root:
+
+ ```bash
+ python3.11 -m venv .venv311
+ source .venv311/bin/activate
+ pip install -r requirements.txt
+ ```
+
+ ## Run the baseline
+
+ ```bash
+ python inference.py
+ ```
+
+ The required log format is:
+
+ - `[START]`
+ - `[STEP]`
+ - `[END]`
+
+ Do not change the log format before submission.
+
+ ## Expected follow-up after a successful run
+
+ Update the model-based baseline section in `README.md` with:
+
+ - the final scores for all three tasks
+ - the model name used
+ - a short note that the run completed successfully
+
+ ## Optional validation checks
+
+ ```bash
+ python test_env.py
+ openenv validate .
+ openenv validate --url https://kaustubhg73-data-cleaning-openenv.hf.space
+ ```
+
+ ## If you need to redeploy
+
+ Use the exclude file so the local `OpenEnv/` tutorial folder is not uploaded:
+
+ ```bash
+ openenv push --repo-id kaustubhg73/data-cleaning-openenv --exclude .openenv-upload-ignore
+ ```
__init__.py ADDED
@@ -0,0 +1,6 @@
+ """OpenEnv data cleaning environment package exports."""
+
+ from client import DataCleaningEnvClient
+ from models import Action, Observation
+
+ __all__ = ["Action", "Observation", "DataCleaningEnvClient"]
app.py ADDED
@@ -0,0 +1,3 @@
+ from server.app import app, main
+
+ __all__ = ["app", "main"]
client.py ADDED
@@ -0,0 +1,34 @@
+ """Thin client helpers for local development and OpenEnv packaging."""
+
+ from typing import Any
+
+ import httpx
+
+
+ class DataCleaningEnvClient:
+     """Minimal HTTP client for smoke-testing the environment locally."""
+
+     def __init__(self, base_url: str = "http://localhost:7860"):
+         self.base_url = base_url.rstrip("/")
+         self._client = httpx.Client(base_url=self.base_url, timeout=30.0)
+
+     def close(self) -> None:
+         self._client.close()
+
+     def reset(self, task_name: str = "basic_cleaning") -> dict[str, Any]:
+         response = self._client.post("/reset", json={"task_name": task_name})
+         response.raise_for_status()
+         return response.json()
+
+     def step(self, payload: dict[str, Any]) -> dict[str, Any]:
+         response = self._client.post("/step", json=payload)
+         response.raise_for_status()
+         return response.json()
+
+     def state(self) -> dict[str, Any]:
+         response = self._client.get("/state")
+         response.raise_for_status()
+         return response.json()
+
+
+ __all__ = ["DataCleaningEnvClient"]
data/basic_cleaning.json ADDED
@@ -0,0 +1,112 @@
+ {
+   "task_name": "basic_cleaning",
+   "max_steps": 6,
+   "expected_dtypes": {
+     "age": "int",
+     "salary": "int",
+     "city": "str"
+   },
+   "required_features": [],
+   "dataset": [
+     {
+       "age": 25,
+       "salary": 50000,
+       "city": "Mumbai"
+     },
+     {
+       "age": null,
+       "salary": 60000,
+       "city": "Delhi"
+     },
+     {
+       "age": 30,
+       "salary": null,
+       "city": "Mumbai"
+     },
+     {
+       "age": 22,
+       "salary": 45000,
+       "city": "Bangalore"
+     },
+     {
+       "age": null,
+       "salary": 55000,
+       "city": "Delhi"
+     },
+     {
+       "age": 28,
+       "salary": 70000,
+       "city": "Mumbai"
+     },
+     {
+       "age": 35,
+       "salary": null,
+       "city": "Bangalore"
+     },
+     {
+       "age": null,
+       "salary": 48000,
+       "city": "Delhi"
+     },
+     {
+       "age": 26,
+       "salary": 52000,
+       "city": "Mumbai"
+     },
+     {
+       "age": 31,
+       "salary": null,
+       "city": "Bangalore"
+     },
+     {
+       "age": 29,
+       "salary": 62000,
+       "city": "Delhi"
+     },
+     {
+       "age": null,
+       "salary": 43000,
+       "city": "Mumbai"
+     },
+     {
+       "age": 24,
+       "salary": 51000,
+       "city": "Bangalore"
+     },
+     {
+       "age": 33,
+       "salary": null,
+       "city": "Delhi"
+     },
+     {
+       "age": 27,
+       "salary": 58000,
+       "city": "Mumbai"
+     },
+     {
+       "age": null,
+       "salary": 47000,
+       "city": "Bangalore"
+     },
+     {
+       "age": 32,
+       "salary": 65000,
+       "city": "Delhi"
+     },
+     {
+       "age": 23,
+       "salary": null,
+       "city": "Mumbai"
+     },
+     {
+       "age": 36,
+       "salary": 72000,
+       "city": "Bangalore"
+     },
+     {
+       "age": 28,
+       "salary": 53000,
+       "city": "Delhi"
+     }
+   ]
+ }
data/full_pipeline.json ADDED
@@ -0,0 +1,817 @@
1
+ {
2
+ "task_name": "full_pipeline",
3
+ "max_steps": 15,
4
+ "expected_dtypes": {
5
+ "age": "int",
6
+ "salary": "int",
7
+ "city": "str",
8
+ "department": "str",
9
+ "years_exp": "int",
10
+ "rating": "float"
11
+ },
12
+ "required_features": [
13
+ "age_group"
14
+ ],
15
+ "dataset": [
16
+ {
17
+ "age": 24,
18
+ "salary": "42000",
19
+ "city": "mumbai",
20
+ "department": "engineering",
21
+ "years_exp": 2,
22
+ "rating": "3.6"
23
+ },
24
+ {
25
+ "age": 27,
26
+ "salary": "43600",
27
+ "city": "Delhi",
28
+ "department": "ENGINEERING",
29
+ "years_exp": 4,
30
+ "rating": "3.9"
31
+ },
32
+ {
33
+ "age": 30,
34
+ "salary": "not_available",
35
+ "city": "Bangalore",
36
+ "department": "Sales",
37
+ "years_exp": 6,
38
+ "rating": "4.2"
39
+ },
40
+ {
41
+ "age": null,
42
+ "salary": "46800",
43
+ "city": "pune",
44
+ "department": "sales",
45
+ "years_exp": 8,
46
+ "rating": "4.5"
47
+ },
48
+ {
49
+ "age": 36,
50
+ "salary": "47400",
51
+ "city": "Chennai",
52
+ "department": "MARKETING",
53
+ "years_exp": 10,
54
+ "rating": "3.6"
55
+ },
56
+ {
57
+ "age": 39,
58
+ "salary": "49000",
59
+ "city": "Hyderabad",
60
+ "department": "Marketing",
61
+ "years_exp": null,
62
+ "rating": "3.9"
63
+ },
64
+ {
65
+ "age": 42,
66
+ "salary": "50600",
67
+ "city": "Mumbai",
68
+ "department": "finance",
69
+ "years_exp": 14,
70
+ "rating": "4.2"
71
+ },
72
+ {
73
+ "age": 24,
74
+ "salary": "52200",
75
+ "city": "delhi",
76
+ "department": "FINANCE",
77
+ "years_exp": 16,
78
+ "rating": null
79
+ },
80
+ {
81
+ "age": 27,
82
+ "salary": "52800",
83
+ "city": "BANGALORE",
84
+ "department": "Support",
85
+ "years_exp": 18,
86
+ "rating": "3.6"
87
+ },
88
+ {
89
+ "age": 30,
90
+ "salary": "54400",
91
+ "city": "Pune",
92
+ "department": "support",
93
+ "years_exp": 3,
94
+ "rating": "3.9"
95
+ },
96
+ {
97
+ "age": 33,
98
+ "salary": "56000",
99
+ "city": "chennai",
100
+ "department": "OPERATIONS",
101
+ "years_exp": 5,
102
+ "rating": "4.2"
103
+ },
104
+ {
105
+ "age": null,
106
+ "salary": "57600",
107
+ "city": "HYDERABAD",
108
+ "department": "Operations",
109
+ "years_exp": 7,
110
+ "rating": "4.5"
111
+ },
112
+ {
113
+ "age": 39,
114
+ "salary": "58200",
115
+ "city": "Mumbai",
116
+ "department": "engineering",
117
+ "years_exp": 9,
118
+ "rating": "3.6"
119
+ },
120
+ {
121
+ "age": 42,
122
+ "salary": "59800",
123
+ "city": "delhi",
124
+ "department": "ENGINEERING",
125
+ "years_exp": 11,
126
+ "rating": "3.9"
127
+ },
128
+ {
129
+ "age": 24,
130
+ "salary": "61400",
131
+ "city": "BANGALORE",
132
+ "department": "Sales",
133
+ "years_exp": null,
134
+ "rating": "4.2"
135
+ },
136
+ {
137
+ "age": 27,
138
+ "salary": "63000",
139
+ "city": "Pune",
140
+ "department": "sales",
141
+ "years_exp": 15,
142
+ "rating": "4.5"
143
+ },
144
+ {
145
+ "age": 30,
146
+ "salary": "not_available",
147
+ "city": "chennai",
148
+ "department": "MARKETING",
149
+ "years_exp": 17,
150
+ "rating": "3.6"
151
+ },
152
+ {
153
+ "age": 33,
154
+ "salary": "65200",
155
+ "city": "HYDERABAD",
156
+ "department": "Marketing",
157
+ "years_exp": 2,
158
+ "rating": "3.9"
159
+ },
160
+ {
161
+ "age": 36,
162
+ "salary": "66800",
163
+ "city": "Mumbai",
164
+ "department": "finance",
165
+ "years_exp": 4,
166
+ "rating": null
167
+ },
168
+ {
169
+ "age": null,
170
+ "salary": "68400",
171
+ "city": "delhi",
172
+ "department": "FINANCE",
173
+ "years_exp": 6,
174
+ "rating": "4.5"
175
+ },
176
+ {
177
+ "age": 42,
178
+ "salary": "69000",
179
+ "city": "BANGALORE",
180
+ "department": "Support",
181
+ "years_exp": 8,
182
+ "rating": "3.6"
183
+ },
184
+ {
185
+ "age": 24,
186
+ "salary": "70600",
187
+ "city": "Pune",
188
+ "department": "support",
189
+ "years_exp": 10,
190
+ "rating": "3.9"
191
+ },
192
+ {
193
+ "age": 27,
194
+ "salary": "72200",
195
+ "city": "chennai",
196
+ "department": "OPERATIONS",
197
+ "years_exp": null,
198
+ "rating": "4.2"
199
+ },
200
+ {
201
+ "age": 30,
202
+ "salary": "73800",
203
+ "city": "HYDERABAD",
204
+ "department": "Operations",
205
+ "years_exp": 14,
206
+ "rating": "4.5"
207
+ },
208
+ {
209
+ "age": 33,
210
+ "salary": "not_available",
211
+ "city": "Mumbai",
212
+ "department": "engineering",
213
+ "years_exp": 16,
214
+ "rating": "3.6"
215
+ },
216
+ {
217
+ "age": 36,
218
+ "salary": "76000",
219
+ "city": "delhi",
220
+ "department": "ENGINEERING",
221
+ "years_exp": 18,
222
+ "rating": "3.9"
223
+ },
224
+ {
225
+ "age": 39,
226
+ "salary": "77600",
227
+ "city": "BANGALORE",
228
+ "department": "Sales",
229
+ "years_exp": 3,
230
+ "rating": "4.2"
231
+ },
232
+ {
233
+ "age": null,
234
+ "salary": "79200",
235
+ "city": "Pune",
236
+ "department": "sales",
237
+ "years_exp": 5,
238
+ "rating": "4.5"
239
+ },
240
+ {
241
+ "age": 24,
242
+ "salary": "79800",
243
+ "city": "chennai",
244
+ "department": "MARKETING",
245
+ "years_exp": 7,
246
+ "rating": "3.6"
247
+ },
248
+ {
249
+ "age": 27,
250
+ "salary": "81400",
251
+ "city": "HYDERABAD",
252
+ "department": "Marketing",
253
+ "years_exp": 9,
254
+ "rating": null
255
+ },
256
+ {
257
+ "age": 31,
258
+ "salary": "83000",
259
+ "city": "Mumbai",
260
+ "department": "finance",
261
+ "years_exp": null,
262
+ "rating": "4.2"
263
+ },
264
+ {
265
+ "age": 34,
266
+ "salary": "84600",
267
+ "city": "delhi",
268
+ "department": "FINANCE",
269
+ "years_exp": 13,
270
+ "rating": "4.5"
271
+ },
272
+ {
273
+ "age": 37,
274
+ "salary": "85200",
275
+ "city": "BANGALORE",
276
+ "department": "Support",
277
+ "years_exp": 15,
278
+ "rating": "3.6"
279
+ },
280
+ {
281
+ "age": 40,
282
+ "salary": "not_available",
283
+ "city": "Pune",
284
+ "department": "support",
285
+ "years_exp": 17,
286
+ "rating": "3.9"
287
+ },
288
+ {
289
+ "age": 43,
290
+ "salary": "88400",
291
+ "city": "chennai",
292
+ "department": "OPERATIONS",
293
+ "years_exp": 2,
294
+ "rating": "4.2"
295
+ },
296
+ {
297
+ "age": null,
298
+ "salary": "90000",
299
+ "city": "HYDERABAD",
300
+ "department": "Operations",
301
+ "years_exp": 4,
302
+ "rating": "4.5"
303
+ },
304
+ {
305
+ "age": 28,
306
+ "salary": "90600",
307
+ "city": "Mumbai",
308
+ "department": "engineering",
309
+ "years_exp": 6,
310
+ "rating": "3.6"
311
+ },
312
+ {
313
+ "age": 31,
314
+ "salary": "92200",
315
+ "city": "delhi",
316
+ "department": "ENGINEERING",
317
+ "years_exp": 8,
318
+ "rating": "3.9"
319
+ },
320
+ {
321
+ "age": 34,
322
+ "salary": "93800",
323
+ "city": "BANGALORE",
324
+ "department": "Sales",
325
+ "years_exp": null,
326
+ "rating": "4.2"
327
+ },
328
+ {
329
+ "age": 37,
330
+ "salary": "95400",
331
+ "city": "Pune",
332
+ "department": "sales",
333
+ "years_exp": 12,
334
+ "rating": "4.5"
335
+ },
336
+ {
337
+ "age": 40,
338
+ "salary": "96000",
339
+ "city": "chennai",
340
+ "department": "MARKETING",
341
+ "years_exp": 14,
342
+ "rating": null
343
+ },
344
+ {
345
+ "age": 43,
346
+ "salary": "97600",
347
+ "city": "HYDERABAD",
348
+ "department": "Marketing",
349
+ "years_exp": 16,
350
+ "rating": "3.9"
351
+ },
352
+ {
353
+ "age": 25,
354
+ "salary": "99200",
355
+ "city": "Mumbai",
356
+ "department": "finance",
357
+ "years_exp": 18,
358
+ "rating": "4.2"
359
+ },
360
+ {
361
+ "age": null,
362
+ "salary": "100800",
363
+ "city": "delhi",
364
+ "department": "FINANCE",
365
+ "years_exp": 3,
366
+ "rating": "4.5"
367
+ },
368
+ {
369
+ "age": 31,
370
+ "salary": "101400",
371
+ "city": "BANGALORE",
372
+ "department": "Support",
373
+ "years_exp": 5,
374
+ "rating": "3.6"
375
+ },
376
+ {
377
+ "age": 34,
378
+ "salary": "103000",
379
+ "city": "Pune",
380
+ "department": "support",
381
+ "years_exp": 7,
382
+ "rating": "3.9"
383
+ },
384
+ {
385
+ "age": 37,
386
+ "salary": "104600",
387
+ "city": "chennai",
388
+ "department": "OPERATIONS",
389
+ "years_exp": null,
390
+ "rating": "4.2"
391
+ },
392
+ {
393
+ "age": 40,
394
+ "salary": "106200",
395
+ "city": "HYDERABAD",
396
+ "department": "Operations",
397
+ "years_exp": 11,
398
+ "rating": "4.5"
399
+ },
400
+ {
401
+ "age": 43,
402
+ "salary": "not_available",
403
+ "city": "Mumbai",
404
+ "department": "engineering",
405
+ "years_exp": 13,
406
+ "rating": "3.6"
407
+ },
408
+ {
409
+ "age": 25,
410
+ "salary": "108400",
411
+ "city": "delhi",
412
+ "department": "ENGINEERING",
413
+ "years_exp": 15,
414
+ "rating": "3.9"
415
+ },
416
+ {
417
+ "age": 28,
418
+ "salary": "110000",
419
+ "city": "BANGALORE",
420
+ "department": "Sales",
421
+ "years_exp": 17,
422
+ "rating": "4.2"
423
+ },
424
+ {
425
+ "age": null,
426
+ "salary": "111600",
427
+ "city": "Pune",
428
+ "department": "sales",
429
+ "years_exp": 2,
430
+ "rating": "4.5"
431
+ },
432
+ {
433
+ "age": 34,
434
+ "salary": "112200",
435
+ "city": "chennai",
436
+ "department": "MARKETING",
437
+ "years_exp": 4,
438
+ "rating": null
439
+ },
440
+ {
441
+ "age": 37,
442
+ "salary": "113800",
443
+ "city": "HYDERABAD",
444
+ "department": "Marketing",
445
+ "years_exp": 6,
446
+ "rating": "3.9"
447
+ },
448
+ {
449
+ "age": 40,
450
+ "salary": "115400",
451
+ "city": "Mumbai",
452
+ "department": "finance",
453
+ "years_exp": null,
454
+ "rating": "4.2"
455
+ },
456
+ {
457
+ "age": 43,
458
+ "salary": "117000",
459
+ "city": "delhi",
460
+ "department": "FINANCE",
461
+ "years_exp": 10,
462
+ "rating": "4.5"
463
+ },
464
+ {
465
+ "age": 25,
466
+ "salary": "117600",
467
+ "city": "BANGALORE",
468
+ "department": "Support",
469
+ "years_exp": 12,
470
+ "rating": "3.6"
471
+ },
472
+ {
473
+ "age": 28,
474
+ "salary": "not_available",
475
+ "city": "Pune",
476
+ "department": "support",
477
+ "years_exp": 14,
478
+ "rating": "3.9"
479
+ },
480
+ {
481
+ "age": 31,
482
+ "salary": "120800",
483
+ "city": "chennai",
484
+ "department": "OPERATIONS",
485
+ "years_exp": 16,
486
+ "rating": "4.2"
487
+ },
488
+ {
489
+ "age": null,
490
+ "salary": "122400",
491
+ "city": "HYDERABAD",
492
+ "department": "Operations",
493
+ "years_exp": 18,
494
+ "rating": "4.5"
495
+ },
496
+ {
497
+ "age": 38,
498
+ "salary": "123000",
499
+ "city": "Mumbai",
500
+ "department": "engineering",
501
+ "years_exp": 3,
502
+ "rating": "3.6"
503
+ },
504
+ {
505
+ "age": 41,
506
+ "salary": "124600",
507
+ "city": "delhi",
508
+ "department": "ENGINEERING",
509
+ "years_exp": 5,
510
+ "rating": "3.9"
511
+ },
512
+ {
513
+ "age": 44,
514
+ "salary": "126200",
515
+ "city": "BANGALORE",
516
+ "department": "Sales",
517
+ "years_exp": null,
518
+ "rating": "4.2"
519
+ },
520
+ {
521
+ "age": 26,
522
+ "salary": "127800",
523
+ "city": "Pune",
524
+ "department": "sales",
525
+ "years_exp": 9,
526
+ "rating": "4.5"
527
+ },
528
+ {
529
+ "age": 29,
530
+ "salary": "128400",
531
+ "city": "chennai",
532
+ "department": "MARKETING",
533
+ "years_exp": 11,
534
+ "rating": null
535
+ },
536
+ {
537
+ "age": 32,
538
+ "salary": "130000",
539
+ "city": "HYDERABAD",
540
+ "department": "Marketing",
541
+ "years_exp": 13,
542
+ "rating": "3.9"
543
+ },
544
+ {
545
+ "age": 35,
546
+ "salary": "131600",
547
+ "city": "Mumbai",
548
+ "department": "finance",
549
+ "years_exp": 15,
550
+ "rating": "4.2"
551
+ },
552
+ {
553
+ "age": null,
554
+ "salary": "133200",
555
+ "city": "delhi",
556
+ "department": "FINANCE",
557
+ "years_exp": 17,
558
+ "rating": "4.5"
559
+ },
560
+ {
561
+ "age": 41,
562
+ "salary": "133800",
563
+ "city": "BANGALORE",
564
+ "department": "Support",
565
+ "years_exp": 2,
566
+ "rating": "3.6"
567
+ },
568
+ {
569
+ "age": 44,
570
+ "salary": "not_available",
571
+ "city": "Pune",
572
+ "department": "support",
573
+ "years_exp": 4,
574
+ "rating": "3.9"
575
+ },
576
+ {
577
+ "age": 26,
578
+ "salary": "137000",
579
+ "city": "chennai",
580
+ "department": "OPERATIONS",
581
+ "years_exp": null,
582
+ "rating": "4.2"
583
+ },
584
+ {
585
+ "age": 29,
586
+ "salary": "138600",
587
+ "city": "HYDERABAD",
588
+ "department": "Operations",
589
+ "years_exp": 8,
590
+ "rating": "4.5"
591
+ },
592
+ {
593
+ "age": 32,
594
+ "salary": "139200",
595
+ "city": "Mumbai",
596
+ "department": "engineering",
597
+ "years_exp": 10,
598
+ "rating": "3.6"
599
+ },
600
+ {
601
+ "age": 35,
602
+ "salary": "140800",
603
+ "city": "delhi",
604
+ "department": "ENGINEERING",
605
+ "years_exp": 12,
606
+ "rating": "3.9"
607
+ },
608
+ {
609
+ "age": 38,
610
+ "salary": "142400",
611
+ "city": "BANGALORE",
612
+ "department": "Sales",
613
+ "years_exp": 14,
614
+ "rating": "4.2"
615
+ },
616
+ {
617
+ "age": null,
618
+ "salary": "144000",
619
+ "city": "Pune",
620
+ "department": "sales",
621
+ "years_exp": 16,
622
+ "rating": "4.5"
623
+ },
624
+ {
625
+ "age": 44,
626
+ "salary": "144600",
627
+ "city": "chennai",
628
+ "department": "MARKETING",
629
+ "years_exp": 18,
630
+ "rating": "3.6"
631
+ },
632
+ {
633
+ "age": 26,
634
+ "salary": "146200",
635
+ "city": "HYDERABAD",
636
+ "department": "Marketing",
637
+ "years_exp": 3,
638
+ "rating": "3.9"
639
+ },
640
+ {
641
+ "age": 29,
642
+ "salary": "147800",
643
+ "city": "Mumbai",
644
+ "department": "finance",
645
+ "years_exp": 5,
646
+ "rating": null
647
+ },
648
+ {
649
+ "age": 32,
650
+ "salary": "149400",
651
+ "city": "delhi",
652
+ "department": "FINANCE",
653
+ "years_exp": 7,
654
+ "rating": "4.5"
655
+ },
656
+ {
657
+ "age": 35,
658
+ "salary": "150000",
659
+ "city": "BANGALORE",
660
+ "department": "Support",
661
+ "years_exp": 9,
662
+ "rating": "3.6"
663
+ },
664
+ {
665
+ "age": 38,
666
+ "salary": "not_available",
667
+ "city": "Pune",
668
+ "department": "support",
669
+ "years_exp": 11,
670
+ "rating": "3.9"
671
+ },
672
+ {
673
+ "age": 41,
674
+ "salary": "153200",
675
+ "city": "chennai",
+ "department": "OPERATIONS",
+ "years_exp": 13,
+ "rating": "4.2"
+ },
+ {
+ "age": null,
+ "salary": "154800",
+ "city": "HYDERABAD",
+ "department": "Operations",
+ "years_exp": 15,
+ "rating": "4.5"
+ },
+ {
+ "age": 26,
+ "salary": "155400",
+ "city": "Mumbai",
+ "department": "engineering",
+ "years_exp": 17,
+ "rating": "3.6"
+ },
+ {
+ "age": 29,
+ "salary": "157000",
+ "city": "delhi",
+ "department": "ENGINEERING",
+ "years_exp": 2,
+ "rating": "3.9"
+ },
+ {
+ "age": 32,
+ "salary": "158600",
+ "city": "BANGALORE",
+ "department": "Sales",
+ "years_exp": null,
+ "rating": "4.2"
+ },
+ {
+ "age": 35,
+ "salary": "160200",
+ "city": "Pune",
+ "department": "sales",
+ "years_exp": 6,
+ "rating": "4.5"
+ },
+ {
+ "age": 38,
+ "salary": "160800",
+ "city": "chennai",
+ "department": "MARKETING",
+ "years_exp": 8,
+ "rating": "3.6"
+ },
+ {
+ "age": null,
+ "salary": "162400",
+ "city": "HYDERABAD",
+ "department": "Marketing",
+ "years_exp": 10,
+ "rating": "3.9"
+ },
+ {
+ "age": 45,
+ "salary": "164000",
+ "city": "Mumbai",
+ "department": "finance",
+ "years_exp": 12,
+ "rating": null
+ },
+ {
+ "age": 27,
+ "salary": "165600",
+ "city": "delhi",
+ "department": "FINANCE",
+ "years_exp": 14,
+ "rating": "4.5"
+ },
+ {
+ "age": 36,
+ "salary": "47400",
+ "city": "chennai",
+ "department": "MARKETING",
+ "years_exp": 10,
+ "rating": "3.6"
+ },
+ {
+ "age": 42,
+ "salary": "59800",
+ "city": "delhi",
+ "department": "ENGINEERING",
+ "years_exp": 11,
+ "rating": "3.9"
+ },
+ {
+ "age": 24,
+ "salary": "70600",
+ "city": "Pune",
+ "department": "support",
+ "years_exp": 10,
+ "rating": "3.9"
+ },
+ {
+ "age": 34,
+ "salary": "84600",
+ "city": "delhi",
+ "department": "FINANCE",
+ "years_exp": 13,
+ "rating": "4.5"
+ },
+ {
+ "age": 31,
+ "salary": "101400",
+ "city": "BANGALORE",
+ "department": "Support",
+ "years_exp": 5,
+ "rating": "3.6"
+ },
+ {
+ "age": 43,
+ "salary": "117000",
+ "city": "delhi",
+ "department": "FINANCE",
+ "years_exp": 10,
+ "rating": "4.5"
+ },
+ {
+ "age": 41,
+ "salary": "133800",
+ "city": "BANGALORE",
+ "department": "Support",
+ "years_exp": 2,
+ "rating": "3.6"
+ },
+ {
+ "age": 32,
+ "salary": "149400",
+ "city": "delhi",
+ "department": "FINANCE",
+ "years_exp": 7,
+ "rating": "4.5"
+ }
+ ]
+ }
data/moderate_cleaning.json ADDED
@@ -0,0 +1,364 @@
+ {
+ "task_name": "moderate_cleaning",
+ "max_steps": 10,
+ "expected_dtypes": {
+ "age": "int",
+ "salary": "int",
+ "city": "str",
+ "department": "str",
+ "years_exp": "int"
+ },
+ "required_features": [],
+ "dataset": [
+ {
+ "age": 25,
+ "salary": "50000",
+ "city": "Mumbai",
+ "department": "Engineering",
+ "years_exp": 3
+ },
+ {
+ "age": null,
+ "salary": "62000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 7
+ },
+ {
+ "age": 29,
+ "salary": "54000",
+ "city": "Bangalore",
+ "department": "Marketing",
+ "years_exp": null
+ },
+ {
+ "age": 41,
+ "salary": "not_available",
+ "city": "Pune",
+ "department": "Finance",
+ "years_exp": 14
+ },
+ {
+ "age": 27,
+ "salary": "47000",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 4
+ },
+ {
+ "age": 36,
+ "salary": "73000",
+ "city": "Hyderabad",
+ "department": "Operations",
+ "years_exp": 10
+ },
+ {
+ "age": 30,
+ "salary": "56000",
+ "city": "Mumbai",
+ "department": "Engineering",
+ "years_exp": 6
+ },
+ {
+ "age": null,
+ "salary": "68000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 9
+ },
+ {
+ "age": 26,
+ "salary": "45000",
+ "city": "Bangalore",
+ "department": "Marketing",
+ "years_exp": 3
+ },
+ {
+ "age": 38,
+ "salary": "79000",
+ "city": "Pune",
+ "department": "Finance",
+ "years_exp": null
+ },
+ {
+ "age": 31,
+ "salary": "not_available",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 8
+ },
+ {
+ "age": 28,
+ "salary": "52000",
+ "city": "Hyderabad",
+ "department": "Operations",
+ "years_exp": 5
+ },
+ {
+ "age": 44,
+ "salary": "91000",
+ "city": "Mumbai",
+ "department": "Engineering",
+ "years_exp": 17
+ },
+ {
+ "age": 33,
+ "salary": "66000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 9
+ },
+ {
+ "age": null,
+ "salary": "43000",
+ "city": "Bangalore",
+ "department": "Marketing",
+ "years_exp": 2
+ },
+ {
+ "age": 39,
+ "salary": "82000",
+ "city": "Pune",
+ "department": "Finance",
+ "years_exp": null
+ },
+ {
+ "age": 35,
+ "salary": "71000",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 11
+ },
+ {
+ "age": 29,
+ "salary": "55000",
+ "city": "Hyderabad",
+ "department": "Operations",
+ "years_exp": 6
+ },
+ {
+ "age": 42,
+ "salary": "not_available",
+ "city": "Mumbai",
+ "department": "Engineering",
+ "years_exp": 16
+ },
+ {
+ "age": 37,
+ "salary": "76000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 12
+ },
+ {
+ "age": 27,
+ "salary": "46000",
+ "city": "Bangalore",
+ "department": "Marketing",
+ "years_exp": 4
+ },
+ {
+ "age": 40,
+ "salary": "85000",
+ "city": "Pune",
+ "department": "Finance",
+ "years_exp": 15
+ },
+ {
+ "age": null,
+ "salary": "63000",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 7
+ },
+ {
+ "age": 30,
+ "salary": "58000",
+ "city": "Hyderabad",
+ "department": "Operations",
+ "years_exp": 5
+ },
+ {
+ "age": 45,
+ "salary": "94000",
+ "city": "Mumbai",
+ "department": "Engineering",
+ "years_exp": null
+ },
+ {
+ "age": 34,
+ "salary": "69000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 10
+ },
+ {
+ "age": 25,
+ "salary": "44000",
+ "city": "Bangalore",
+ "department": "Marketing",
+ "years_exp": 3
+ },
+ {
+ "age": 38,
+ "salary": "not_available",
+ "city": "Pune",
+ "department": "Finance",
+ "years_exp": 12
+ },
+ {
+ "age": 31,
+ "salary": "60000",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 8
+ },
+ {
+ "age": 28,
+ "salary": "51000",
+ "city": "Hyderabad",
+ "department": "Operations",
+ "years_exp": 4
+ },
+ {
+ "age": 43,
+ "salary": "92000",
+ "city": "Mumbai",
+ "department": "Engineering",
+ "years_exp": 17
+ },
+ {
+ "age": 36,
+ "salary": "74000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": null
+ },
+ {
+ "age": 26,
+ "salary": "45500",
+ "city": "Bangalore",
+ "department": "Marketing",
+ "years_exp": 3
+ },
+ {
+ "age": 39,
+ "salary": "83000",
+ "city": "Pune",
+ "department": "Finance",
+ "years_exp": 14
+ },
+ {
+ "age": 33,
+ "salary": "not_available",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 9
+ },
+ {
+ "age": null,
+ "salary": "57000",
+ "city": "Hyderabad",
+ "department": "Operations",
+ "years_exp": 6
+ },
+ {
+ "age": 41,
+ "salary": "87000",
+ "city": "Mumbai",
+ "department": "Engineering",
+ "years_exp": 15
+ },
+ {
+ "age": 35,
+ "salary": "72000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 10
+ },
+ {
+ "age": 24,
+ "salary": "42500",
+ "city": "Bangalore",
+ "department": "Marketing",
+ "years_exp": 2
+ },
+ {
+ "age": 37,
+ "salary": "77500",
+ "city": "Pune",
+ "department": "Finance",
+ "years_exp": null
+ },
+ {
+ "age": null,
+ "salary": "59000",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 6
+ },
+ {
+ "age": 27,
+ "salary": "not_available",
+ "city": "Hyderabad",
+ "department": "Operations",
+ "years_exp": 4
+ },
+ {
+ "age": 44,
+ "salary": "93000",
+ "city": "Mumbai",
+ "department": "Engineering",
+ "years_exp": 18
+ },
+ {
+ "age": null,
+ "salary": "64500",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 8
+ },
+ {
+ "age": null,
+ "salary": "53500",
+ "city": "Bangalore",
+ "department": "Marketing",
+ "years_exp": 5
+ },
+ {
+ "age": 27,
+ "salary": "47000",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 4
+ },
+ {
+ "age": 28,
+ "salary": "52000",
+ "city": "Hyderabad",
+ "department": "Operations",
+ "years_exp": 5
+ },
+ {
+ "age": 37,
+ "salary": "76000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 12
+ },
+ {
+ "age": 31,
+ "salary": "60000",
+ "city": "Chennai",
+ "department": "Support",
+ "years_exp": 8
+ },
+ {
+ "age": 35,
+ "salary": "72000",
+ "city": "Delhi",
+ "department": "Sales",
+ "years_exp": 10
+ }
+ ]
+ }
env/__init__.py ADDED
@@ -0,0 +1,4 @@
+ from .environment import DataCleaningEnv
+ from .graders import DataCleaningGrader
+
+ __all__ = ["DataCleaningEnv", "DataCleaningGrader"]
env/actions.py ADDED
@@ -0,0 +1,180 @@
+ from __future__ import annotations
+
+ from typing import Any
+
+ from env.models import Action, ColumnInfo, Issue
+
+ ALLOWED_ACTIONS = {
+ "fill_missing",
+ "drop_duplicates",
+ "convert_dtype",
+ "normalize_category",
+ "create_feature",
+ }
+
+ VALID_FILL_STRATEGIES = {
+ "numeric": ["mean", "median", "zero"],
+ "categorical": ["mode", "unknown"],
+ }
+
+ VALID_TARGET_DTYPES = {"int", "float", "str", "bool"}
+
+ FEATURE_REGISTRY = {
+ "age_group": {
+ "source": "age",
+ "transform": "bin",
+ "bins": [0, 18, 35, 50, 100],
+ "labels": ["young", "adult", "middle", "senior"],
+ },
+ "salary_bracket": {
+ "source": "salary",
+ "transform": "bin",
+ "bins": [0, 25000, 50000, 100000, float("inf")],
+ "labels": ["low", "medium", "high", "very_high"],
+ },
+ }
+
+ MISSING_SENTINELS = {None, "", "not_available"}
+
+
+ def is_missing(value: Any) -> bool:
+ return value in MISSING_SENTINELS
+
+
+ def infer_column_family(expected_dtype: str) -> str:
+ return "numeric" if expected_dtype in {"int", "float"} else "categorical"
+
+
+ def has_duplicates(dataset: list[dict[str, Any]]) -> bool:
+ seen: set[tuple[tuple[str, Any], ...]] = set()
+ for row in dataset:
+ key = tuple(sorted(row.items()))
+ if key in seen:
+ return True
+ seen.add(key)
+ return False
+
+
+ def _get_column_info(column_infos: list[ColumnInfo], column: str) -> ColumnInfo | None:
+ for info in column_infos:
+ if info.name == column:
+ return info
+ return None
+
+
+ def _non_missing_values(dataset: list[dict[str, Any]], column: str) -> list[Any]:
+ return [row.get(column) for row in dataset if not is_missing(row.get(column))]
+
+
+ def _is_convertible(value: Any, target_dtype: str) -> bool:
+ if is_missing(value):
+ return True
+ try:
+ if target_dtype == "int":
+ if isinstance(value, bool):
+ return True
+ if isinstance(value, str) and value.strip() == "":
+ return False
+ int(str(value))
+ return True
+ if target_dtype == "float":
+ float(str(value))
+ return True
+ if target_dtype == "bool":
+ normalized = str(value).strip().lower()
+ return normalized in {"true", "false", "1", "0", "yes", "no"}
+ if target_dtype == "str":
+ str(value)
+ return True
+ except (TypeError, ValueError):
+ return False
+ return False
+
+
+ def validate_action(
+ dataset: list[dict[str, Any]],
+ pending_issues: list[Issue],
+ column_infos: list[ColumnInfo],
+ expected_dtypes: dict[str, str],
+ action: Action,
+ resolved_issues: list[Issue],
+ ) -> tuple[bool, str, Issue | None, bool]:
+ if action.action_type not in ALLOWED_ACTIONS:
+ return False, f"Unsupported action_type '{action.action_type}'", None, False
+
+ issue_lookup = {(issue.issue_type, issue.column): issue for issue in pending_issues}
+ column_info = _get_column_info(column_infos, action.column) if action.column != "__all__" else None
+ resolved_ids = {issue.issue_id for issue in resolved_issues}
+
+ matched_issue: Issue | None = None
+ if action.action_type == "fill_missing":
+ matched_issue = issue_lookup.get(("missing", action.column))
+ if matched_issue is None:
+ return False, f"Column '{action.column}' does not have a pending missing-value issue", None, False
+ if column_info is None:
+ return False, f"Unknown column '{action.column}'", None, False
+ expected_dtype = expected_dtypes.get(action.column, column_info.dtype)
+ family = infer_column_family(expected_dtype)
+ strategy = action.params.get("strategy")
+ if strategy not in VALID_FILL_STRATEGIES[family]:
+ return False, f"Invalid fill strategy '{strategy}' for {family} column", None, False
+ if not any(is_missing(row.get(action.column)) for row in dataset):
+ return False, f"Column '{action.column}' has no missing values", None, False
+ elif action.action_type == "drop_duplicates":
+ matched_issue = issue_lookup.get(("duplicate", "__all__"))
+ if action.column != "__all__":
+ return False, "drop_duplicates must target column '__all__'", None, False
+ if action.params:
+ return False, "drop_duplicates does not accept params", None, False
+ if matched_issue is None or not has_duplicates(dataset):
+ return False, "Dataset does not have duplicate rows", None, False
+ elif action.action_type == "convert_dtype":
+ matched_issue = issue_lookup.get(("wrong_dtype", action.column))
+ if matched_issue is None:
+ return False, f"Column '{action.column}' does not have a pending wrong_dtype issue", None, False
+ target_dtype = action.params.get("target_dtype")
+ if target_dtype not in VALID_TARGET_DTYPES:
+ return False, f"Invalid target dtype '{target_dtype}'", None, False
+ if target_dtype != expected_dtypes.get(action.column):
+ return False, f"Target dtype for '{action.column}' must be '{expected_dtypes.get(action.column)}'", None, False
+ values = _non_missing_values(dataset, action.column)
+ if any(not _is_convertible(value, target_dtype) for value in values):
+ return False, f"Column '{action.column}' contains non-convertible values", None, False
+ if any(str(value).strip().lower() == "not_available" for value in values):
+ return False, f"Column '{action.column}' still contains not_available placeholders", None, False
+ elif action.action_type == "normalize_category":
+ matched_issue = issue_lookup.get(("inconsistent_category", action.column))
+ if matched_issue is None:
+ return False, f"Column '{action.column}' does not have a pending inconsistent_category issue", None, False
+ if action.params:
+ return False, "normalize_category does not accept params", None, False
+ values = [row.get(action.column) for row in dataset if not is_missing(row.get(action.column))]
+ lowered = [str(value).lower() for value in values]
+ if len(lowered) == len(set(lowered)):
+ return False, f"Column '{action.column}' has no categorical inconsistencies", None, False
+ elif action.action_type == "create_feature":
+ matched_issue = issue_lookup.get(("missing_feature", action.column))
+ feature_name = action.params.get("feature_name")
+ if matched_issue is None:
+ return False, f"Column '{action.column}' does not have a pending missing_feature issue", None, False
+ if feature_name not in FEATURE_REGISTRY:
+ return False, f"Unknown feature '{feature_name}'", None, False
+ if action.column != feature_name:
+ return False, f"create_feature column must match feature name '{feature_name}'", None, False
+ source_column = FEATURE_REGISTRY[feature_name]["source"]
+ if source_column not in dataset[0]:
+ return False, f"Source column '{source_column}' is missing", None, False
+ source_dtype = expected_dtypes.get(source_column)
+ if source_dtype not in {"int", "float"}:
+ return False, f"Source column '{source_column}' must be numeric", None, False
+ source_values = _non_missing_values(dataset, source_column)
+ if any(not _is_convertible(value, source_dtype) for value in source_values):
+ return False, f"Source column '{source_column}' is not clean enough to create the feature", None, False
+
+ dependency_ok = True
+ if matched_issue and matched_issue.depends_on:
+ dependency_ok = all(dep_id in resolved_ids for dep_id in matched_issue.depends_on)
+ if not dependency_ok:
+ return False, f"Dependencies for issue '{matched_issue.issue_id}' are not resolved", matched_issue, False
+
+ return True, "", matched_issue, dependency_ok
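The duplicate check in `actions.py` keys each row by its sorted `(column, value)` pairs, so two dicts with the same contents always produce the same key regardless of key order. A minimal standalone sketch of that check (a copy for illustration, not an import of the module):

```python
def has_duplicates(dataset):
    # Key each row by its sorted (column, value) pairs so dicts with the
    # same contents hash identically regardless of key insertion order.
    seen = set()
    for row in dataset:
        key = tuple(sorted(row.items()))
        if key in seen:
            return True
        seen.add(key)
    return False

rows = [
    {"age": 25, "city": "Mumbai"},
    {"city": "Mumbai", "age": 25},  # same content, different key order
    {"age": 30, "city": "Delhi"},
]
```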
env/environment.py ADDED
@@ -0,0 +1,399 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from __future__ import annotations
2
+
3
+ import copy
4
+ import json
5
+ from pathlib import Path
6
+ from statistics import median
7
+ from typing import Any
8
+
9
+ from env.actions import FEATURE_REGISTRY, is_missing, validate_action
10
+ from env.models import Action, ColumnInfo, Issue, Observation
11
+ from env.quality import compute_quality_score
12
+ from env.rewards import compute_reward
13
+
14
+ DATA_DIR = Path(__file__).resolve().parent.parent / "data"
15
+
16
+
17
+ class DataCleaningEnv:
18
+ def __init__(self, task_name: str = "basic_cleaning"):
19
+ self.task_name = task_name
20
+ self.task_config: dict[str, Any] = {}
21
+ self.dataset: list[dict[str, Any]] = []
22
+ self.original_dataset: list[dict[str, Any]] = []
23
+ self.issues: list[Issue] = []
24
+ self.pending_issues: list[Issue] = []
25
+ self.resolved_issues: list[Issue] = []
26
+ self.action_history: list[dict[str, Any]] = []
27
+ self.steps_remaining = 0
28
+ self.max_steps = 0
29
+ self.total_issues_at_start = 0
30
+ self.quality_score = 0.0
31
+ self.expected_dtypes: dict[str, str] = {}
32
+ self.required_features: list[str] = []
33
+ self._issue_id_map: dict[tuple[str, str], str] = {}
34
+
35
+ def reset(self) -> Observation:
36
+ config_path = DATA_DIR / f"{self.task_name}.json"
37
+ with config_path.open("r", encoding="utf-8") as handle:
38
+ self.task_config = json.load(handle)
39
+
40
+ self.dataset = copy.deepcopy(self.task_config["dataset"])
41
+ self.original_dataset = copy.deepcopy(self.dataset)
42
+ self.expected_dtypes = dict(self.task_config["expected_dtypes"])
43
+ self.required_features = list(self.task_config.get("required_features", []))
44
+ self.action_history = []
45
+ self.resolved_issues = []
46
+ self.max_steps = int(self.task_config["max_steps"])
47
+ self.steps_remaining = self.max_steps
48
+
49
+ self._issue_id_map = {}
50
+ detected = self._detect_issues(self.dataset)
51
+ self.pending_issues = detected
52
+ self.issues = list(detected)
53
+ self.total_issues_at_start = len(detected)
54
+ self.quality_score = compute_quality_score(
55
+ self.dataset,
56
+ self._build_column_infos(),
57
+ self.total_issues_at_start,
58
+ )
59
+ return self.state()
60
+
61
+ def step(self, action: Action) -> tuple[Observation, float, bool, dict]:
62
+ if not self.dataset:
63
+ self.reset()
64
+
65
+ self.steps_remaining -= 1
66
+ old_quality = self.quality_score
67
+ columns = self._build_column_infos()
68
+ action_valid, message, matched_issue, dependency_ok = validate_action(
69
+ self.dataset,
70
+ self.pending_issues,
71
+ columns,
72
+ self.expected_dtypes,
73
+ action,
74
+ self.resolved_issues,
75
+ )
76
+
77
+ info: dict[str, Any] = {}
78
+ if not action_valid:
79
+ reward = compute_reward(old_quality, old_quality, False, False)
80
+ info = {"error": "invalid_action", "message": message}
81
+ self.action_history.append(
82
+ {
83
+ "action_type": action.action_type,
84
+ "column": action.column,
85
+ "params": action.params,
86
+ "reward": reward,
87
+ "error": message,
88
+ }
89
+ )
90
+ observation = self.state()
91
+ done = self.steps_remaining <= 0 or len(self.pending_issues) == 0
92
+ return observation, reward, done, info
93
+
94
+ self._apply_action(action)
95
+ redetected = self._detect_issues(self.dataset)
96
+ self.pending_issues = redetected
97
+ self.issues = list(redetected)
98
+
99
+ if matched_issue and not self._issue_present(redetected, matched_issue.issue_type, matched_issue.column):
100
+ self.resolved_issues.append(matched_issue)
101
+
102
+ self.quality_score = compute_quality_score(
103
+ self.dataset,
104
+ self._build_column_infos(),
105
+ self.total_issues_at_start,
106
+ )
107
+ reward = compute_reward(old_quality, self.quality_score, True, dependency_ok)
108
+ self.action_history.append(
109
+ {
110
+ "action_type": action.action_type,
111
+ "column": action.column,
112
+ "params": action.params,
113
+ "reward": reward,
114
+ "error": None,
115
+ }
116
+ )
117
+ observation = self.state()
118
+ done = self.steps_remaining <= 0 or len(self.pending_issues) == 0
119
+ return observation, reward, done, info
120
+
121
+ def state(self) -> Observation:
122
+ return Observation(
123
+ data_preview=copy.deepcopy(self.dataset[:5]),
124
+ columns=self._build_column_infos(),
125
+ pending_issues=copy.deepcopy(self.pending_issues),
126
+ resolved_issues=copy.deepcopy(self.resolved_issues),
127
+ action_history=copy.deepcopy(self.action_history),
128
+ quality_score=self.quality_score,
129
+ steps_remaining=self.steps_remaining,
130
+ total_rows=len(self.dataset),
131
+ total_issues_at_start=self.total_issues_at_start,
132
+ )
133
+
134
+ def _detect_issues(self, dataset: list[dict[str, Any]]) -> list[Issue]:
135
+ if not dataset:
136
+ return []
137
+
138
+ raw_issues: list[dict[str, Any]] = []
139
+ columns = list(self.expected_dtypes.keys())
140
+
141
+ for column in columns:
142
+ missing_count = sum(1 for row in dataset if is_missing(row.get(column)))
143
+ if missing_count:
144
+ raw_issues.append(
145
+ {
146
+ "issue_type": "missing",
147
+ "column": column,
148
+ "description": f"Column '{column}' has {missing_count} missing values that should be filled.",
149
+ }
150
+ )
151
+
152
+ if self._has_duplicates(dataset):
153
+ raw_issues.append(
154
+ {
155
+ "issue_type": "duplicate",
156
+ "column": "__all__",
157
+ "description": "Dataset contains duplicate rows that should be removed.",
158
+ }
159
+ )
160
+
161
+ for column in columns:
162
+ expected_dtype = self.expected_dtypes[column]
163
+ actual_dtype = self._infer_runtime_dtype(dataset, column)
164
+ if expected_dtype in {"int", "float", "bool"} and actual_dtype != expected_dtype:
165
+ raw_issues.append(
166
+ {
167
+ "issue_type": "wrong_dtype",
168
+ "column": column,
169
+ "description": (
170
+ f"Column '{column}' should be '{expected_dtype}' but is currently represented as '{actual_dtype}'."
171
+ ),
172
+ }
173
+ )
174
+
175
+ for column in columns:
176
+ if self.expected_dtypes[column] != "str":
177
+ continue
178
+ if self._has_inconsistent_categories(dataset, column):
179
+ raw_issues.append(
180
+ {
181
+ "issue_type": "inconsistent_category",
182
+ "column": column,
183
+ "description": f"Column '{column}' has inconsistent categorical values that differ only by casing.",
184
+ }
185
+ )
186
+
187
+ for feature_name in self.required_features:
188
+ if not all(feature_name in row for row in dataset):
189
+ raw_issues.append(
190
+ {
191
+ "issue_type": "missing_feature",
192
+ "column": feature_name,
193
+ "description": f"Required feature '{feature_name}' has not been created yet.",
194
+ }
195
+ )
196
+
197
+ for raw_issue in raw_issues:
198
+ signature = (raw_issue["issue_type"], raw_issue["column"])
199
+ if signature not in self._issue_id_map:
200
+ self._issue_id_map[signature] = f"issue_{len(self._issue_id_map) + 1:03d}"
201
+
202
+ issues: list[Issue] = []
203
+ signature_to_id = {signature: issue_id for signature, issue_id in self._issue_id_map.items()}
204
+
205
+ for raw_issue in raw_issues:
206
+ signature = (raw_issue["issue_type"], raw_issue["column"])
207
+ depends_on: list[str] = []
208
+
209
+ if raw_issue["issue_type"] == "wrong_dtype" and raw_issue["column"] in {"salary", "rating"}:
210
+ missing_signature = ("missing", raw_issue["column"])
211
+ if missing_signature in signature_to_id:
212
+ depends_on.append(signature_to_id[missing_signature])
213
+
214
+ if raw_issue["issue_type"] == "missing_feature":
215
+ feature_name = raw_issue["column"]
216
+ source_column = FEATURE_REGISTRY[feature_name]["source"]
217
+ for dependency_type in ("missing", "wrong_dtype"):
218
+ source_signature = (dependency_type, source_column)
219
+ if source_signature in signature_to_id:
220
+ depends_on.append(signature_to_id[source_signature])
221
+
222
+ issues.append(
223
+ Issue(
224
+ issue_id=signature_to_id[signature],
225
+ issue_type=raw_issue["issue_type"],
226
+ column=raw_issue["column"],
227
+ description=raw_issue["description"],
228
+ depends_on=depends_on,
229
+ )
230
+ )
231
+
232
+ return issues
233
+
234
+ def _build_column_infos(self) -> list[ColumnInfo]:
235
+ if not self.dataset:
236
+ return []
237
+
238
+ infos: list[ColumnInfo] = []
239
+ for column in self.dataset[0].keys():
240
+ values = [row.get(column) for row in self.dataset]
241
+ non_missing = [value for value in values if not is_missing(value)]
242
+ infos.append(
243
+ ColumnInfo(
244
+ name=column,
245
+ dtype=self._infer_runtime_dtype(self.dataset, column),
246
+ null_count=sum(1 for value in values if is_missing(value)),
247
+ unique_count=len({str(value) for value in non_missing}),
248
+ )
249
+ )
250
+ return infos
251
+
252
+ def _infer_runtime_dtype(self, dataset: list[dict[str, Any]], column: str) -> str:
253
+ values = [row.get(column) for row in dataset if not is_missing(row.get(column))]
254
+ if not values:
255
+ return self.expected_dtypes.get(column, "str")
256
+ if all(isinstance(value, bool) for value in values):
257
+ return "bool"
258
+ if all(isinstance(value, int) and not isinstance(value, bool) for value in values):
259
+ return "int"
260
+ if all(isinstance(value, (int, float)) and not isinstance(value, bool) for value in values):
261
+ return "float"
262
+ return "str"
263
+
264
+ def _has_duplicates(self, dataset: list[dict[str, Any]]) -> bool:
265
+ seen: set[tuple[tuple[str, Any], ...]] = set()
266
+ for row in dataset:
267
+ key = tuple(sorted(row.items()))
268
+ if key in seen:
269
+ return True
270
+ seen.add(key)
271
+ return False
272
+
273
+ def _has_inconsistent_categories(self, dataset: list[dict[str, Any]], column: str) -> bool:
274
+ groups: dict[str, set[str]] = {}
275
+ for row in dataset:
276
+ value = row.get(column)
277
+ if is_missing(value):
278
+ continue
279
+ normalized = str(value).lower()
280
+ groups.setdefault(normalized, set()).add(str(value))
281
+ return any(len(forms) > 1 for forms in groups.values())
282
+
283
+ def _issue_present(self, issues: list[Issue], issue_type: str, column: str) -> bool:
284
+ return any(issue.issue_type == issue_type and issue.column == column for issue in issues)
285
+
286
+ def _apply_action(self, action: Action) -> None:
287
+ if action.action_type == "fill_missing":
288
+ self._apply_fill_missing(action.column, action.params["strategy"])
289
+ elif action.action_type == "drop_duplicates":
290
+ unique_rows: list[dict[str, Any]] = []
291
+ seen: set[tuple[tuple[str, Any], ...]] = set()
292
+ for row in self.dataset:
293
+ key = tuple(sorted(row.items()))
294
+ if key in seen:
295
+ continue
296
+ seen.add(key)
297
+ unique_rows.append(row)
298
+ self.dataset = unique_rows
299
+ elif action.action_type == "convert_dtype":
300
+ target_dtype = action.params["target_dtype"]
301
+ for row in self.dataset:
302
+ value = row.get(action.column)
303
+ if is_missing(value):
304
+ row[action.column] = None
305
+ else:
306
+ row[action.column] = self._convert_value(value, target_dtype)
307
+ elif action.action_type == "normalize_category":
308
+ self._apply_normalize_category(action.column)
309
+ elif action.action_type == "create_feature":
310
+ self._apply_create_feature(action.params["feature_name"])
311
+
312
+ def _apply_fill_missing(self, column: str, strategy: str) -> None:
313
+ expected_dtype = self.expected_dtypes.get(column, "str")
314
+ valid_values = [row.get(column) for row in self.dataset if not is_missing(row.get(column))]
315
+
316
+ if expected_dtype in {"int", "float"}:
317
+ numeric_values = [self._convert_value(value, expected_dtype) for value in valid_values]
318
+ if strategy == "mean":
319
+ fill_value = sum(numeric_values) / len(numeric_values)
320
+ elif strategy == "median":
321
+ fill_value = median(numeric_values)
322
+ else:
323
+ fill_value = 0
324
+ if expected_dtype == "int":
325
+ fill_value = int(round(fill_value))
326
+ else:
327
+ if strategy == "mode":
328
+ fill_value = self._pick_mode([str(value) for value in valid_values])
329
+ else:
330
+ fill_value = "unknown"
331
+
332
+ for row in self.dataset:
333
+ if is_missing(row.get(column)):
+                 row[column] = fill_value
+
+     def _apply_normalize_category(self, column: str) -> None:
+         groups: dict[str, dict[str, int]] = {}
+         for row in self.dataset:
+             value = row.get(column)
+             if is_missing(value):
+                 continue
+             surface = str(value)
+             groups.setdefault(surface.lower(), {})
+             groups[surface.lower()][surface] = groups[surface.lower()].get(surface, 0) + 1
+
+         canonical: dict[str, str] = {}
+         for lowered, counts in groups.items():
+             canonical[lowered] = min(
+                 counts.items(),
+                 key=lambda item: (-item[1], item[0].lower(), 0 if item[0].islower() else 1, item[0]),
+             )[0]
+
+         for row in self.dataset:
+             value = row.get(column)
+             if is_missing(value):
+                 continue
+             row[column] = canonical[str(value).lower()]
+
+     def _apply_create_feature(self, feature_name: str) -> None:
+         feature_config = FEATURE_REGISTRY[feature_name]
+         source = feature_config["source"]
+         bins = feature_config["bins"]
+         labels = feature_config["labels"]
+
+         for row in self.dataset:
+             source_value = row.get(source)
+             if is_missing(source_value):
+                 row[feature_name] = None
+                 continue
+
+             numeric_value = float(source_value)
+             assigned = None
+             for index, label in enumerate(labels):
+                 lower = bins[index]
+                 upper = bins[index + 1]
+                 is_last = index == len(labels) - 1
+                 if (lower <= numeric_value < upper) or (is_last and lower <= numeric_value <= upper):
+                     assigned = label
+                     break
+             row[feature_name] = assigned
+
+     def _pick_mode(self, values: list[str]) -> str:
+         counts: dict[str, int] = {}
+         for value in values:
+             counts[value] = counts.get(value, 0) + 1
+         return min(
+             counts.items(),
+             key=lambda item: (-item[1], item[0].lower(), 0 if item[0].islower() else 1, item[0]),
+         )[0]
+
+     def _convert_value(self, value: Any, target_dtype: str) -> Any:
+         if target_dtype == "int":
+             return int(float(str(value)))
+         if target_dtype == "float":
+             return float(str(value))
+         if target_dtype == "bool":
+             normalized = str(value).strip().lower()
+             return normalized in {"true", "1", "yes"}
+         return str(value)
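Both `_apply_normalize_category` and `_pick_mode` select a canonical surface form with the same `min` key: highest count first, then case-insensitive alphabetical order, then a preference for all-lowercase variants. A standalone sketch of that tie-breaking rule:

```python
# Sketch of the canonical-form selection used by _apply_normalize_category
# and _pick_mode: the most frequent surface form wins; on a count tie, the
# all-lowercase variant is preferred.
def pick_canonical(counts: dict[str, int]) -> str:
    return min(
        counts.items(),
        key=lambda item: (-item[1], item[0].lower(), 0 if item[0].islower() else 1, item[0]),
    )[0]

print(pick_canonical({"NY": 1, "ny": 2}))        # ny (higher count wins)
print(pick_canonical({"Sales": 2, "sales": 2}))  # sales (tie -> lowercase preferred)
```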
env/graders.py ADDED
@@ -0,0 +1,13 @@
+ class DataCleaningGrader:
+     def grade(self, final_state: dict, task_config: dict) -> float:
+         issues_fixed = len(final_state["resolved_issues"])
+         total_issues = task_config["total_issues"]
+         steps_taken = task_config["max_steps"] - final_state["steps_remaining"]
+         wrong_actions = sum(1 for action in final_state["action_history"] if action.get("error"))
+
+         correctness = issues_fixed / total_issues if total_issues > 0 else 1.0
+         efficiency = max(0, 1 - steps_taken / (2 * total_issues)) if total_issues > 0 else 1.0
+         penalty = wrong_actions * 0.05
+
+         score = 0.8 * correctness + 0.2 * efficiency - penalty
+         return round(max(0.0, min(1.0, score)), 2)
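A worked example of the grading formula above, as a self-contained sketch that mirrors `DataCleaningGrader.grade` (the sample `final_state` values are hypothetical):

```python
# Standalone sketch of the grader's scoring formula:
# 0.8 * correctness + 0.2 * efficiency - 0.05 per wrong action, clamped to [0, 1].
def grade(final_state: dict, task_config: dict) -> float:
    issues_fixed = len(final_state["resolved_issues"])
    total_issues = task_config["total_issues"]
    steps_taken = task_config["max_steps"] - final_state["steps_remaining"]
    wrong_actions = sum(1 for a in final_state["action_history"] if a.get("error"))

    correctness = issues_fixed / total_issues if total_issues > 0 else 1.0
    efficiency = max(0, 1 - steps_taken / (2 * total_issues)) if total_issues > 0 else 1.0
    penalty = wrong_actions * 0.05
    return round(max(0.0, min(1.0, 0.8 * correctness + 0.2 * efficiency - penalty)), 2)

# 4 of 5 issues fixed in 6 steps (max 10), with one invalid action along the way.
state = {
    "resolved_issues": ["i1", "i2", "i3", "i4"],
    "steps_remaining": 4,
    "action_history": [{}, {}, {}, {}, {"error": "invalid_action"}, {}],
}
print(grade(state, {"total_issues": 5, "max_steps": 10}))
# 0.8 * 0.8 + 0.2 * 0.4 - 0.05 = 0.67
```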
env/models.py ADDED
@@ -0,0 +1,40 @@
+ from typing import Any
+
+ from pydantic import BaseModel, Field
+
+
+ class ColumnInfo(BaseModel):
+     name: str
+     dtype: str
+     null_count: int
+     unique_count: int
+
+
+ class Issue(BaseModel):
+     issue_id: str
+     issue_type: str
+     column: str
+     description: str
+     depends_on: list[str] = Field(default_factory=list)
+
+
+ class Observation(BaseModel):
+     data_preview: list[dict[str, Any]]
+     columns: list[ColumnInfo]
+     pending_issues: list[Issue]
+     resolved_issues: list[Issue]
+     action_history: list[dict[str, Any]]
+     quality_score: float
+     steps_remaining: int
+     total_rows: int
+     total_issues_at_start: int
+
+
+ class Action(BaseModel):
+     action_type: str
+     column: str
+     params: dict[str, str] = Field(default_factory=dict)
+
+
+ class Reward(BaseModel):
+     value: float
env/quality.py ADDED
@@ -0,0 +1,68 @@
+ from __future__ import annotations
+
+ from typing import Any
+
+ from env.actions import is_missing
+
+
+ def _is_numeric_value(value: Any, dtype: str) -> bool:
+     if is_missing(value):
+         return False
+     try:
+         if dtype == "int":
+             int(str(value))
+         elif dtype == "float":
+             float(str(value))
+         else:
+             return False
+         return True
+     except (TypeError, ValueError):
+         return False
+
+
+ def _compute_consistency(dataset: list[dict], column_infos: list) -> float:
+     if not dataset or not column_infos:
+         return 1.0
+
+     valid_checks = 0
+     total_checks = 0
+
+     for info in column_infos:
+         values = [row.get(info.name) for row in dataset]
+         if info.dtype in {"int", "float"}:
+             for value in values:
+                 total_checks += 1
+                 if _is_numeric_value(value, info.dtype):
+                     valid_checks += 1
+         else:
+             non_missing = [str(value) for value in values if not is_missing(value)]
+             if not non_missing:
+                 continue
+             lowered = {}
+             for value in non_missing:
+                 lowered.setdefault(value.lower(), set()).add(value)
+             has_inconsistency = any(len(forms) > 1 for forms in lowered.values())
+             total_checks += 1
+             if not has_inconsistency:
+                 valid_checks += 1
+
+     return valid_checks / total_checks if total_checks else 1.0
+
+
+ def compute_quality_score(dataset: list[dict], column_infos: list, original_issues_count: int) -> float:
+     if original_issues_count == 0:
+         return 1.0
+
+     total_cells = len(dataset) * len(dataset[0]) if dataset else 1
+     missing_cells = sum(
+         1 for row in dataset for value in row.values() if value is None or value == "" or value == "not_available"
+     )
+     completeness = 1.0 - (missing_cells / total_cells)
+
+     total_rows = len(dataset)
+     unique_rows = len(set(str(sorted(row.items())) for row in dataset))
+     uniqueness = unique_rows / total_rows if total_rows > 0 else 1.0
+
+     consistency = _compute_consistency(dataset, column_infos)
+
+     return round(0.4 * completeness + 0.3 * uniqueness + 0.3 * consistency, 4)
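A toy walkthrough of the three quality components above on a two-row dataset, assuming the same missing-value convention (`None`, `""`, `"not_available"`); the column set (one numeric `age`, one categorical `city`) is hypothetical:

```python
# Two rows: one missing age cell, and "NY"/"ny" as inconsistent city spellings.
dataset = [
    {"age": 30, "city": "NY"},
    {"age": None, "city": "ny"},
]

# Completeness: 1 missing cell out of 4.
missing = sum(1 for row in dataset for v in row.values() if v in (None, "", "not_available"))
completeness = 1.0 - missing / (len(dataset) * len(dataset[0]))  # 0.75

# Uniqueness: both rows are distinct.
uniqueness = len({str(sorted(r.items())) for r in dataset}) / len(dataset)  # 1.0

# Consistency: "age" has 1 valid numeric value out of 2 cells checked;
# "city" mixes "NY"/"ny", so its single column check fails -> 1 of 3 checks pass.
consistency = 1 / 3

score = round(0.4 * completeness + 0.3 * uniqueness + 0.3 * consistency, 4)
print(score)  # 0.7
```

Cleaning the missing cell and normalizing the city spellings would push all three components to 1.0, giving a perfect score.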
env/rewards.py ADDED
@@ -0,0 +1,14 @@
+ def compute_reward(
+     old_quality: float,
+     new_quality: float,
+     action_valid: bool,
+     resolved_dependency_correctly: bool,
+ ) -> float:
+     if not action_valid:
+         return -0.05
+
+     progress = new_quality - old_quality
+     ordering_bonus = 0.05 if resolved_dependency_correctly else 0.0
+     step_cost = -0.01
+
+     return round(progress + ordering_bonus + step_cost, 4)
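The per-step reward is shaped from three parts: the quality-score delta, a small bonus for resolving an issue in the right dependency order, and a fixed step cost. A standalone sketch mirroring `compute_reward` (the quality values are hypothetical):

```python
# Sketch of the reward shaping above: quality delta + ordering bonus - step cost,
# with a flat -0.05 for invalid actions.
def compute_reward(old_q: float, new_q: float, valid: bool, ordered: bool) -> float:
    if not valid:
        return -0.05
    return round((new_q - old_q) + (0.05 if ordered else 0.0) - 0.01, 4)

print(compute_reward(0.75, 0.85, True, True))   # 0.14
print(compute_reward(0.75, 0.85, False, True))  # -0.05
```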
inference.py ADDED
@@ -0,0 +1,181 @@
+ """
+ STDOUT FORMAT (must match exactly):
+ [START] task=<task_name> env=data_cleaning_env model=<model_name>
+ [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
+ [END] success=<true|false> steps=<n> score=<s> rewards=<r1,r2,...,rn>
+ """
+
+ import json
+ import os
+
+ from openai import OpenAI
+
+ from env.environment import DataCleaningEnv
+ from env.graders import DataCleaningGrader
+ from env.models import Action
+
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ API_BASE_URL = os.getenv("API_BASE_URL")
+ MODEL_NAME = os.getenv("MODEL_NAME")
+ BENCHMARK = "data_cleaning_env"
+
+ TASKS = ["basic_cleaning", "moderate_cleaning", "full_pipeline"]
+
+ SYSTEM_PROMPT = """You are an AI agent performing data cleaning on a tabular dataset.
+
+ You will receive an observation containing:
+ - data_preview: first 5 rows of the current dataset
+ - columns: column info (name, dtype, null_count, unique_count)
+ - pending_issues: list of issues to fix (each has issue_id, issue_type, column, description, depends_on)
+ - resolved_issues: issues already fixed
+ - action_history: your previous actions
+ - quality_score: current data quality (0.0-1.0)
+ - steps_remaining: how many actions you have left
+
+ You must respond with EXACTLY one JSON object representing your action:
+ {
+   "action_type": "<one of: fill_missing, drop_duplicates, convert_dtype, normalize_category, create_feature>",
+   "column": "<target column name or __all__ for drop_duplicates>",
+   "params": {<strategy-specific params>}
+ }
+
+ Rules:
+ - fill_missing: params must have "strategy" key. Use "mean"/"median"/"zero" for numeric columns, "mode"/"unknown" for categorical.
+ - drop_duplicates: column = "__all__", params = {}
+ - convert_dtype: params must have "target_dtype" key (one of: int, float, str, bool)
+ - normalize_category: params = {}
+ - create_feature: params must have "feature_name" key (e.g., "age_group")
+
+ IMPORTANT: Fix dependencies first! Check the "depends_on" field of each issue. For example, fill missing string values in a column BEFORE converting its dtype.
+
+ Respond with ONLY the JSON object. No explanation, no markdown, no code blocks."""
+
+
+ def parse_action(response_text: str) -> Action:
+     text = response_text.strip()
+     if text.startswith("```"):
+         parts = text.split("\n", 1)
+         text = parts[1] if len(parts) > 1 else text[3:]
+     if text.endswith("```"):
+         text = text[:-3]
+     text = text.strip()
+     if text.startswith("json"):
+         text = text[4:].strip()
+     parsed = json.loads(text)
+     return Action(**parsed)
+
+
+ def require_env(name: str, value: str | None) -> str:
+     if value:
+         return value
+     raise RuntimeError(f"Missing required environment variable: {name}")
+
+
+ def safe_log_value(value: str | None) -> str:
+     if not value:
+         return "null"
+     return str(value).replace("\n", "_").replace("\r", "_").replace("\t", "_").replace(" ", "_")
+
+
+ def log_start(task, env, model):
+     print(f"[START] task={task} env={env} model={model}", flush=True)
+
+
+ def log_step(step, action_str, reward, done, error):
+     error_val = safe_log_value(error)
+     done_val = str(done).lower()
+     print(
+         f"[STEP] step={step} action={safe_log_value(action_str)} reward={reward:.2f} "
+         f"done={done_val} error={error_val}",
+         flush=True,
+     )
+
+
+ def log_end(success, steps, score, rewards):
+     rewards_str = ",".join(f"{reward:.2f}" for reward in rewards)
+     success_val = str(success).lower()
+     print(f"[END] success={success_val} steps={steps} score={score:.2f} rewards={rewards_str}", flush=True)
+
+
+ def run_task(task_name: str):
+     client = OpenAI(
+         base_url=require_env("API_BASE_URL", API_BASE_URL),
+         api_key=require_env("HF_TOKEN", HF_TOKEN),
+     )
+     env = DataCleaningEnv(task_name=task_name)
+     obs = env.reset()
+     log_start(task_name, BENCHMARK, require_env("MODEL_NAME", MODEL_NAME))
+
+     messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+     rewards_list = []
+     step_count = 0
+     done = False
+     max_possible_steps = obs.steps_remaining
+     task_score = 0.0
+
+     while not done and step_count < max_possible_steps:
+         obs_dict = obs.model_dump() if hasattr(obs, "model_dump") else obs.dict()
+         messages.append(
+             {
+                 "role": "user",
+                 "content": f"Current observation:\n{json.dumps(obs_dict, indent=2, default=str)}\n\nChoose your next action.",
+             }
+         )
+
+         try:
+             response = client.chat.completions.create(
+                 model=require_env("MODEL_NAME", MODEL_NAME),
+                 messages=messages,
+                 temperature=0.3,
+                 max_tokens=200,
+             )
+             response_text = response.choices[0].message.content or ""
+             messages.append({"role": "assistant", "content": response_text})
+
+             action = parse_action(response_text)
+             obs, reward, done, info = env.step(action)
+             step_count += 1
+             last_error = info.get("error")
+             rewards_list.append(reward)
+
+             action_str = f"{action.action_type}({action.column})"
+             log_step(step_count, action_str, reward, done, last_error)
+
+         except Exception as exc:
+             step_count += 1
+             rewards_list.append(-0.05)
+             log_step(step_count, "parse_error", -0.05, False, str(exc))
+             messages.append(
+                 {
+                     "role": "user",
+                     "content": f"Your response could not be parsed. Error: {str(exc)}. Respond with ONLY a valid JSON action object.",
+                 }
+             )
+             if step_count >= max_possible_steps:
+                 break
+
+     success = hasattr(obs, "pending_issues") and len(obs.pending_issues) == 0
+     final_state = obs.model_dump() if hasattr(obs, "model_dump") else obs.dict()
+     task_score = DataCleaningGrader().grade(
+         final_state,
+         {
+             "total_issues": final_state["total_issues_at_start"],
+             "max_steps": max_possible_steps,
+         },
+     )
+     log_end(success, step_count, task_score, rewards_list)
+     return task_score
+
+
+ def main():
+     require_env("HF_TOKEN", HF_TOKEN)
+     require_env("API_BASE_URL", API_BASE_URL)
+     require_env("MODEL_NAME", MODEL_NAME)
+     scores = {}
+     for task in TASKS:
+         scores[task] = run_task(task)
+     return scores
+
+
+ if __name__ == "__main__":
+     main()
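The fence-stripping in `parse_action` exists because models sometimes wrap the JSON action in a markdown code block despite the "no code blocks" instruction. A standalone sketch of that recovery logic (the helper name `extract_json` is illustrative, not from the repo):

```python
import json

# Minimal sketch of the fence-stripping approach used by parse_action:
# tolerate a reply wrapped in ```json ... ``` and recover the JSON payload.
def extract_json(text: str) -> dict:
    text = text.strip()
    if text.startswith("```"):
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else text[3:]
    if text.endswith("```"):
        text = text[:-3]
    text = text.strip()
    if text.startswith("json"):
        text = text[4:].strip()
    return json.loads(text)

raw = '```json\n{"action_type": "fill_missing", "column": "age", "params": {"strategy": "mean"}}\n```'
print(extract_json(raw)["action_type"])  # fill_missing
```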
models.py ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ """Compatibility exports for OpenEnv tooling."""
2
+
3
+ from env.models import Action, ColumnInfo, Issue, Observation, Reward
4
+
5
+ __all__ = ["Action", "ColumnInfo", "Issue", "Observation", "Reward"]
openenv.yaml ADDED
@@ -0,0 +1,29 @@
+ spec_version: 1
+ name: data_cleaning_env
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 7860
+ description: "RL environment for interactive tabular data cleaning and preparation. Agent must identify and fix data quality issues including missing values, duplicates, wrong dtypes, inconsistent categories, and feature creation."
+ version: "1.0.0"
+
+ observation_space:
+   type: dict
+   description: "Contains data_preview, columns, pending_issues, resolved_issues, action_history, quality_score, steps_remaining"
+
+ action_space:
+   type: dict
+   description: "Action with action_type, column, and params fields"
+
+ reward_range: [-0.05, 1.0]
+
+ tasks:
+   - name: basic_cleaning
+     description: "Easy: fill missing values in a small dataset (20 rows, 2 issues)"
+     difficulty: easy
+   - name: moderate_cleaning
+     description: "Medium: handle missing values, duplicates, and wrong dtypes (50 rows, 5 issues in practice)"
+     difficulty: medium
+   - name: full_pipeline
+     description: "Hard: full cleaning pipeline with category normalization and feature creation (100 rows, 10 issues in practice)"
+     difficulty: hard
pyproject.toml ADDED
@@ -0,0 +1,31 @@
+ [build-system]
+ requires = ["setuptools>=68", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "openenv-data-cleaning"
+ version = "0.1.0"
+ description = "OpenEnv environment for interactive tabular data cleaning."
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+     "openenv-core>=0.2.0",
+     "fastapi>=0.110.0",
+     "openai>=1.0",
+     "pydantic>=2.0",
+     "uvicorn>=0.30.0",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "httpx>=0.28.0",
+     "pytest>=8.0.0",
+ ]
+
+ [project.scripts]
+ server = "server.app:main"
+
+ [tool.setuptools]
+ include-package-data = true
+ packages = ["env", "server"]
+ py-modules = ["app", "client", "models"]
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ pydantic>=2.0
+ openai>=1.0
+ uvicorn
+ fastapi
+ openenv-core>=0.2.0
server/__init__.py ADDED
@@ -0,0 +1,6 @@
+ """Server package for OpenEnv-compatible app entrypoints."""
+
+ from .app import app, main
+ from .environment import DataCleaningEnv
+
+ __all__ = ["app", "main", "DataCleaningEnv"]
server/app.py ADDED
@@ -0,0 +1,122 @@
+ from __future__ import annotations
+
+ import argparse
+ from typing import Any, Literal
+
+ import uvicorn
+ from fastapi import Body, FastAPI
+ from pydantic import BaseModel
+
+ from models import Action, Observation
+
+ from .environment import DataCleaningEnv
+
+ TASKS = ["basic_cleaning", "moderate_cleaning", "full_pipeline"]
+ ENV_NAME = "data_cleaning_env"
+ ENV_DESCRIPTION = (
+     "RL environment for interactive tabular data cleaning and preparation. "
+     "Agents must fix missing values, duplicates, dtype issues, category inconsistencies, "
+     "and derived-feature requirements."
+ )
+
+ app = FastAPI(title="Data Cleaning OpenEnv", version="1.0.0")
+ ENV = DataCleaningEnv()
+
+
+ class ResetRequest(BaseModel):
+     task_name: Literal["basic_cleaning", "moderate_cleaning", "full_pipeline"] = "basic_cleaning"
+
+
+ def _metadata() -> dict[str, Any]:
+     return {
+         "name": ENV_NAME,
+         "description": ENV_DESCRIPTION,
+         "version": "1.0.0",
+         "tasks": TASKS,
+         "mode": "simulation",
+     }
+
+
+ @app.get("/")
+ def root() -> dict[str, Any]:
+     payload = _metadata()
+     payload["status"] = "ok"
+     return payload
+
+
+ @app.get("/health")
+ def health() -> dict[str, str]:
+     return {"status": "healthy"}
+
+
+ @app.get("/metadata")
+ def metadata() -> dict[str, Any]:
+     return _metadata()
+
+
+ @app.get("/tasks")
+ def list_tasks() -> dict[str, list[str]]:
+     return {"tasks": TASKS}
+
+
+ @app.get("/schema")
+ def schema() -> dict[str, Any]:
+     observation_schema = Observation.model_json_schema()
+     return {
+         "action": Action.model_json_schema(),
+         "observation": observation_schema,
+         "state": observation_schema,
+     }
+
+
+ @app.post("/mcp")
+ def mcp(payload: dict[str, Any] = Body(default_factory=dict)) -> dict[str, Any]:
+     return {
+         "jsonrpc": "2.0",
+         "id": payload.get("id"),
+         "error": {
+             "code": -32601,
+             "message": "MCP methods are not implemented for this benchmark.",
+         },
+     }
+
+
+ @app.post("/reset")
+ def reset(request: ResetRequest | None = None) -> dict[str, Any]:
+     effective_request = request or ResetRequest()
+     ENV.task_name = effective_request.task_name
+     observation = ENV.reset()
+     return observation.model_dump()
+
+
+ @app.post("/step")
+ def step(action: Action) -> dict[str, Any]:
+     observation, reward, done, info = ENV.step(action)
+     return {
+         "observation": observation.model_dump(),
+         "reward": reward,
+         "done": done,
+         "info": info,
+     }
+
+
+ @app.get("/state")
+ def state() -> dict[str, Any]:
+     if not ENV.dataset:
+         ENV.reset()
+     return ENV.state().model_dump()
+
+
+ def main(host: str | None = None, port: int | None = None) -> None:
+     if host is None or port is None:
+         parser = argparse.ArgumentParser()
+         parser.add_argument("--host", default="0.0.0.0")
+         parser.add_argument("--port", type=int, default=7860)
+         args = parser.parse_args()
+         host = args.host if host is None else host
+         port = args.port if port is None else port
+     uvicorn.run(app, host=host, port=port)
+
+
+ if __name__ == "__main__":
+     main()
server/environment.py ADDED
@@ -0,0 +1,5 @@
+ """Canonical server-side environment entrypoint for the data cleaning benchmark."""
+
+ from env.environment import DataCleaningEnv
+
+ __all__ = ["DataCleaningEnv"]
server/requirements.txt ADDED
@@ -0,0 +1,5 @@
+ openenv-core>=0.2.0
+ fastapi>=0.110.0
+ openai>=1.0
+ pydantic>=2.0
+ uvicorn>=0.30.0
test_env.py ADDED
@@ -0,0 +1,151 @@
+ import json
+ from pathlib import Path
+
+ from fastapi.testclient import TestClient
+
+ from app import app
+ from env.environment import DataCleaningEnv
+ from env.graders import DataCleaningGrader
+ from env.models import Action
+
+ ROOT = Path(__file__).resolve().parent
+
+
+ def assert_invalid_action_consumes_step() -> None:
+     env = DataCleaningEnv("basic_cleaning")
+     obs = env.reset()
+     _, reward, _, info = env.step(
+         Action(action_type="convert_dtype", column="age", params={"target_dtype": "int"})
+     )
+     assert reward == -0.05
+     assert info["error"] == "invalid_action"
+     assert env.steps_remaining == obs.steps_remaining - 1
+
+
+ def assert_dependency_gate() -> None:
+     env = DataCleaningEnv("moderate_cleaning")
+     env.reset()
+     _, reward, _, info = env.step(
+         Action(action_type="convert_dtype", column="salary", params={"target_dtype": "int"})
+     )
+     assert reward == -0.05
+     assert info["error"] == "invalid_action"
+
+
+ def assert_api_contract() -> None:
+     client = TestClient(app)
+
+     root_response = client.get("/")
+     assert root_response.status_code == 200
+     assert root_response.json()["name"] == "data_cleaning_env"
+
+     assert client.get("/health").json()["status"] == "healthy"
+
+     metadata_response = client.get("/metadata")
+     assert metadata_response.status_code == 200
+     metadata_payload = metadata_response.json()
+     assert metadata_payload["name"] == "data_cleaning_env"
+     assert "description" in metadata_payload
+
+     schema_response = client.get("/schema")
+     assert schema_response.status_code == 200
+     schema_payload = schema_response.json()
+     assert {"action", "observation", "state"} <= set(schema_payload.keys())
+
+     reset_response = client.post("/reset", json={"task_name": "basic_cleaning"})
+     assert reset_response.status_code == 200
+     assert "pending_issues" in reset_response.json()
+
+     step_response = client.post(
+         "/step",
+         json={"action_type": "fill_missing", "column": "age", "params": {"strategy": "mean"}},
+     )
+     assert step_response.status_code == 200
+     assert {"observation", "reward", "done", "info"} <= set(step_response.json().keys())
+
+     state_response = client.get("/state")
+     assert state_response.status_code == 200
+     assert "quality_score" in state_response.json()
+
+     mcp_response = client.post("/mcp", json={"jsonrpc": "2.0", "id": "smoke"})
+     assert mcp_response.status_code == 200
+     assert mcp_response.json()["jsonrpc"] == "2.0"
+
+
+ def run_sequence(task_name: str, actions: list[Action], expected_issues: int) -> tuple[dict, float]:
+     env = DataCleaningEnv(task_name)
+     obs = env.reset()
+     assert len(obs.pending_issues) == expected_issues, (task_name, len(obs.pending_issues), expected_issues)
+     initial_quality = obs.quality_score
+
+     for action in actions:
+         obs, reward, done, info = env.step(action)
+         assert "error" not in info, (task_name, action, info)
+         if done:
+             break
+
+     assert obs.quality_score >= initial_quality
+     final_state = obs.model_dump()
+     config = json.loads((ROOT / "data" / f"{task_name}.json").read_text(encoding="utf-8"))
+     score = DataCleaningGrader().grade(
+         final_state,
+         {
+             "total_issues": expected_issues,
+             "max_steps": config["max_steps"],
+         },
+     )
+     return final_state, score
+
+
+ def main() -> None:
+     assert_invalid_action_consumes_step()
+     assert_dependency_gate()
+     assert_api_contract()
+
+     sequences = {
+         "basic_cleaning": (
+             [
+                 Action(action_type="fill_missing", column="age", params={"strategy": "mean"}),
+                 Action(action_type="fill_missing", column="salary", params={"strategy": "median"}),
+             ],
+             2,
+         ),
+         "moderate_cleaning": (
+             [
+                 Action(action_type="fill_missing", column="age", params={"strategy": "mean"}),
+                 Action(action_type="fill_missing", column="years_exp", params={"strategy": "median"}),
+                 Action(action_type="fill_missing", column="salary", params={"strategy": "median"}),
+                 Action(action_type="convert_dtype", column="salary", params={"target_dtype": "int"}),
+                 Action(action_type="drop_duplicates", column="__all__", params={}),
+             ],
+             5,
+         ),
+         "full_pipeline": (
+             [
+                 Action(action_type="fill_missing", column="age", params={"strategy": "mean"}),
+                 Action(action_type="fill_missing", column="years_exp", params={"strategy": "median"}),
+                 Action(action_type="fill_missing", column="rating", params={"strategy": "mean"}),
+                 Action(action_type="fill_missing", column="salary", params={"strategy": "median"}),
+                 Action(action_type="convert_dtype", column="salary", params={"target_dtype": "int"}),
+                 Action(action_type="convert_dtype", column="rating", params={"target_dtype": "float"}),
+                 Action(action_type="normalize_category", column="city", params={}),
+                 Action(action_type="normalize_category", column="department", params={}),
+                 Action(action_type="create_feature", column="age_group", params={"feature_name": "age_group"}),
+                 Action(action_type="drop_duplicates", column="__all__", params={}),
+             ],
+             10,
+         ),
+     }
+
+     for task_name, (actions, expected_issues) in sequences.items():
+         final_state, score = run_sequence(task_name, actions, expected_issues)
+         pending = len(final_state["pending_issues"])
+         resolved = len(final_state["resolved_issues"])
+         print(
+             f"{task_name}: pending={pending} resolved={resolved} "
+             f"steps_remaining={final_state['steps_remaining']} grader_score={score}"
+         )
+
+
+ if __name__ == "__main__":
+     main()
uv.lock ADDED
The diff for this file is too large to render. See raw diff