hissterical committed
Commit ebf4715 · verified · Parent: b136c38

Upload 10 files

Files changed (10)
  1. Dockerfile +18 -0
  2. README.md +158 -10
  3. inference.py +221 -0
  4. openenv.yaml +41 -0
  5. requirements.txt +6 -0
  6. server/__init__.py +2 -0
  7. server/data.py +212 -0
  8. server/env.py +409 -0
  9. server/main.py +86 -0
  10. server/models.py +70 -0
Dockerfile ADDED
@@ -0,0 +1,18 @@
+ FROM python:3.11-slim
+
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+
+ WORKDIR /app
+
+ COPY requirements.txt ./
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ RUN useradd --create-home --uid 1000 appuser
+ USER appuser
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "server.main:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,158 @@
- ---
- title: Openenv2
- emoji: 🚀
- colorFrom: pink
- colorTo: blue
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # ConfigDebuggerEnv
+
+ ConfigDebuggerEnv is a real-world OpenEnv environment for iterative configuration debugging. It simulates tasks that platform engineers and ML engineers face in production: fixing Docker Compose, Kubernetes, and training-configuration mistakes under step limits.
+
+ ## Why this environment
+
+ Configuration bugs are common and expensive in real systems. They are often partially valid YAML that is semantically wrong (type mismatches, missing units, violated interdependent constraints). This environment provides dense trajectory rewards so an agent can learn corrective behaviors instead of relying only on terminal success/failure.
+
+ ## OpenEnv API
+
+ The server exposes the standard lifecycle:
+
+ - POST /reset
+ - POST /step
+ - GET /state
+
+ ### Typed models
+
+ - Action model: ConfigAction
+ - Observation model: ConfigObservation
+ - Reward model: ConfigReward
+ - State model: EnvState
+
+ Models are defined in server/models.py and validated with Pydantic.
+
+ ## Action space
+
+ ConfigAction fields:
+
+ - operation: edit | add | delete
+ - path: dot path with optional list indexes (example: spec.template.spec.containers.0.image)
+ - value: JSON-serializable payload for edit/add
+
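The dot-path convention can be sketched in a few lines of Python. This is a simplified illustration with a hypothetical `set_by_path` helper; the environment's real logic is `_set_path` in server/env.py, which additionally creates missing intermediate nodes:

```python
from typing import Any

def set_by_path(root: Any, path: str, value: Any) -> None:
    """Walk a dot path, treating all-digit tokens as list indexes."""
    tokens = [int(t) if t.isdigit() else t for t in path.split(".")]
    cursor = root
    for token in tokens[:-1]:
        cursor = cursor[token]  # simplified: assumes intermediate nodes exist
    cursor[tokens[-1]] = value

config = {"spec": {"template": {"spec": {"containers": [{"image": "nginx"}]}}}}
set_by_path(config, "spec.template.spec.containers.0.image", "nginx:latest")
print(config["spec"]["template"]["spec"]["containers"][0]["image"])  # nginx:latest
```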
+ ## Observation space
+
+ ConfigObservation fields:
+
+ - task_id
+ - task_description
+ - current_config (YAML string)
+ - syntax_valid
+ - validation_errors
+ - schema_score (0.0 to 1.0)
+ - logic_score (0.0 to 1.0)
+ - overall_score (0.0 to 1.0)
+ - step_count
+ - max_steps
+
+ ## Tasks and graders
+
+ Three deterministic tasks are included:
+
+ 1. easy_docker (easy)
+ 2. medium_k8s (medium)
+ 3. hard_ml_config (hard)
+
+ Each task has:
+
+ - A broken starting configuration
+ - A target configuration
+ - Weighted required paths for schema grading
+ - Deterministic logic checks
+
+ Grading always returns normalized values in [0.0, 1.0].
+
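The weighted-path grading amounts to a weighted match ratio. The sketch below uses two real weights from the easy_docker spec in server/data.py; the `matched` map is illustrative:

```python
def schema_score(matched: dict[str, bool], weights: dict[str, float]) -> float:
    """Weighted fraction of required paths whose value already matches the target."""
    total = sum(weights.values())
    gained = sum(w for path, w in weights.items() if matched.get(path))
    return round(gained / total, 4) if total else 0.0

weights = {"services.web.image": 1.0, "services.db.ports": 1.1}
print(schema_score({"services.web.image": True}, weights))  # 0.4762
```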
+ ## Reward design
+
+ The reward is dense, combining progression bonuses with penalties:
+
+ - Base reward is the current overall score
+ - Positive delta bonus on improvement
+ - Regression penalty on negative delta
+ - Loop penalty for repeated states
+ - Penalty for invalid actions
+ - Penalty for destructive top-level deletes
+ - Small completion bonus when solved
+
+ This creates meaningful signal across the full episode, not only at termination.
+
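Condensed, the shaping mirrors `_compute_reward` in server/env.py (same coefficients, with the per-penalty bookkeeping folded into a single `penalty` argument):

```python
def shaped_reward(score: float, delta: float, penalty: float, solved: bool) -> float:
    """Base score plus a capped improvement bonus, minus penalties, clamped to [0, 1]."""
    reward = score
    if delta > 0:
        reward += min(0.15, delta)  # positive delta bonus, capped
    elif delta < 0:
        reward += delta * 0.4       # softened regression penalty
    reward -= penalty               # loop / invalid-action / destructive-delete penalties
    if solved:
        reward += 0.05              # completion bonus
    return round(max(0.0, min(1.0, reward)), 4)

print(shaped_reward(score=0.8, delta=0.1, penalty=0.0, solved=False))  # 0.9
```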
+ ## Project structure
+
+ - openenv.yaml
+ - Dockerfile
+ - requirements.txt
+ - inference.py
+ - server/
+   - __init__.py
+   - data.py
+   - env.py
+   - main.py
+   - models.py
+
+ ## Local setup
+
+ 1. Install dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 2. Run the server
+
+ ```bash
+ python -m uvicorn server.main:app --host 0.0.0.0 --port 8000 --reload
+ ```
+
+ 3. Quick API check
+
+ ```bash
+ curl -X POST "http://localhost:8000/reset" -H "Content-Type: application/json" -d "{\"task_id\":\"easy_docker\"}"
+ ```
+
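The same quick check can be done from Python, parsing the response the way inference.py does (`resp.json()["observation"]`). The observation values below are a trimmed, illustrative stand-in for a real response body:

```python
import json

# Shape of a POST /reset response (values illustrative, fields from ConfigObservation).
raw = """
{"observation": {"task_id": "easy_docker", "overall_score": 0.25,
                 "step_count": 0, "max_steps": 15, "syntax_valid": true}}
"""
obs = json.loads(raw)["observation"]
print(obs["task_id"], obs["max_steps"])  # easy_docker 15
```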
+ ## Baseline inference
+
+ Heuristic baseline (fully reproducible):
+
+ ```bash
+ python inference.py --policy heuristic --api-base-url http://localhost:8000 --seed 42
+ ```
+
+ OpenAI baseline (uses the OpenAI Python client and OPENAI_API_KEY):
+
+ ```bash
+ export OPENAI_API_KEY=your_key_here
+ python inference.py --policy openai --model gpt-4o-mini --api-base-url http://localhost:8000 --seed 42
+ ```
+
+ The script evaluates all three tasks and prints per-task and average scores.
+
+ ## Docker
+
+ Build:
+
+ ```bash
+ docker build -t configdebugger-env .
+ ```
+
+ Run:
+
+ ```bash
+ docker run -p 7860:7860 configdebugger-env
+ ```
+
+ ## Hugging Face Spaces notes
+
+ - Use the Docker SDK
+ - Ensure the Space maps to port 7860
+ - Add the tag: openenv
+ - Include environment variables for external evaluation if needed
+
+ ## Validation checklist
+
+ - Typed Observation/Action/Reward models: yes
+ - reset/step/state implemented: yes
+ - 3 tasks with deterministic graders: yes
+ - Reward in range [0.0, 1.0] with partial progress: yes
+ - Baseline inference script with OpenAI client: yes
+ - Dockerfile included: yes
+ - OpenEnv metadata file included: yes
inference.py ADDED
@@ -0,0 +1,221 @@
+ from __future__ import annotations
+
+ import argparse
+ import json
+ import os
+ import random
+ from dataclasses import dataclass
+ from typing import Any
+
+ import requests
+ from openai import OpenAI
+
+
+ TASKS = ["easy_docker", "medium_k8s", "hard_ml_config"]
+
+
+ @dataclass
+ class EpisodeResult:
+     task_id: str
+     final_score: float
+     done: bool
+     steps: int
+     rewards: list[float]
+
+
+ def build_openai_client() -> OpenAI:
+     api_key = os.getenv("OPENAI_API_KEY")
+     if not api_key:
+         raise RuntimeError("OPENAI_API_KEY is required for OpenAI baseline mode")
+     return OpenAI(api_key=api_key)
+
+
+ def extract_json_object(text: str) -> dict[str, Any]:
+     text = text.strip()
+     if "```" in text:
+         blocks = text.split("```")
+         for block in blocks:
+             block = block.strip()
+             if block.startswith("json"):
+                 block = block[4:].strip()
+             if block.startswith("{") and block.endswith("}"):
+                 return json.loads(block)
+     start = text.find("{")
+     end = text.rfind("}")
+     if start != -1 and end != -1 and end > start:
+         return json.loads(text[start : end + 1])
+     raise ValueError("No JSON object found in model output")
+
+
+ def choose_heuristic_action(task_id: str, step: int) -> dict[str, Any]:
+     # Deterministic policy for a reproducible baseline.
+     easy_plan = [
+         {"operation": "edit", "path": "services.web.image", "value": "nginx:latest"},
+         {"operation": "delete", "path": "services.web.ports.1"},
+         {"operation": "edit", "path": "services.web.environment", "value": {"DEBUG": "true", "API_KEY": "placeholder"}},
+         {"operation": "edit", "path": "services.db.ports.0", "value": "5432:5432"},
+     ]
+
+     medium_plan = [
+         {"operation": "edit", "path": "metadata.namespace", "value": "default"},
+         {"operation": "edit", "path": "spec.replicas", "value": 3},
+         {"operation": "edit", "path": "spec.template.spec.containers.0.image", "value": "nginx:latest"},
+         {"operation": "edit", "path": "spec.template.spec.containers.0.resources.limits.memory", "value": "512Mi"},
+         {"operation": "edit", "path": "spec.template.spec.containers.0.resources.requests.memory", "value": "256Mi"},
+         {"operation": "edit", "path": "spec.template.spec.containers.0.resources.requests.cpu", "value": "500m"},
+         {"operation": "add", "path": "spec.template.spec.containers.0.ports", "value": [{"containerPort": 80}]},
+     ]
+
+     hard_plan = [
+         {"operation": "delete", "path": "training.fp16"},
+         {"operation": "edit", "path": "training.batch_size", "value": 16},
+         {"operation": "edit", "path": "training.gradient_accumulation_steps", "value": 2},
+         {"operation": "edit", "path": "training.max_steps", "value": 1000},
+         {"operation": "edit", "path": "training.warmup_steps", "value": 100},
+         {"operation": "edit", "path": "training.optimizer.type", "value": "adamw"},
+         {"operation": "edit", "path": "hardware.gpu_count", "value": 1},
+         {"operation": "edit", "path": "data.train_batch_size", "value": 32},
+         {"operation": "edit", "path": "logging.log_interval", "value": 10},
+     ]
+
+     plans = {
+         "easy_docker": easy_plan,
+         "medium_k8s": medium_plan,
+         "hard_ml_config": hard_plan,
+     }
+     plan = plans[task_id]
+     return plan[min(step, len(plan) - 1)]
+
+
+ def choose_openai_action(client: OpenAI, model: str, observation: dict[str, Any]) -> dict[str, Any]:
+     system_prompt = (
+         "You are an environment-control agent for configuration debugging. "
+         "Return exactly one JSON object action."
+     )
+     user_prompt = (
+         "Task:\n"
+         f"{observation['task_description']}\n\n"
+         "Allowed schema:\n"
+         "{\"operation\": \"edit|add|delete\", \"path\": \"dot.path\", \"value\": any|null}\n\n"
+         f"Current score: {observation['overall_score']}\n"
+         f"Validation errors: {observation['validation_errors']}\n"
+         f"Current YAML:\n{observation['current_config']}\n"
+     )
+
+     response = client.chat.completions.create(
+         model=model,
+         messages=[
+             {"role": "system", "content": system_prompt},
+             {"role": "user", "content": user_prompt},
+         ],
+         temperature=0,
+         top_p=1,
+         seed=42,
+     )
+     content = response.choices[0].message.content or ""
+     return extract_json_object(content)
+
+
+ def run_episode(
+     api_base_url: str,
+     task_id: str,
+     max_steps: int,
+     policy: str,
+     model: str,
+     openai_client: OpenAI | None,
+ ) -> EpisodeResult:
+     reset_resp = requests.post(f"{api_base_url}/reset", json={"task_id": task_id}, timeout=30)
+     reset_resp.raise_for_status()
+     observation = reset_resp.json()["observation"]
+
+     rewards: list[float] = []
+     done = False
+
+     print(f"[START] task={task_id} policy={policy}")
+
+     for step in range(max_steps):
+         if done:
+             break
+
+         if policy == "heuristic":
+             action = choose_heuristic_action(task_id, step)
+         else:
+             assert openai_client is not None
+             action = choose_openai_action(openai_client, model, observation)
+
+         step_resp = requests.post(f"{api_base_url}/step", json=action, timeout=30)
+         if step_resp.status_code != 200:
+             rewards.append(0.0)
+             print(f"[STEP] task={task_id} step={step} action=invalid reward=0.00 done=false")
+             continue
+
+         payload = step_resp.json()
+         observation = payload["observation"]
+         reward = payload["reward"]
+         done = payload["done"]
+         reward_value = float(reward["value"])
+         rewards.append(reward_value)
+
+         print(
+             f"[STEP] task={task_id} step={step} action={action.get('operation')}:{action.get('path')} "
+             f"reward={reward_value:.3f} score={observation['overall_score']:.3f} done={str(done).lower()}"
+         )
+
+     result = EpisodeResult(
+         task_id=task_id,
+         final_score=float(observation["overall_score"]),
+         done=done,
+         steps=min(max_steps, len(rewards)),
+         rewards=rewards,
+     )
+
+     reward_text = ",".join(f"{v:.3f}" for v in rewards)
+     print(
+         f"[END] task={task_id} score={result.final_score:.3f} "
+         f"steps={result.steps} done={str(result.done).lower()} rewards={reward_text}"
+     )
+     return result
+
+
+ def parse_args() -> argparse.Namespace:
+     parser = argparse.ArgumentParser(description="Baseline inference for ConfigDebuggerEnv")
+     parser.add_argument("--api-base-url", default=os.getenv("API_BASE_URL", "http://localhost:8000"))
+     parser.add_argument("--max-steps", type=int, default=12)
+     parser.add_argument("--policy", choices=["heuristic", "openai"], default="heuristic")
+     parser.add_argument("--model", default=os.getenv("OPENAI_MODEL", "gpt-4o-mini"))
+     parser.add_argument("--seed", type=int, default=42)
+     return parser.parse_args()
+
+
+ def main() -> None:
+     args = parse_args()
+     random.seed(args.seed)
+
+     openai_client: OpenAI | None = None
+     if args.policy == "openai":
+         openai_client = build_openai_client()
+
+     results: list[EpisodeResult] = []
+     for task_id in TASKS:
+         results.append(
+             run_episode(
+                 api_base_url=args.api_base_url,
+                 task_id=task_id,
+                 max_steps=args.max_steps,
+                 policy=args.policy,
+                 model=args.model,
+                 openai_client=openai_client,
+             )
+         )
+
+     avg = sum(r.final_score for r in results) / len(results)
+     print("\n=== BASELINE SUMMARY ===")
+     for result in results:
+         print(
+             f"{result.task_id}: final_score={result.final_score:.3f} steps={result.steps} done={str(result.done).lower()}"
+         )
+     print(f"average_score={avg:.3f}")
+
+
+ if __name__ == "__main__":
+     main()
openenv.yaml ADDED
@@ -0,0 +1,41 @@
+ openenv: "1.0"
+ name: "ConfigDebuggerEnv"
+ description: "Real-world configuration debugging environment for Docker Compose, Kubernetes, and ML training configs"
+ version: "1.0.0"
+ author: "Basavesh"
+ license: "MIT"
+ tags:
+   - "openenv"
+   - "devops"
+   - "configuration"
+   - "debugging"
+   - "real-world"
+
+ endpoints:
+   reset: "/reset"
+   step: "/step"
+   state: "/state"
+   tasks: "/tasks"
+
+ spaces:
+   observation: "ConfigObservation"
+   action: "ConfigAction"
+   reward: "ConfigReward"
+   state: "EnvState"
+
+ tasks:
+   - id: "easy_docker"
+     name: "Docker Compose Repair"
+     description: "Fix syntax and schema mistakes in docker-compose.yml"
+     difficulty: "easy"
+     max_steps: 15
+   - id: "medium_k8s"
+     name: "Kubernetes Deployment Repair"
+     description: "Fix Kubernetes type, structure, and resource spec issues"
+     difficulty: "medium"
+     max_steps: 18
+   - id: "hard_ml_config"
+     name: "ML Training Config Stabilization"
+     description: "Fix interdependent hyperparameter and hardware consistency issues"
+     difficulty: "hard"
+     max_steps: 22
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ fastapi==0.115.0
+ uvicorn[standard]==0.30.6
+ pydantic==2.9.2
+ pyyaml==6.0.2
+ openai==1.51.2
+ requests==2.32.3
server/__init__.py ADDED
@@ -0,0 +1,2 @@
+ from .env import ConfigDebuggerEnv
+ from .models import ConfigAction, ConfigObservation, ConfigReward, EnvState
server/data.py ADDED
@@ -0,0 +1,212 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from typing import Any
+
+
+ @dataclass(frozen=True)
+ class TaskSpec:
+     task_id: str
+     name: str
+     description: str
+     difficulty: str
+     max_steps: int
+     broken: str
+     target: dict[str, Any]
+     required_paths: dict[str, float]
+     logic_checks: list[str]
+
+
+ TASK_REGISTRY: dict[str, TaskSpec] = {
+     "easy_docker": TaskSpec(
+         task_id="easy_docker",
+         name="Docker Compose Repair",
+         description=(
+             "Fix docker-compose config: invalid port entry, environment format, "
+             "image tags, and full DB port mapping"
+         ),
+         difficulty="easy",
+         max_steps=15,
+         broken="""version: \"3.8\"
+ services:
+   web:
+     image: nginx
+     ports:
+       - \"80:80\"
+       - abcdef
+     environment:
+       - DEBUG=true
+       - API_KEY
+   db:
+     image: postgres:15
+     ports:
+       - \"5432\"
+ volumes:
+   db_data:
+ """,
+         target={
+             "version": "3.8",
+             "services": {
+                 "web": {
+                     "image": "nginx:latest",
+                     "ports": ["80:80"],
+                     "environment": {
+                         "DEBUG": "true",
+                         "API_KEY": "placeholder",
+                     },
+                 },
+                 "db": {
+                     "image": "postgres:15",
+                     "ports": ["5432:5432"],
+                 },
+             },
+             "volumes": {"db_data": None},
+         },
+         required_paths={
+             "services.web.image": 1.0,
+             "services.web.ports": 1.3,
+             "services.web.environment.DEBUG": 1.0,
+             "services.web.environment.API_KEY": 1.0,
+             "services.db.ports": 1.1,
+             "volumes.db_data": 0.6,
+         },
+         logic_checks=[
+             "web port must be host:container",
+             "db port must be full mapping",
+             "environment should be key-value map",
+         ],
+     ),
+     "medium_k8s": TaskSpec(
+         task_id="medium_k8s",
+         name="Kubernetes Deployment Repair",
+         description=(
+             "Fix deployment manifest types and required fields: replicas type, "
+             "namespace, memory units, cpu request format, and containerPort"
+         ),
+         difficulty="medium",
+         max_steps=18,
+         broken="""apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: web-app
+ spec:
+   replicas: \"3\"
+   selector:
+     matchLabels:
+       app: web
+   template:
+     metadata:
+       labels:
+         app: web
+     spec:
+       containers:
+         - name: nginx
+           image: nginx
+           resources:
+             limits:
+               memory: 512
+               cpu: \"1\"
+             requests:
+               memory: 1Gi
+               cpu: 500m
+ """,
+         target={
+             "apiVersion": "apps/v1",
+             "kind": "Deployment",
+             "metadata": {"name": "web-app", "namespace": "default"},
+             "spec": {
+                 "replicas": 3,
+                 "selector": {"matchLabels": {"app": "web"}},
+                 "template": {
+                     "metadata": {"labels": {"app": "web"}},
+                     "spec": {
+                         "containers": [
+                             {
+                                 "name": "nginx",
+                                 "image": "nginx:latest",
+                                 "resources": {
+                                     "limits": {"memory": "512Mi", "cpu": "1"},
+                                     "requests": {"memory": "256Mi", "cpu": "500m"},
+                                 },
+                                 "ports": [{"containerPort": 80}],
+                             }
+                         ]
+                     },
+                 },
+             },
+         },
+         required_paths={
+             "metadata.namespace": 1.0,
+             "spec.replicas": 1.0,
+             "spec.template.spec.containers.0.image": 0.8,
+             "spec.template.spec.containers.0.resources.limits.memory": 1.1,
+             "spec.template.spec.containers.0.resources.requests.memory": 1.1,
+             "spec.template.spec.containers.0.resources.requests.cpu": 1.0,
+             "spec.template.spec.containers.0.ports.0.containerPort": 1.0,
+         },
+         logic_checks=[
+             "replicas should be integer",
+             "memory values should be strings with unit",
+             "cpu request should be millicores string",
+         ],
+     ),
+     "hard_ml_config": TaskSpec(
+         task_id="hard_ml_config",
+         name="ML Training Config Stabilization",
+         description=(
+             "Fix interdependent training and hardware constraints: warmup < max, "
+             "GPU consistency, optimizer choice, and logging frequency"
+         ),
+         difficulty="hard",
+         max_steps=22,
+         broken="""training:
+   batch_size: 32
+   gradient_accumulation_steps: 4
+   max_steps: 100
+   warmup_steps: 200
+   learning_rate: 0.001
+   mixed_precision: fp16
+   fp16: true
+   optimizer:
+     type: adam
+     weight_decay: 0.01
+ hardware:
+   gpu_count: 0
+   use_cuda: true
+ data:
+   train_batch_size: 64
+   eval_batch_size: 32
+ logging:
+   log_interval: 1000
+ """,
+         target={
+             "training": {
+                 "batch_size": 16,
+                 "gradient_accumulation_steps": 2,
+                 "max_steps": 1000,
+                 "warmup_steps": 100,
+                 "learning_rate": 0.001,
+                 "mixed_precision": "fp16",
+                 "optimizer": {"type": "adamw", "weight_decay": 0.01},
+             },
+             "hardware": {"gpu_count": 1, "use_cuda": True},
+             "data": {"train_batch_size": 32, "eval_batch_size": 32},
+             "logging": {"log_interval": 10},
+         },
+         required_paths={
+             "training.max_steps": 1.1,
+             "training.warmup_steps": 1.3,
+             "training.optimizer.type": 1.2,
+             "hardware.gpu_count": 1.2,
+             "hardware.use_cuda": 0.8,
+             "data.train_batch_size": 1.1,
+             "logging.log_interval": 1.0,
+         },
+         logic_checks=[
+             "warmup_steps must be less than max_steps",
+             "if use_cuda is true, gpu_count must be >= 1",
+             "train_batch_size should be 2 * batch_size",
+             "log_interval should be <= 100",
+         ],
+     ),
+ }
server/env.py ADDED
@@ -0,0 +1,409 @@
1
+ from __future__ import annotations
2
+
3
+ import copy
4
+ import hashlib
5
+ from typing import Any
6
+
7
+ import yaml
8
+
9
+ from .data import TASK_REGISTRY, TaskSpec
10
+ from .models import ConfigAction, ConfigObservation, ConfigReward, EnvState, TaskType
11
+
12
+
13
+ class ConfigDebuggerEnv:
14
+ def __init__(self) -> None:
15
+ self.task_spec: TaskSpec | None = None
16
+ self.task_id: TaskType | None = None
17
+ self.current_config_text: str = ""
18
+ self.previous_score: float = 0.0
19
+ self.step_count: int = 0
20
+ self.done: bool = False
21
+ self.max_steps: int = 15
22
+ self.last_reward: ConfigReward | None = None
23
+ self._state_visit_count: dict[str, int] = {}
24
+
25
+ def reset(self, task_id: TaskType | str) -> ConfigObservation:
26
+ normalized_task_id = task_id.value if isinstance(task_id, TaskType) else str(task_id)
27
+
28
+ if normalized_task_id not in TASK_REGISTRY:
29
+ valid = ", ".join(TASK_REGISTRY.keys())
30
+ raise ValueError(f"Unknown task_id '{task_id}'. Valid task ids: {valid}")
31
+
32
+ spec = TASK_REGISTRY[normalized_task_id]
33
+ self.task_spec = spec
34
+ self.task_id = TaskType(normalized_task_id)
35
+ self.current_config_text = spec.broken
36
+ self.step_count = 0
37
+ self.done = False
38
+ self.max_steps = spec.max_steps
39
+ self._state_visit_count = {}
40
+ initial_score = self._grade(self.current_config_text)["overall"]
41
+ self.previous_score = initial_score
42
+ self.last_reward = None
43
+
44
+ self._track_state_visit(self.current_config_text)
45
+ return self._build_observation()
46
+
47
+ def step(self, action: ConfigAction) -> tuple[ConfigObservation, ConfigReward, bool, dict[str, Any]]:
48
+ if self.task_spec is None or self.task_id is None:
49
+ raise RuntimeError("Environment is not initialized. Call reset() first.")
50
+
51
+ if self.done:
52
+ obs = self._build_observation()
53
+ reward = ConfigReward(
54
+ value=0.0,
55
+ previous_score=self.previous_score,
56
+ current_score=self.previous_score,
57
+ delta=0.0,
58
+ penalties=["episode_already_done"],
59
+ )
60
+ self.last_reward = reward
61
+ return obs, reward, True, {"reason": "episode_already_done"}
62
+
63
+ self.step_count += 1
64
+ penalties: list[str] = []
65
+
66
+ try:
67
+ new_config_text, action_penalties = self._apply_action(self.current_config_text, action)
68
+ penalties.extend(action_penalties)
69
+ self.current_config_text = new_config_text
70
+ except Exception as exc:
71
+ penalties.append(f"invalid_action:{exc}")
72
+
73
+ grading = self._grade(self.current_config_text)
74
+ current_score = grading["overall"]
75
+ delta = round(current_score - self.previous_score, 4)
76
+
77
+ loop_penalty = self._track_state_visit(self.current_config_text)
78
+ if loop_penalty > 0:
79
+ penalties.append(f"loop_penalty:{loop_penalty:.2f}")
80
+
81
+ reward_value = self._compute_reward(current_score, delta, penalties, loop_penalty)
82
+
83
+ reward = ConfigReward(
84
+ value=reward_value,
85
+ previous_score=round(self.previous_score, 4),
86
+ current_score=round(current_score, 4),
87
+ delta=delta,
88
+ penalties=penalties,
89
+ )
90
+
91
+ self.previous_score = current_score
92
+ self.done = current_score >= 0.98 or self.step_count >= self.max_steps
93
+ self.last_reward = reward
94
+
95
+ info = {
96
+ "task_id": self.task_id.value,
97
+ "schema_score": grading["schema"],
98
+ "logic_score": grading["logic"],
99
+ "syntax_valid": grading["syntax_valid"],
100
+ }
101
+
102
+ return self._build_observation(grading), reward, self.done, info
103
+
104
+ def state(self) -> EnvState:
105
+ observation = self._build_observation() if self.task_spec is not None else None
106
+ return EnvState(
107
+ task_id=self.task_id,
108
+ done=self.done,
109
+ step_count=self.step_count,
110
+ max_steps=self.max_steps,
111
+ observation=observation,
112
+ last_reward=self.last_reward,
113
+ )
114
+
115
+ def _build_observation(self, grading: dict[str, Any] | None = None) -> ConfigObservation:
116
+ if self.task_spec is None or self.task_id is None:
117
+ raise RuntimeError("Environment is not initialized. Call reset() first.")
118
+
119
+ if grading is None:
120
+ grading = self._grade(self.current_config_text)
121
+
122
+ return ConfigObservation(
123
+ task_id=self.task_id,
124
+ task_description=self.task_spec.description,
125
+ current_config=self.current_config_text,
126
+ syntax_valid=grading["syntax_valid"],
127
+ validation_errors=grading["errors"],
128
+ schema_score=grading["schema"],
129
+ logic_score=grading["logic"],
130
+ overall_score=grading["overall"],
131
+ step_count=self.step_count,
132
+ max_steps=self.max_steps,
133
+ )
134
+
135
+ def _compute_reward(self, current_score: float, delta: float, penalties: list[str], loop_penalty: float) -> float:
136
+ reward = current_score
137
+ if delta > 0:
138
+ reward += min(0.15, delta)
139
+ elif delta < 0:
140
+ reward += delta * 0.4
141
+
142
+ penalty_total = loop_penalty
143
+ if any(p.startswith("invalid_action") for p in penalties):
144
+ penalty_total += 0.10
145
+ if any(p.startswith("destructive_delete") for p in penalties):
146
+ penalty_total += 0.08
147
+
148
+ reward -= penalty_total
149
+ if current_score >= 0.98:
150
+ reward += 0.05
151
+
152
+ return round(max(0.0, min(1.0, reward)), 4)
153
+
154
+ def _track_state_visit(self, config_text: str) -> float:
155
+ state_hash = hashlib.sha1(config_text.encode("utf-8")).hexdigest()
156
+ count = self._state_visit_count.get(state_hash, 0) + 1
157
+ self._state_visit_count[state_hash] = count
158
+ # Penalize repeated states to discourage loops.
159
+ if count <= 1:
160
+ return 0.0
161
+ return min(0.03 * (count - 1), 0.12)
162
+
163
+ def _apply_action(self, config_text: str, action: ConfigAction) -> tuple[str, list[str]]:
164
+ penalties: list[str] = []
165
+
166
+ data = yaml.safe_load(config_text)
167
+ if data is None:
168
+ data = {}
169
+ if not isinstance(data, dict):
170
+ raise ValueError("current config is not a dictionary-like YAML document")
171
+
172
+ root = copy.deepcopy(data)
173
+ tokens = self._parse_path(action.path)
174
+
175
+ if action.operation == "delete" and tokens and isinstance(tokens[0], str):
176
+ if tokens[0] in {"services", "spec", "training", "hardware"} and len(tokens) == 1:
177
+ penalties.append("destructive_delete:top_level_critical_key")
178
+
179
+ if action.operation in {"edit", "add"}:
180
+ self._set_path(root, tokens, action.value)
181
+ else:
182
+ deleted = self._delete_path(root, tokens)
183
+ if not deleted:
184
+ penalties.append("delete_noop")
185
+
186
+ dumped = yaml.safe_dump(root, sort_keys=False)
187
+ return dumped, penalties
188
+
189
+ def _parse_path(self, path: str) -> list[str | int]:
190
+ tokens: list[str | int] = []
191
+ for chunk in path.split("."):
192
+ chunk = chunk.strip()
193
+ if chunk == "":
194
+ raise ValueError("path contains empty token")
195
+ if chunk.isdigit():
196
+ tokens.append(int(chunk))
197
+ else:
198
+ tokens.append(chunk)
199
+ return tokens
200
+
201
+ def _set_path(self, root: dict[str, Any], tokens: list[str | int], value: Any) -> None:
202
+ if not tokens:
203
+ raise ValueError("cannot set empty path")
204
+
205
+ cursor: Any = root
206
+ for i, token in enumerate(tokens[:-1]):
207
+ nxt = tokens[i + 1]
208
+ if isinstance(token, int):
209
+ if not isinstance(cursor, list):
210
+ raise ValueError("list index used on non-list node")
211
+ while token >= len(cursor):
212
+                    cursor.append({} if isinstance(nxt, str) else [])
+                if cursor[token] is None:
+                    cursor[token] = {} if isinstance(nxt, str) else []
+                cursor = cursor[token]
+            else:
+                if not isinstance(cursor, dict):
+                    raise ValueError("dict key used on non-dict node")
+                if token not in cursor or cursor[token] is None:
+                    cursor[token] = {} if isinstance(nxt, str) else []
+                cursor = cursor[token]
+
+        final = tokens[-1]
+        if isinstance(final, int):
+            if not isinstance(cursor, list):
+                raise ValueError("final list index used on non-list node")
+            while final >= len(cursor):
+                cursor.append(None)
+            cursor[final] = value
+        else:
+            if not isinstance(cursor, dict):
+                raise ValueError("final dict key used on non-dict node")
+            cursor[final] = value
+
+    def _delete_path(self, root: dict[str, Any], tokens: list[str | int]) -> bool:
+        if not tokens:
+            return False
+
+        cursor: Any = root
+        for token in tokens[:-1]:
+            if isinstance(token, int):
+                if not isinstance(cursor, list) or token >= len(cursor):
+                    return False
+                cursor = cursor[token]
+            else:
+                if not isinstance(cursor, dict) or token not in cursor:
+                    return False
+                cursor = cursor[token]
+
+        final = tokens[-1]
+        if isinstance(final, int):
+            if not isinstance(cursor, list) or final >= len(cursor):
+                return False
+            cursor.pop(final)
+            return True
+
+        if not isinstance(cursor, dict) or final not in cursor:
+            return False
+        del cursor[final]
+        return True
+
+    def _grade(self, config_text: str) -> dict[str, Any]:
+        assert self.task_spec is not None
+
+        errors: list[str] = []
+        try:
+            parsed = yaml.safe_load(config_text)
+        except Exception as exc:
+            return {
+                "syntax_valid": False,
+                "schema": 0.0,
+                "logic": 0.0,
+                "overall": 0.0,
+                "errors": [f"YAML syntax error: {exc}"],
+            }
+
+        if parsed is None:
+            parsed = {}
+
+        if not isinstance(parsed, dict):
+            return {
+                "syntax_valid": True,
+                "schema": 0.0,
+                "logic": 0.0,
+                "overall": 0.0,
+                "errors": ["Root document must be a mapping/dict"],
+            }
+
+        schema_score, schema_errors = self._grade_schema(parsed)
+        logic_score, logic_errors = self._grade_logic(parsed)
+        errors.extend(schema_errors)
+        errors.extend(logic_errors)
+
+        overall = round((0.60 * schema_score) + (0.40 * logic_score), 4)
+
+        return {
+            "syntax_valid": True,
+            "schema": schema_score,
+            "logic": logic_score,
+            "overall": overall,
+            "errors": errors[:20],
+        }
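The `_grade` method blends the two sub-scores with a fixed 60/40 weighting. A minimal sketch of just that aggregation step (YAML parsing and the per-path checks omitted):

```python
# Minimal sketch of the score aggregation in _grade, assuming the same
# 60/40 schema/logic weighting used above.
def overall_score(schema_score: float, logic_score: float) -> float:
    # Weighted blend, rounded to four decimals as in the environment.
    return round((0.60 * schema_score) + (0.40 * logic_score), 4)

# A config with a perfect schema but half the logic checks passing:
print(overall_score(1.0, 0.5))  # 0.8
```

Because schema carries more weight, an agent that fixes structural mismatches first climbs faster than one chasing logic constraints.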
+
+    def _grade_schema(self, parsed: dict[str, Any]) -> tuple[float, list[str]]:
+        assert self.task_spec is not None
+
+        total_weight = 0.0
+        matched_weight = 0.0
+        errors: list[str] = []
+
+        for path, weight in self.task_spec.required_paths.items():
+            total_weight += weight
+            expected = self._read_path(self.task_spec.target, self._parse_path(path))
+            got, exists = self._safe_read(parsed, self._parse_path(path))
+            if not exists:
+                errors.append(f"Missing required path: {path}")
+                continue
+            if got == expected:
+                matched_weight += weight
+            else:
+                errors.append(f"Mismatch at {path}: expected={expected!r}, got={got!r}")
+
+        score = 0.0 if total_weight == 0 else round(matched_weight / total_weight, 4)
+        return score, errors
+
+    def _grade_logic(self, parsed: dict[str, Any]) -> tuple[float, list[str]]:
+        assert self.task_spec is not None
+
+        checks: list[tuple[str, bool]] = []
+        t = self.task_spec.task_id
+
+        if t == "easy_docker":
+            web_ports = self._safe_get(parsed, ["services", "web", "ports"], default=[])
+            db_ports = self._safe_get(parsed, ["services", "db", "ports"], default=[])
+            env_node = self._safe_get(parsed, ["services", "web", "environment"], default={})
+            checks.append(("web ports must be list", isinstance(web_ports, list)))
+            # Guard with isinstance so a non-iterable value fails the check
+            # instead of raising mid-grade.
+            checks.append(("all web ports must contain ':'", isinstance(web_ports, list) and all(isinstance(p, str) and ":" in p for p in web_ports)))
+            checks.append(("db port must include host and container", "5432:5432" in db_ports if isinstance(db_ports, list) else False))
+            checks.append(("environment must be dict", isinstance(env_node, dict)))
+
+        elif t == "medium_k8s":
+            replicas = self._safe_get(parsed, ["spec", "replicas"], default=None)
+            limits_mem = self._safe_get(
+                parsed,
+                ["spec", "template", "spec", "containers", 0, "resources", "limits", "memory"],
+                default="",
+            )
+            req_mem = self._safe_get(
+                parsed,
+                ["spec", "template", "spec", "containers", 0, "resources", "requests", "memory"],
+                default="",
+            )
+            req_cpu = self._safe_get(
+                parsed,
+                ["spec", "template", "spec", "containers", 0, "resources", "requests", "cpu"],
+                default="",
+            )
+            checks.append(("replicas should be int", isinstance(replicas, int)))
+            checks.append(("limits memory must include unit", isinstance(limits_mem, str) and limits_mem.endswith(("Mi", "Gi"))))
+            checks.append(("requests memory must include unit", isinstance(req_mem, str) and req_mem.endswith(("Mi", "Gi"))))
+            checks.append(("cpu request should be millicore string", isinstance(req_cpu, str) and req_cpu.endswith("m")))
+
+        elif t == "hard_ml_config":
+            warmup = self._safe_get(parsed, ["training", "warmup_steps"], default=0)
+            max_steps = self._safe_get(parsed, ["training", "max_steps"], default=0)
+            use_cuda = self._safe_get(parsed, ["hardware", "use_cuda"], default=False)
+            gpu_count = self._safe_get(parsed, ["hardware", "gpu_count"], default=0)
+            batch_size = self._safe_get(parsed, ["training", "batch_size"], default=0)
+            train_batch = self._safe_get(parsed, ["data", "train_batch_size"], default=0)
+            log_interval = self._safe_get(parsed, ["logging", "log_interval"], default=999999)
+            checks.append(("warmup_steps < max_steps", isinstance(warmup, int) and isinstance(max_steps, int) and warmup < max_steps))
+            checks.append(("gpu_count >=1 when use_cuda", (not use_cuda) or (isinstance(gpu_count, int) and gpu_count >= 1)))
+            checks.append(("train_batch_size equals 2 * batch_size", isinstance(batch_size, int) and isinstance(train_batch, int) and train_batch == 2 * batch_size))
+            checks.append(("log_interval <= 100", isinstance(log_interval, int) and log_interval <= 100))
+
+        total = len(checks)
+        passed = sum(1 for _, ok in checks if ok)
+        errors = [msg for msg, ok in checks if not ok]
+        score = 0.0 if total == 0 else round(passed / total, 4)
+        return score, errors
+
+    def _read_path(self, source: Any, tokens: list[str | int]) -> Any:
+        cursor = source
+        for token in tokens:
+            cursor = cursor[token]
+        return cursor
+
+    def _safe_read(self, source: Any, tokens: list[str | int]) -> tuple[Any, bool]:
+        cursor = source
+        for token in tokens:
+            try:
+                if isinstance(token, int):
+                    if not isinstance(cursor, list):
+                        return None, False
+                    cursor = cursor[token]
+                else:
+                    if not isinstance(cursor, dict) or token not in cursor:
+                        return None, False
+                    cursor = cursor[token]
+            except Exception:
+                return None, False
+        return cursor, True
+
+    def _safe_get(self, source: Any, tokens: list[str | int], default: Any) -> Any:
+        value, exists = self._safe_read(source, tokens)
+        return value if exists else default
server/main.py ADDED
@@ -0,0 +1,86 @@
+from __future__ import annotations
+
+from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+
+from .data import TASK_REGISTRY
+from .env import ConfigDebuggerEnv
+from .models import ConfigAction, ResetRequest, StepResponse
+
+
+app = FastAPI(title="ConfigDebuggerEnv", version="1.0.0")
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+env = ConfigDebuggerEnv()
+
+
+@app.get("/")
+def root() -> dict[str, str]:
+    return {"status": "ok", "env": "ConfigDebuggerEnv"}
+
+
+@app.get("/health")
+def health() -> dict[str, str]:
+    return {"status": "healthy"}
+
+
+@app.get("/tasks")
+def tasks() -> dict[str, list[dict[str, str | int]]]:
+    values: list[dict[str, str | int]] = []
+    for spec in TASK_REGISTRY.values():
+        values.append(
+            {
+                "id": spec.task_id,
+                "name": spec.name,
+                "description": spec.description,
+                "difficulty": spec.difficulty,
+                "max_steps": spec.max_steps,
+            }
+        )
+    return {"tasks": values}
+
+
+@app.post("/reset")
+def reset(payload: ResetRequest) -> dict[str, object]:
+    task_id = payload.task_id or payload.task
+    if task_id is None:
+        raise HTTPException(status_code=400, detail="Provide task_id in request body")
+
+    try:
+        observation = env.reset(task_id)
+        return {
+            "observation": observation.model_dump(),
+            "success": True,
+        }
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+
+
+@app.post("/step", response_model=StepResponse)
+def step(action: ConfigAction) -> StepResponse:
+    try:
+        observation, reward, done, info = env.step(action)
+        return StepResponse(
+            observation=observation,
+            reward=reward,
+            done=done,
+            info=info,
+        )
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+
+
+@app.get("/state")
+def state() -> dict[str, object]:
+    try:
+        current_state = env.state()
+        return current_state.model_dump()
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
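A sketch of the JSON bodies a client would POST to `/reset` and `/step`, matching the `ResetRequest` and `ConfigAction` schemas in `server/models.py`; the base URL is a placeholder assumption taken from the Dockerfile's exposed port:

```python
import json

# Sketch of client request payloads, assuming the ResetRequest and
# ConfigAction schemas defined in server/models.py.
BASE_URL = "http://localhost:7860"  # assumption: Dockerfile exposes 7860

reset_body = {"task_id": "easy_docker"}
step_body = {
    "operation": "edit",            # one of: edit, add, delete
    "path": "services.db.ports.0",  # dot path, list indexes allowed
    "value": "5432:5432",
}

# Bodies serialize cleanly for POST {BASE_URL}/reset and {BASE_URL}/step.
print(json.dumps(step_body))
```

Any HTTP client works; the server replies with an `observation`, a shaped `reward`, a `done` flag, and an `info` dict per step.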
server/models.py ADDED
@@ -0,0 +1,70 @@
+from __future__ import annotations
+
+from enum import Enum
+from typing import Any, Literal
+
+from pydantic import BaseModel, Field, field_validator
+
+
+class TaskType(str, Enum):
+    EASY = "easy_docker"
+    MEDIUM = "medium_k8s"
+    HARD = "hard_ml_config"
+
+
+class ConfigAction(BaseModel):
+    operation: Literal["edit", "add", "delete"] = Field(
+        description="Operation type"
+    )
+    path: str = Field(description="Dot path, list indexes allowed (example: a.b.0.c)")
+    value: Any | None = Field(default=None, description="Value used for edit/add")
+
+    @field_validator("path")
+    @classmethod
+    def _validate_path(cls, value: str) -> str:
+        cleaned = value.strip()
+        if not cleaned:
+            raise ValueError("path cannot be empty")
+        return cleaned
+
+
+class ConfigObservation(BaseModel):
+    task_id: TaskType
+    task_description: str
+    current_config: str
+    syntax_valid: bool
+    validation_errors: list[str] = Field(default_factory=list)
+    schema_score: float = Field(ge=0.0, le=1.0)
+    logic_score: float = Field(ge=0.0, le=1.0)
+    overall_score: float = Field(ge=0.0, le=1.0)
+    step_count: int = Field(ge=0)
+    max_steps: int = Field(ge=1)
+
+
+class ConfigReward(BaseModel):
+    value: float = Field(ge=0.0, le=1.0)
+    previous_score: float = Field(ge=0.0, le=1.0)
+    current_score: float = Field(ge=0.0, le=1.0)
+    delta: float
+    penalties: list[str] = Field(default_factory=list)
+
+
+class EnvState(BaseModel):
+    task_id: TaskType | None = None
+    done: bool
+    step_count: int = Field(ge=0)
+    max_steps: int = Field(ge=1)
+    observation: ConfigObservation | None = None
+    last_reward: ConfigReward | None = None
+
+
+class ResetRequest(BaseModel):
+    task_id: TaskType | None = None
+    task: TaskType | None = None
+
+
+class StepResponse(BaseModel):
+    observation: ConfigObservation
+    reward: ConfigReward
+    done: bool
+    info: dict[str, Any]
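`ConfigAction`'s `field_validator` normalizes the `path` before any environment code sees it: whitespace is trimmed and an empty path is rejected at request-parsing time. A dependency-free sketch of the same rule (function name is hypothetical):

```python
# Dependency-free sketch of ConfigAction's path validator: trim
# whitespace and reject empty paths. The name validate_path is
# hypothetical; in the model it runs via pydantic's @field_validator.
def validate_path(value: str) -> str:
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("path cannot be empty")
    return cleaned

print(validate_path("  services.web.ports.0  "))  # services.web.ports.0
```

Doing this in the model keeps `env.step` free of input sanitation: by the time an action arrives, its path is guaranteed non-empty and trimmed.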