hissterical committed
Commit ebf4715 · verified · Parent: b136c38

Upload 10 files

Files changed (10)
  1. Dockerfile +18 -0
  2. README.md +158 -10
  3. inference.py +221 -0
  4. openenv.yaml +41 -0
  5. requirements.txt +6 -0
  6. server/__init__.py +2 -0
  7. server/data.py +212 -0
  8. server/env.py +409 -0
  9. server/main.py +86 -0
  10. server/models.py +70 -0
Dockerfile ADDED
@@ -0,0 +1,18 @@
+ FROM python:3.11-slim
+
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+
+ WORKDIR /app
+
+ COPY requirements.txt ./
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ RUN useradd --create-home --uid 1000 appuser
+ USER appuser
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "server.main:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,158 @@
- ---
- title: Openenv2
- emoji: 🚀
- colorFrom: pink
- colorTo: blue
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # ConfigDebuggerEnv
+
+ ConfigDebuggerEnv is a real-world OpenEnv environment for iterative configuration debugging. It simulates tasks that platform engineers and ML engineers face in production: fixing Docker Compose, Kubernetes, and training-configuration mistakes under step limits.
+
+ ## Why this environment
+
+ Configuration bugs are common and expensive in real systems. They are often partially valid YAML that is semantically wrong (type mismatches, missing units, violated interdependent constraints). This environment provides dense trajectory rewards so an agent can learn corrective behaviors instead of relying only on terminal success/failure.
+
+ ## OpenEnv API
+
+ The server exposes the standard lifecycle:
+
+ - POST /reset
+ - POST /step
+ - GET /state
+
+ ### Typed models
+
+ - Action model: ConfigAction
+ - Observation model: ConfigObservation
+ - Reward model: ConfigReward
+ - State model: EnvState
+
+ Models are defined in server/models.py and validated with Pydantic.
+
+ ## Action space
+
+ ConfigAction fields:
+
+ - operation: edit | add | delete
+ - path: dot path with optional list indexes (example: spec.template.spec.containers.0.image)
+ - value: JSON-serializable payload for edit/add
+
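The dot-path convention can be sketched in a few lines of Python. This is a simplified illustration with a hypothetical `set_by_path` helper; the environment's real logic is `_set_path` in server/env.py, which additionally creates missing intermediate nodes:

```python
from typing import Any

def set_by_path(root: Any, path: str, value: Any) -> None:
    """Walk a dot path, treating all-digit tokens as list indexes."""
    tokens = [int(t) if t.isdigit() else t for t in path.split(".")]
    cursor = root
    for token in tokens[:-1]:
        cursor = cursor[token]  # simplified: assumes intermediate nodes exist
    cursor[tokens[-1]] = value

config = {"spec": {"template": {"spec": {"containers": [{"image": "nginx"}]}}}}
set_by_path(config, "spec.template.spec.containers.0.image", "nginx:latest")
print(config["spec"]["template"]["spec"]["containers"][0]["image"])  # nginx:latest
```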
+ ## Observation space
+
+ ConfigObservation fields:
+
+ - task_id
+ - task_description
+ - current_config (YAML string)
+ - syntax_valid
+ - validation_errors
+ - schema_score (0.0 to 1.0)
+ - logic_score (0.0 to 1.0)
+ - overall_score (0.0 to 1.0)
+ - step_count
+ - max_steps
+
+ ## Tasks and graders
+
+ Three deterministic tasks are included:
+
+ 1. easy_docker (easy)
+ 2. medium_k8s (medium)
+ 3. hard_ml_config (hard)
+
+ Each task has:
+
+ - A broken starting configuration
+ - A target configuration
+ - Weighted required paths for schema grading
+ - Deterministic logic checks
+
+ Grading always returns normalized values in [0.0, 1.0].
+
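The weighted-path grading amounts to a weighted match ratio. The sketch below uses two real weights from the easy_docker spec in server/data.py; the `matched` map is illustrative:

```python
def schema_score(matched: dict[str, bool], weights: dict[str, float]) -> float:
    """Weighted fraction of required paths whose value already matches the target."""
    total = sum(weights.values())
    gained = sum(w for path, w in weights.items() if matched.get(path))
    return round(gained / total, 4) if total else 0.0

weights = {"services.web.image": 1.0, "services.db.ports": 1.1}
print(schema_score({"services.web.image": True}, weights))  # 0.4762
```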
+ ## Reward design
+
+ The reward is dense, combining progression bonuses with penalties:
+
+ - Base reward is the current overall score
+ - Positive delta bonus on improvement
+ - Regression penalty on negative delta
+ - Loop penalty for repeated states
+ - Penalty for invalid actions
+ - Penalty for destructive top-level deletes
+ - Small completion bonus when solved
+
+ This creates meaningful signal across the full episode, not only at termination.
+
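Condensed, the shaping mirrors `_compute_reward` in server/env.py (same coefficients, with the per-penalty bookkeeping folded into a single `penalty` argument):

```python
def shaped_reward(score: float, delta: float, penalty: float, solved: bool) -> float:
    """Base score plus a capped improvement bonus, minus penalties, clamped to [0, 1]."""
    reward = score
    if delta > 0:
        reward += min(0.15, delta)  # positive delta bonus, capped
    elif delta < 0:
        reward += delta * 0.4       # softened regression penalty
    reward -= penalty               # loop / invalid-action / destructive-delete penalties
    if solved:
        reward += 0.05              # completion bonus
    return round(max(0.0, min(1.0, reward)), 4)

print(shaped_reward(score=0.8, delta=0.1, penalty=0.0, solved=False))  # 0.9
```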
+ ## Project structure
+
+ - openenv.yaml
+ - Dockerfile
+ - requirements.txt
+ - inference.py
+ - server/
+   - __init__.py
+   - data.py
+   - env.py
+   - main.py
+   - models.py
+
+ ## Local setup
+
+ 1. Install dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 2. Run the server
+
+ ```bash
+ python -m uvicorn server.main:app --host 0.0.0.0 --port 8000 --reload
+ ```
+
+ 3. Quick API check
+
+ ```bash
+ curl -X POST "http://localhost:8000/reset" -H "Content-Type: application/json" -d "{\"task_id\":\"easy_docker\"}"
+ ```
+
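The same quick check can be done from Python, parsing the response the way inference.py does (`resp.json()["observation"]`). The observation values below are a trimmed, illustrative stand-in for a real response body:

```python
import json

# Shape of a POST /reset response (values illustrative, fields from ConfigObservation).
raw = """
{"observation": {"task_id": "easy_docker", "overall_score": 0.25,
                 "step_count": 0, "max_steps": 15, "syntax_valid": true}}
"""
obs = json.loads(raw)["observation"]
print(obs["task_id"], obs["max_steps"])  # easy_docker 15
```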
+ ## Baseline inference
+
+ Heuristic baseline (fully reproducible):
+
+ ```bash
+ python inference.py --policy heuristic --api-base-url http://localhost:8000 --seed 42
+ ```
+
+ OpenAI baseline (uses the OpenAI Python client and OPENAI_API_KEY):
+
+ ```bash
+ export OPENAI_API_KEY=your_key_here
+ python inference.py --policy openai --model gpt-4o-mini --api-base-url http://localhost:8000 --seed 42
+ ```
+
+ The script evaluates all three tasks and prints per-task and average scores.
+
+ ## Docker
+
+ Build:
+
+ ```bash
+ docker build -t configdebugger-env .
+ ```
+
+ Run:
+
+ ```bash
+ docker run -p 7860:7860 configdebugger-env
+ ```
+
+ ## Hugging Face Spaces notes
+
+ - Use the Docker SDK
+ - Ensure the Space maps to port 7860
+ - Add the tag: openenv
+ - Include environment variables for external evaluation if needed
+
+ ## Validation checklist
+
+ - Typed Observation/Action/Reward models: yes
+ - reset/step/state implemented: yes
+ - 3 tasks with deterministic graders: yes
+ - Reward in range [0.0, 1.0] with partial progress: yes
+ - Baseline inference script with OpenAI client: yes
+ - Dockerfile included: yes
+ - OpenEnv metadata file included: yes
inference.py ADDED
@@ -0,0 +1,221 @@
+ from __future__ import annotations
+
+ import argparse
+ import json
+ import os
+ import random
+ from dataclasses import dataclass
+ from typing import Any
+
+ import requests
+ from openai import OpenAI
+
+
+ TASKS = ["easy_docker", "medium_k8s", "hard_ml_config"]
+
+
+ @dataclass
+ class EpisodeResult:
+     task_id: str
+     final_score: float
+     done: bool
+     steps: int
+     rewards: list[float]
+
+
+ def build_openai_client() -> OpenAI:
+     api_key = os.getenv("OPENAI_API_KEY")
+     if not api_key:
+         raise RuntimeError("OPENAI_API_KEY is required for OpenAI baseline mode")
+     return OpenAI(api_key=api_key)
+
+
+ def extract_json_object(text: str) -> dict[str, Any]:
+     text = text.strip()
+     if "```" in text:
+         blocks = text.split("```")
+         for block in blocks:
+             block = block.strip()
+             if block.startswith("json"):
+                 block = block[4:].strip()
+             if block.startswith("{") and block.endswith("}"):
+                 return json.loads(block)
+     start = text.find("{")
+     end = text.rfind("}")
+     if start != -1 and end != -1 and end > start:
+         return json.loads(text[start : end + 1])
+     raise ValueError("No JSON object found in model output")
+
+
+ def choose_heuristic_action(task_id: str, step: int) -> dict[str, Any]:
+     # Deterministic policy for a reproducible baseline.
+     easy_plan = [
+         {"operation": "edit", "path": "services.web.image", "value": "nginx:latest"},
+         {"operation": "delete", "path": "services.web.ports.1"},
+         {"operation": "edit", "path": "services.web.environment", "value": {"DEBUG": "true", "API_KEY": "placeholder"}},
+         {"operation": "edit", "path": "services.db.ports.0", "value": "5432:5432"},
+     ]
+
+     medium_plan = [
+         {"operation": "edit", "path": "metadata.namespace", "value": "default"},
+         {"operation": "edit", "path": "spec.replicas", "value": 3},
+         {"operation": "edit", "path": "spec.template.spec.containers.0.image", "value": "nginx:latest"},
+         {"operation": "edit", "path": "spec.template.spec.containers.0.resources.limits.memory", "value": "512Mi"},
+         {"operation": "edit", "path": "spec.template.spec.containers.0.resources.requests.memory", "value": "256Mi"},
+         {"operation": "edit", "path": "spec.template.spec.containers.0.resources.requests.cpu", "value": "500m"},
+         {"operation": "add", "path": "spec.template.spec.containers.0.ports", "value": [{"containerPort": 80}]},
+     ]
+
+     hard_plan = [
+         {"operation": "delete", "path": "training.fp16"},
+         {"operation": "edit", "path": "training.batch_size", "value": 16},
+         {"operation": "edit", "path": "training.gradient_accumulation_steps", "value": 2},
+         {"operation": "edit", "path": "training.max_steps", "value": 1000},
+         {"operation": "edit", "path": "training.warmup_steps", "value": 100},
+         {"operation": "edit", "path": "training.optimizer.type", "value": "adamw"},
+         {"operation": "edit", "path": "hardware.gpu_count", "value": 1},
+         {"operation": "edit", "path": "data.train_batch_size", "value": 32},
+         {"operation": "edit", "path": "logging.log_interval", "value": 10},
+     ]
+
+     plans = {
+         "easy_docker": easy_plan,
+         "medium_k8s": medium_plan,
+         "hard_ml_config": hard_plan,
+     }
+     plan = plans[task_id]
+     return plan[min(step, len(plan) - 1)]
+
+
+ def choose_openai_action(client: OpenAI, model: str, observation: dict[str, Any]) -> dict[str, Any]:
+     system_prompt = (
+         "You are an environment-control agent for configuration debugging. "
+         "Return exactly one JSON object action."
+     )
+     user_prompt = (
+         "Task:\n"
+         f"{observation['task_description']}\n\n"
+         "Allowed schema:\n"
+         "{\"operation\": \"edit|add|delete\", \"path\": \"dot.path\", \"value\": any|null}\n\n"
+         f"Current score: {observation['overall_score']}\n"
+         f"Validation errors: {observation['validation_errors']}\n"
+         f"Current YAML:\n{observation['current_config']}\n"
+     )
+
+     response = client.chat.completions.create(
+         model=model,
+         messages=[
+             {"role": "system", "content": system_prompt},
+             {"role": "user", "content": user_prompt},
+         ],
+         temperature=0,
+         top_p=1,
+         seed=42,
+     )
+     content = response.choices[0].message.content or ""
+     return extract_json_object(content)
+
+
+ def run_episode(
+     api_base_url: str,
+     task_id: str,
+     max_steps: int,
+     policy: str,
+     model: str,
+     openai_client: OpenAI | None,
+ ) -> EpisodeResult:
+     reset_resp = requests.post(f"{api_base_url}/reset", json={"task_id": task_id}, timeout=30)
+     reset_resp.raise_for_status()
+     observation = reset_resp.json()["observation"]
+
+     rewards: list[float] = []
+     done = False
+
+     print(f"[START] task={task_id} policy={policy}")
+
+     for step in range(max_steps):
+         if done:
+             break
+
+         if policy == "heuristic":
+             action = choose_heuristic_action(task_id, step)
+         else:
+             assert openai_client is not None
+             action = choose_openai_action(openai_client, model, observation)
+
+         step_resp = requests.post(f"{api_base_url}/step", json=action, timeout=30)
+         if step_resp.status_code != 200:
+             rewards.append(0.0)
+             print(f"[STEP] task={task_id} step={step} action=invalid reward=0.00 done=false")
+             continue
+
+         payload = step_resp.json()
+         observation = payload["observation"]
+         reward = payload["reward"]
+         done = payload["done"]
+         reward_value = float(reward["value"])
+         rewards.append(reward_value)
+
+         print(
+             f"[STEP] task={task_id} step={step} action={action.get('operation')}:{action.get('path')} "
+             f"reward={reward_value:.3f} score={observation['overall_score']:.3f} done={str(done).lower()}"
+         )
+
+     result = EpisodeResult(
+         task_id=task_id,
+         final_score=float(observation["overall_score"]),
+         done=done,
+         steps=min(max_steps, len(rewards)),
+         rewards=rewards,
+     )
+
+     reward_text = ",".join(f"{v:.3f}" for v in rewards)
+     print(
+         f"[END] task={task_id} score={result.final_score:.3f} "
+         f"steps={result.steps} done={str(result.done).lower()} rewards={reward_text}"
+     )
+     return result
+
+
+ def parse_args() -> argparse.Namespace:
+     parser = argparse.ArgumentParser(description="Baseline inference for ConfigDebuggerEnv")
+     parser.add_argument("--api-base-url", default=os.getenv("API_BASE_URL", "http://localhost:8000"))
+     parser.add_argument("--max-steps", type=int, default=12)
+     parser.add_argument("--policy", choices=["heuristic", "openai"], default="heuristic")
+     parser.add_argument("--model", default=os.getenv("OPENAI_MODEL", "gpt-4o-mini"))
+     parser.add_argument("--seed", type=int, default=42)
+     return parser.parse_args()
+
+
+ def main() -> None:
+     args = parse_args()
+     random.seed(args.seed)
+
+     openai_client: OpenAI | None = None
+     if args.policy == "openai":
+         openai_client = build_openai_client()
+
+     results: list[EpisodeResult] = []
+     for task_id in TASKS:
+         results.append(
+             run_episode(
+                 api_base_url=args.api_base_url,
+                 task_id=task_id,
+                 max_steps=args.max_steps,
+                 policy=args.policy,
+                 model=args.model,
+                 openai_client=openai_client,
+             )
+         )
+
+     avg = sum(r.final_score for r in results) / len(results)
+     print("\n=== BASELINE SUMMARY ===")
+     for result in results:
+         print(
+             f"{result.task_id}: final_score={result.final_score:.3f} steps={result.steps} done={str(result.done).lower()}"
+         )
+     print(f"average_score={avg:.3f}")
+
+
+ if __name__ == "__main__":
+     main()
openenv.yaml ADDED
@@ -0,0 +1,41 @@
+ openenv: "1.0"
+ name: "ConfigDebuggerEnv"
+ description: "Real-world configuration debugging environment for Docker Compose, Kubernetes, and ML training configs"
+ version: "1.0.0"
+ author: "Basavesh"
+ license: "MIT"
+ tags:
+   - "openenv"
+   - "devops"
+   - "configuration"
+   - "debugging"
+   - "real-world"
+
+ endpoints:
+   reset: "/reset"
+   step: "/step"
+   state: "/state"
+   tasks: "/tasks"
+
+ spaces:
+   observation: "ConfigObservation"
+   action: "ConfigAction"
+   reward: "ConfigReward"
+   state: "EnvState"
+
+ tasks:
+   - id: "easy_docker"
+     name: "Docker Compose Repair"
+     description: "Fix syntax and schema mistakes in docker-compose.yml"
+     difficulty: "easy"
+     max_steps: 15
+   - id: "medium_k8s"
+     name: "Kubernetes Deployment Repair"
+     description: "Fix Kubernetes type, structure, and resource spec issues"
+     difficulty: "medium"
+     max_steps: 18
+   - id: "hard_ml_config"
+     name: "ML Training Config Stabilization"
+     description: "Fix interdependent hyperparameter and hardware consistency issues"
+     difficulty: "hard"
+     max_steps: 22
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ fastapi==0.115.0
+ uvicorn[standard]==0.30.6
+ pydantic==2.9.2
+ pyyaml==6.0.2
+ openai==1.51.2
+ requests==2.32.3
server/__init__.py ADDED
@@ -0,0 +1,2 @@
+ from .env import ConfigDebuggerEnv
+ from .models import ConfigAction, ConfigObservation, ConfigReward, EnvState
server/data.py ADDED
@@ -0,0 +1,212 @@
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from typing import Any
+
+
+ @dataclass(frozen=True)
+ class TaskSpec:
+     task_id: str
+     name: str
+     description: str
+     difficulty: str
+     max_steps: int
+     broken: str
+     target: dict[str, Any]
+     required_paths: dict[str, float]
+     logic_checks: list[str]
+
+
+ TASK_REGISTRY: dict[str, TaskSpec] = {
+     "easy_docker": TaskSpec(
+         task_id="easy_docker",
+         name="Docker Compose Repair",
+         description=(
+             "Fix docker-compose config: invalid port entry, environment format, "
+             "image tags, and full DB port mapping"
+         ),
+         difficulty="easy",
+         max_steps=15,
+         broken="""version: \"3.8\"
+ services:
+   web:
+     image: nginx
+     ports:
+       - \"80:80\"
+       - abcdef
+     environment:
+       - DEBUG=true
+       - API_KEY
+   db:
+     image: postgres:15
+     ports:
+       - \"5432\"
+ volumes:
+   db_data:
+ """,
+         target={
+             "version": "3.8",
+             "services": {
+                 "web": {
+                     "image": "nginx:latest",
+                     "ports": ["80:80"],
+                     "environment": {
+                         "DEBUG": "true",
+                         "API_KEY": "placeholder",
+                     },
+                 },
+                 "db": {
+                     "image": "postgres:15",
+                     "ports": ["5432:5432"],
+                 },
+             },
+             "volumes": {"db_data": None},
+         },
+         required_paths={
+             "services.web.image": 1.0,
+             "services.web.ports": 1.3,
+             "services.web.environment.DEBUG": 1.0,
+             "services.web.environment.API_KEY": 1.0,
+             "services.db.ports": 1.1,
+             "volumes.db_data": 0.6,
+         },
+         logic_checks=[
+             "web port must be host:container",
+             "db port must be full mapping",
+             "environment should be key-value map",
+         ],
+     ),
+     "medium_k8s": TaskSpec(
+         task_id="medium_k8s",
+         name="Kubernetes Deployment Repair",
+         description=(
+             "Fix deployment manifest types and required fields: replicas type, "
+             "namespace, memory units, cpu request format, and containerPort"
+         ),
+         difficulty="medium",
+         max_steps=18,
+         broken="""apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: web-app
+ spec:
+   replicas: \"3\"
+   selector:
+     matchLabels:
+       app: web
+   template:
+     metadata:
+       labels:
+         app: web
+     spec:
+       containers:
+         - name: nginx
+           image: nginx
+           resources:
+             limits:
+               memory: 512
+               cpu: \"1\"
+             requests:
+               memory: 1Gi
+               cpu: 500m
+ """,
+         target={
+             "apiVersion": "apps/v1",
+             "kind": "Deployment",
+             "metadata": {"name": "web-app", "namespace": "default"},
+             "spec": {
+                 "replicas": 3,
+                 "selector": {"matchLabels": {"app": "web"}},
+                 "template": {
+                     "metadata": {"labels": {"app": "web"}},
+                     "spec": {
+                         "containers": [
+                             {
+                                 "name": "nginx",
+                                 "image": "nginx:latest",
+                                 "resources": {
+                                     "limits": {"memory": "512Mi", "cpu": "1"},
+                                     "requests": {"memory": "256Mi", "cpu": "500m"},
+                                 },
+                                 "ports": [{"containerPort": 80}],
+                             }
+                         ]
+                     },
+                 },
+             },
+         },
+         required_paths={
+             "metadata.namespace": 1.0,
+             "spec.replicas": 1.0,
+             "spec.template.spec.containers.0.image": 0.8,
+             "spec.template.spec.containers.0.resources.limits.memory": 1.1,
+             "spec.template.spec.containers.0.resources.requests.memory": 1.1,
+             "spec.template.spec.containers.0.resources.requests.cpu": 1.0,
+             "spec.template.spec.containers.0.ports.0.containerPort": 1.0,
+         },
+         logic_checks=[
+             "replicas should be integer",
+             "memory values should be strings with unit",
+             "cpu request should be millicores string",
+         ],
+     ),
+     "hard_ml_config": TaskSpec(
+         task_id="hard_ml_config",
+         name="ML Training Config Stabilization",
+         description=(
+             "Fix interdependent training and hardware constraints: warmup < max, "
+             "GPU consistency, optimizer choice, and logging frequency"
+         ),
+         difficulty="hard",
+         max_steps=22,
+         broken="""training:
+   batch_size: 32
+   gradient_accumulation_steps: 4
+   max_steps: 100
+   warmup_steps: 200
+   learning_rate: 0.001
+   mixed_precision: fp16
+   fp16: true
+   optimizer:
+     type: adam
+     weight_decay: 0.01
+ hardware:
+   gpu_count: 0
+   use_cuda: true
+ data:
+   train_batch_size: 64
+   eval_batch_size: 32
+ logging:
+   log_interval: 1000
+ """,
+         target={
+             "training": {
+                 "batch_size": 16,
+                 "gradient_accumulation_steps": 2,
+                 "max_steps": 1000,
+                 "warmup_steps": 100,
+                 "learning_rate": 0.001,
+                 "mixed_precision": "fp16",
+                 "optimizer": {"type": "adamw", "weight_decay": 0.01},
+             },
+             "hardware": {"gpu_count": 1, "use_cuda": True},
+             "data": {"train_batch_size": 32, "eval_batch_size": 32},
+             "logging": {"log_interval": 10},
+         },
+         required_paths={
+             "training.max_steps": 1.1,
+             "training.warmup_steps": 1.3,
+             "training.optimizer.type": 1.2,
+             "hardware.gpu_count": 1.2,
+             "hardware.use_cuda": 0.8,
+             "data.train_batch_size": 1.1,
+             "logging.log_interval": 1.0,
+         },
+         logic_checks=[
+             "warmup_steps must be less than max_steps",
+             "if use_cuda is true, gpu_count must be >= 1",
+             "train_batch_size should be 2 * batch_size",
+             "log_interval should be <= 100",
+         ],
+     ),
+ }
server/env.py ADDED
@@ -0,0 +1,409 @@
1
+ from __future__ import annotations
2
+
3
+ import copy
4
+ import hashlib
5
+ from typing import Any
6
+
7
+ import yaml
8
+
9
+ from .data import TASK_REGISTRY, TaskSpec
10
+ from .models import ConfigAction, ConfigObservation, ConfigReward, EnvState, TaskType
11
+
12
+
13
+ class ConfigDebuggerEnv:
14
+ def __init__(self) -> None:
15
+ self.task_spec: TaskSpec | None = None
16
+ self.task_id: TaskType | None = None
17
+ self.current_config_text: str = ""
18
+ self.previous_score: float = 0.0
19
+ self.step_count: int = 0
20
+ self.done: bool = False
21
+ self.max_steps: int = 15
22
+ self.last_reward: ConfigReward | None = None
23
+ self._state_visit_count: dict[str, int] = {}
24
+
25
+ def reset(self, task_id: TaskType | str) -> ConfigObservation:
26
+ normalized_task_id = task_id.value if isinstance(task_id, TaskType) else str(task_id)
27
+
28
+ if normalized_task_id not in TASK_REGISTRY:
29
+ valid = ", ".join(TASK_REGISTRY.keys())
30
+ raise ValueError(f"Unknown task_id '{task_id}'. Valid task ids: {valid}")
31
+
32
+ spec = TASK_REGISTRY[normalized_task_id]
33
+ self.task_spec = spec
34
+ self.task_id = TaskType(normalized_task_id)
35
+ self.current_config_text = spec.broken
36
+ self.step_count = 0
37
+ self.done = False
38
+ self.max_steps = spec.max_steps
39
+ self._state_visit_count = {}
40
+ initial_score = self._grade(self.current_config_text)["overall"]
41
+ self.previous_score = initial_score
42
+ self.last_reward = None
43
+
44
+ self._track_state_visit(self.current_config_text)
45
+ return self._build_observation()
46
+
47
+ def step(self, action: ConfigAction) -> tuple[ConfigObservation, ConfigReward, bool, dict[str, Any]]:
48
+ if self.task_spec is None or self.task_id is None:
49
+ raise RuntimeError("Environment is not initialized. Call reset() first.")
50
+
51
+ if self.done:
52
+ obs = self._build_observation()
53
+ reward = ConfigReward(
54
+ value=0.0,
55
+ previous_score=self.previous_score,
56
+ current_score=self.previous_score,
57
+ delta=0.0,
58
+ penalties=["episode_already_done"],
59
+ )
60
+ self.last_reward = reward
61
+ return obs, reward, True, {"reason": "episode_already_done"}
62
+
63
+ self.step_count += 1
64
+ penalties: list[str] = []
65
+
66
+ try:
67
+ new_config_text, action_penalties = self._apply_action(self.current_config_text, action)
68
+ penalties.extend(action_penalties)
69
+ self.current_config_text = new_config_text
70
+ except Exception as exc:
71
+ penalties.append(f"invalid_action:{exc}")
72
+
73
+ grading = self._grade(self.current_config_text)
74
+ current_score = grading["overall"]
75
+ delta = round(current_score - self.previous_score, 4)
76
+
77
+ loop_penalty = self._track_state_visit(self.current_config_text)
78
+ if loop_penalty > 0:
79
+ penalties.append(f"loop_penalty:{loop_penalty:.2f}")
80
+
81
+ reward_value = self._compute_reward(current_score, delta, penalties, loop_penalty)
82
+
83
+ reward = ConfigReward(
84
+ value=reward_value,
85
+ previous_score=round(self.previous_score, 4),
86
+ current_score=round(current_score, 4),
87
+ delta=delta,
88
+ penalties=penalties,
89
+ )
90
+
91
+ self.previous_score = current_score
92
+ self.done = current_score >= 0.98 or self.step_count >= self.max_steps
93
+ self.last_reward = reward
94
+
95
+ info = {
96
+ "task_id": self.task_id.value,
97
+ "schema_score": grading["schema"],
98
+ "logic_score": grading["logic"],
99
+ "syntax_valid": grading["syntax_valid"],
100
+ }
101
+
102
+ return self._build_observation(grading), reward, self.done, info
103
+
104
+ def state(self) -> EnvState:
105
+ observation = self._build_observation() if self.task_spec is not None else None
106
+ return EnvState(
107
+ task_id=self.task_id,
108
+ done=self.done,
109
+ step_count=self.step_count,
110
+ max_steps=self.max_steps,
111
+ observation=observation,
112
+ last_reward=self.last_reward,
113
+ )
114
+
115
+ def _build_observation(self, grading: dict[str, Any] | None = None) -> ConfigObservation:
116
+ if self.task_spec is None or self.task_id is None:
117
+ raise RuntimeError("Environment is not initialized. Call reset() first.")
118
+
119
+ if grading is None:
120
+ grading = self._grade(self.current_config_text)
121
+
122
+ return ConfigObservation(
123
+ task_id=self.task_id,
124
+ task_description=self.task_spec.description,
125
+ current_config=self.current_config_text,
126
+ syntax_valid=grading["syntax_valid"],
127
+ validation_errors=grading["errors"],
128
+ schema_score=grading["schema"],
129
+ logic_score=grading["logic"],
130
+ overall_score=grading["overall"],
131
+ step_count=self.step_count,
132
+ max_steps=self.max_steps,
133
+ )
134
+
135
+ def _compute_reward(self, current_score: float, delta: float, penalties: list[str], loop_penalty: float) -> float:
136
+ reward = current_score
137
+ if delta > 0:
138
+ reward += min(0.15, delta)
139
+ elif delta < 0:
140
+ reward += delta * 0.4
141
+
142
+ penalty_total = loop_penalty
143
+ if any(p.startswith("invalid_action") for p in penalties):
144
+ penalty_total += 0.10
145
+ if any(p.startswith("destructive_delete") for p in penalties):
146
+ penalty_total += 0.08
147
+
148
+ reward -= penalty_total
149
+ if current_score >= 0.98:
150
+ reward += 0.05
151
+
152
+ return round(max(0.0, min(1.0, reward)), 4)
153
+
154
+ def _track_state_visit(self, config_text: str) -> float:
155
+ state_hash = hashlib.sha1(config_text.encode("utf-8")).hexdigest()
156
+ count = self._state_visit_count.get(state_hash, 0) + 1
157
+ self._state_visit_count[state_hash] = count
158
+ # Penalize repeated states to discourage loops.
159
+ if count <= 1:
160
+ return 0.0
161
+ return min(0.03 * (count - 1), 0.12)
162
+
163
+ def _apply_action(self, config_text: str, action: ConfigAction) -> tuple[str, list[str]]:
164
+ penalties: list[str] = []
165
+
166
+ data = yaml.safe_load(config_text)
167
+ if data is None:
168
+ data = {}
169
+ if not isinstance(data, dict):
170
+ raise ValueError("current config is not a dictionary-like YAML document")
171
+
172
+ root = copy.deepcopy(data)
173
+ tokens = self._parse_path(action.path)
174
+
175
+ if action.operation == "delete" and tokens and isinstance(tokens[0], str):
176
+ if tokens[0] in {"services", "spec", "training", "hardware"} and len(tokens) == 1:
177
+ penalties.append("destructive_delete:top_level_critical_key")
178
+
179
+ if action.operation in {"edit", "add"}:
180
+ self._set_path(root, tokens, action.value)
181
+ else:
182
+ deleted = self._delete_path(root, tokens)
183
+ if not deleted:
184
+ penalties.append("delete_noop")
185
+
186
+ dumped = yaml.safe_dump(root, sort_keys=False)
187
+ return dumped, penalties
188
+
189
+ def _parse_path(self, path: str) -> list[str | int]:
190
+ tokens: list[str | int] = []
191
+ for chunk in path.split("."):
192
+ chunk = chunk.strip()
193
+ if chunk == "":
194
+ raise ValueError("path contains empty token")
195
+ if chunk.isdigit():
196
+ tokens.append(int(chunk))
197
+ else:
198
+ tokens.append(chunk)
199
+ return tokens
200
+
201
+ def _set_path(self, root: dict[str, Any], tokens: list[str | int], value: Any) -> None:
202
+ if not tokens:
203
+ raise ValueError("cannot set empty path")
204
+
205
+ cursor: Any = root
206
+ for i, token in enumerate(tokens[:-1]):
207
+ nxt = tokens[i + 1]
208
+ if isinstance(token, int):
209
+ if not isinstance(cursor, list):
210
+ raise ValueError("list index used on non-list node")
211
+ while token >= len(cursor):
212
+                    cursor.append({} if isinstance(nxt, str) else [])
+                if cursor[token] is None:
+                    cursor[token] = {} if isinstance(nxt, str) else []
+                cursor = cursor[token]
+            else:
+                if not isinstance(cursor, dict):
+                    raise ValueError("dict key used on non-dict node")
+                if token not in cursor or cursor[token] is None:
+                    cursor[token] = {} if isinstance(nxt, str) else []
+                cursor = cursor[token]
+
+        final = tokens[-1]
+        if isinstance(final, int):
+            if not isinstance(cursor, list):
+                raise ValueError("final list index used on non-list node")
+            while final >= len(cursor):
+                cursor.append(None)
+            cursor[final] = value
+        else:
+            if not isinstance(cursor, dict):
+                raise ValueError("final dict key used on non-dict node")
+            cursor[final] = value
+
+    def _delete_path(self, root: dict[str, Any], tokens: list[str | int]) -> bool:
+        if not tokens:
+            return False
+
+        cursor: Any = root
+        for token in tokens[:-1]:
+            if isinstance(token, int):
+                if not isinstance(cursor, list) or token >= len(cursor):
+                    return False
+                cursor = cursor[token]
+            else:
+                if not isinstance(cursor, dict) or token not in cursor:
+                    return False
+                cursor = cursor[token]
+
+        final = tokens[-1]
+        if isinstance(final, int):
+            if not isinstance(cursor, list) or final >= len(cursor):
+                return False
+            cursor.pop(final)
+            return True
+
+        if not isinstance(cursor, dict) or final not in cursor:
+            return False
+        del cursor[final]
+        return True
+
+    def _grade(self, config_text: str) -> dict[str, Any]:
+        assert self.task_spec is not None
+
+        errors: list[str] = []
+        try:
+            parsed = yaml.safe_load(config_text)
+        except Exception as exc:
+            return {
+                "syntax_valid": False,
+                "schema": 0.0,
+                "logic": 0.0,
+                "overall": 0.0,
+                "errors": [f"YAML syntax error: {exc}"],
+            }
+
+        if parsed is None:
+            parsed = {}
+
+        if not isinstance(parsed, dict):
+            return {
+                "syntax_valid": True,
+                "schema": 0.0,
+                "logic": 0.0,
+                "overall": 0.0,
+                "errors": ["Root document must be a mapping/dict"],
+            }
+
+        schema_score, schema_errors = self._grade_schema(parsed)
+        logic_score, logic_errors = self._grade_logic(parsed)
+        errors.extend(schema_errors)
+        errors.extend(logic_errors)
+
+        overall = round((0.60 * schema_score) + (0.40 * logic_score), 4)
+
+        return {
+            "syntax_valid": True,
+            "schema": schema_score,
+            "logic": logic_score,
+            "overall": overall,
+            "errors": errors[:20],
+        }
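The `_grade` method blends the two sub-scores with a fixed 60/40 weighting. A minimal sketch of just that aggregation step (YAML parsing and the per-path checks omitted):

```python
# Minimal sketch of the score aggregation in _grade, assuming the same
# 60/40 schema/logic weighting used above.
def overall_score(schema_score: float, logic_score: float) -> float:
    # Weighted blend, rounded to four decimals as in the environment.
    return round((0.60 * schema_score) + (0.40 * logic_score), 4)

# A config with a perfect schema but half the logic checks passing:
print(overall_score(1.0, 0.5))  # 0.8
```

Because schema carries more weight, an agent that fixes structural mismatches first climbs faster than one chasing logic constraints.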
+
+    def _grade_schema(self, parsed: dict[str, Any]) -> tuple[float, list[str]]:
+        assert self.task_spec is not None
+
+        total_weight = 0.0
+        matched_weight = 0.0
+        errors: list[str] = []
+
+        for path, weight in self.task_spec.required_paths.items():
+            total_weight += weight
+            expected = self._read_path(self.task_spec.target, self._parse_path(path))
+            got, exists = self._safe_read(parsed, self._parse_path(path))
+            if not exists:
+                errors.append(f"Missing required path: {path}")
+                continue
+            if got == expected:
+                matched_weight += weight
+            else:
+                errors.append(f"Mismatch at {path}: expected={expected!r}, got={got!r}")
+
+        score = 0.0 if total_weight == 0 else round(matched_weight / total_weight, 4)
+        return score, errors
+
+    def _grade_logic(self, parsed: dict[str, Any]) -> tuple[float, list[str]]:
+        assert self.task_spec is not None
+
+        checks: list[tuple[str, bool]] = []
+        t = self.task_spec.task_id
+
+        if t == "easy_docker":
+            web_ports = self._safe_get(parsed, ["services", "web", "ports"], default=[])
+            db_ports = self._safe_get(parsed, ["services", "db", "ports"], default=[])
+            env_node = self._safe_get(parsed, ["services", "web", "environment"], default={})
+            checks.append(("web ports must be list", isinstance(web_ports, list)))
+            # Guard with isinstance so a non-iterable value fails the check
+            # instead of raising mid-grade.
+            checks.append(("all web ports must contain ':'", isinstance(web_ports, list) and all(isinstance(p, str) and ":" in p for p in web_ports)))
+            checks.append(("db port must include host and container", "5432:5432" in db_ports if isinstance(db_ports, list) else False))
+            checks.append(("environment must be dict", isinstance(env_node, dict)))
+
+        elif t == "medium_k8s":
+            replicas = self._safe_get(parsed, ["spec", "replicas"], default=None)
+            limits_mem = self._safe_get(
+                parsed,
+                ["spec", "template", "spec", "containers", 0, "resources", "limits", "memory"],
+                default="",
+            )
+            req_mem = self._safe_get(
+                parsed,
+                ["spec", "template", "spec", "containers", 0, "resources", "requests", "memory"],
+                default="",
+            )
+            req_cpu = self._safe_get(
+                parsed,
+                ["spec", "template", "spec", "containers", 0, "resources", "requests", "cpu"],
+                default="",
+            )
+            checks.append(("replicas should be int", isinstance(replicas, int)))
+            checks.append(("limits memory must include unit", isinstance(limits_mem, str) and limits_mem.endswith(("Mi", "Gi"))))
+            checks.append(("requests memory must include unit", isinstance(req_mem, str) and req_mem.endswith(("Mi", "Gi"))))
+            checks.append(("cpu request should be millicore string", isinstance(req_cpu, str) and req_cpu.endswith("m")))
+
+        elif t == "hard_ml_config":
+            warmup = self._safe_get(parsed, ["training", "warmup_steps"], default=0)
+            max_steps = self._safe_get(parsed, ["training", "max_steps"], default=0)
+            use_cuda = self._safe_get(parsed, ["hardware", "use_cuda"], default=False)
+            gpu_count = self._safe_get(parsed, ["hardware", "gpu_count"], default=0)
+            batch_size = self._safe_get(parsed, ["training", "batch_size"], default=0)
+            train_batch = self._safe_get(parsed, ["data", "train_batch_size"], default=0)
+            log_interval = self._safe_get(parsed, ["logging", "log_interval"], default=999999)
+            checks.append(("warmup_steps < max_steps", isinstance(warmup, int) and isinstance(max_steps, int) and warmup < max_steps))
+            checks.append(("gpu_count >=1 when use_cuda", (not use_cuda) or (isinstance(gpu_count, int) and gpu_count >= 1)))
+            checks.append(("train_batch_size equals 2 * batch_size", isinstance(batch_size, int) and isinstance(train_batch, int) and train_batch == 2 * batch_size))
+            checks.append(("log_interval <= 100", isinstance(log_interval, int) and log_interval <= 100))
+
+        total = len(checks)
+        passed = sum(1 for _, ok in checks if ok)
+        errors = [msg for msg, ok in checks if not ok]
+        score = 0.0 if total == 0 else round(passed / total, 4)
+        return score, errors
+
+    def _read_path(self, source: Any, tokens: list[str | int]) -> Any:
+        cursor = source
+        for token in tokens:
+            cursor = cursor[token]
+        return cursor
+
+    def _safe_read(self, source: Any, tokens: list[str | int]) -> tuple[Any, bool]:
+        cursor = source
+        for token in tokens:
+            try:
+                if isinstance(token, int):
+                    if not isinstance(cursor, list):
+                        return None, False
+                    cursor = cursor[token]
+                else:
+                    if not isinstance(cursor, dict) or token not in cursor:
+                        return None, False
+                    cursor = cursor[token]
+            except Exception:
+                return None, False
+        return cursor, True
+
+    def _safe_get(self, source: Any, tokens: list[str | int], default: Any) -> Any:
+        value, exists = self._safe_read(source, tokens)
+        return value if exists else default
server/main.py ADDED
@@ -0,0 +1,86 @@
+from __future__ import annotations
+
+from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+
+from .data import TASK_REGISTRY
+from .env import ConfigDebuggerEnv
+from .models import ConfigAction, ResetRequest, StepResponse
+
+
+app = FastAPI(title="ConfigDebuggerEnv", version="1.0.0")
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+env = ConfigDebuggerEnv()
+
+
+@app.get("/")
+def root() -> dict[str, str]:
+    return {"status": "ok", "env": "ConfigDebuggerEnv"}
+
+
+@app.get("/health")
+def health() -> dict[str, str]:
+    return {"status": "healthy"}
+
+
+@app.get("/tasks")
+def tasks() -> dict[str, list[dict[str, str | int]]]:
+    values: list[dict[str, str | int]] = []
+    for spec in TASK_REGISTRY.values():
+        values.append(
+            {
+                "id": spec.task_id,
+                "name": spec.name,
+                "description": spec.description,
+                "difficulty": spec.difficulty,
+                "max_steps": spec.max_steps,
+            }
+        )
+    return {"tasks": values}
+
+
+@app.post("/reset")
+def reset(payload: ResetRequest) -> dict[str, object]:
+    task_id = payload.task_id or payload.task
+    if task_id is None:
+        raise HTTPException(status_code=400, detail="Provide task_id in request body")
+
+    try:
+        observation = env.reset(task_id)
+        return {
+            "observation": observation.model_dump(),
+            "success": True,
+        }
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+
+
+@app.post("/step", response_model=StepResponse)
+def step(action: ConfigAction) -> StepResponse:
+    try:
+        observation, reward, done, info = env.step(action)
+        return StepResponse(
+            observation=observation,
+            reward=reward,
+            done=done,
+            info=info,
+        )
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
+
+
+@app.get("/state")
+def state() -> dict[str, object]:
+    try:
+        current_state = env.state()
+        return current_state.model_dump()
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=str(exc)) from exc
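A sketch of the JSON bodies a client would POST to `/reset` and `/step`, matching the `ResetRequest` and `ConfigAction` schemas in `server/models.py`; the base URL is a placeholder assumption taken from the Dockerfile's exposed port:

```python
import json

# Sketch of client request payloads, assuming the ResetRequest and
# ConfigAction schemas defined in server/models.py.
BASE_URL = "http://localhost:7860"  # assumption: Dockerfile exposes 7860

reset_body = {"task_id": "easy_docker"}
step_body = {
    "operation": "edit",            # one of: edit, add, delete
    "path": "services.db.ports.0",  # dot path, list indexes allowed
    "value": "5432:5432",
}

# Bodies serialize cleanly for POST {BASE_URL}/reset and {BASE_URL}/step.
print(json.dumps(step_body))
```

Any HTTP client works; the server replies with an `observation`, a shaped `reward`, a `done` flag, and an `info` dict per step.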
server/models.py ADDED
@@ -0,0 +1,70 @@
+from __future__ import annotations
+
+from enum import Enum
+from typing import Any, Literal
+
+from pydantic import BaseModel, Field, field_validator
+
+
+class TaskType(str, Enum):
+    EASY = "easy_docker"
+    MEDIUM = "medium_k8s"
+    HARD = "hard_ml_config"
+
+
+class ConfigAction(BaseModel):
+    operation: Literal["edit", "add", "delete"] = Field(
+        description="Operation type"
+    )
+    path: str = Field(description="Dot path, list indexes allowed (example: a.b.0.c)")
+    value: Any | None = Field(default=None, description="Value used for edit/add")
+
+    @field_validator("path")
+    @classmethod
+    def _validate_path(cls, value: str) -> str:
+        cleaned = value.strip()
+        if not cleaned:
+            raise ValueError("path cannot be empty")
+        return cleaned
+
+
+class ConfigObservation(BaseModel):
+    task_id: TaskType
+    task_description: str
+    current_config: str
+    syntax_valid: bool
+    validation_errors: list[str] = Field(default_factory=list)
+    schema_score: float = Field(ge=0.0, le=1.0)
+    logic_score: float = Field(ge=0.0, le=1.0)
+    overall_score: float = Field(ge=0.0, le=1.0)
+    step_count: int = Field(ge=0)
+    max_steps: int = Field(ge=1)
+
+
+class ConfigReward(BaseModel):
+    value: float = Field(ge=0.0, le=1.0)
+    previous_score: float = Field(ge=0.0, le=1.0)
+    current_score: float = Field(ge=0.0, le=1.0)
+    delta: float
+    penalties: list[str] = Field(default_factory=list)
+
+
+class EnvState(BaseModel):
+    task_id: TaskType | None = None
+    done: bool
+    step_count: int = Field(ge=0)
+    max_steps: int = Field(ge=1)
+    observation: ConfigObservation | None = None
+    last_reward: ConfigReward | None = None
+
+
+class ResetRequest(BaseModel):
+    task_id: TaskType | None = None
+    task: TaskType | None = None
+
+
+class StepResponse(BaseModel):
+    observation: ConfigObservation
+    reward: ConfigReward
+    done: bool
+    info: dict[str, Any]
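`ConfigAction`'s `field_validator` normalizes the `path` before any environment code sees it: whitespace is trimmed and an empty path is rejected at request-parsing time. A dependency-free sketch of the same rule (function name is hypothetical):

```python
# Dependency-free sketch of ConfigAction's path validator: trim
# whitespace and reject empty paths. The name validate_path is
# hypothetical; in the model it runs via pydantic's @field_validator.
def validate_path(value: str) -> str:
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("path cannot be empty")
    return cleaned

print(validate_path("  services.web.ports.0  "))  # services.web.ports.0
```

Doing this in the model keeps `env.step` free of input sanitation: by the time an action arrives, its path is guaranteed non-empty and trimmed.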