SidhaGarg committed on
Commit
39ff394
·
1 Parent(s): 1ba19c7

Prepare HF Space submission validation and compliance.

Dockerfile ADDED
@@ -0,0 +1,23 @@
+ # Use a lightweight, stable Python image
+ FROM python:3.10-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy project files
+ COPY pyproject.toml .
+ COPY openenv.yaml .
+ COPY models.py .
+ COPY env.py .
+ COPY __init__.py .
+ COPY client.py .
+ COPY server ./server
+
+ # Install dependencies (no-cache to save space)
+ RUN pip install --no-cache-dir .
+
+ # Expose the standard OpenEnv port
+ EXPOSE 8000
+
+ # Start the FastAPI/OpenEnv app directly (openenv serve is not implemented in v0.2.3)
+ CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
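The container above serves the app on port 8000, and the README and validator in this commit both probe `GET /health` and `POST /reset`. A minimal sketch of that probe in Python, using only the standard library, might look like the following. The helper names `check_endpoint` and `verify_space` are hypothetical and not part of this commit.

```python
import json
import urllib.error
import urllib.request
from typing import Dict


def check_endpoint(base_url: str, path: str, method: str = "GET", timeout: float = 10.0) -> int:
    """Return the HTTP status code for a request, or 0 if the connection fails."""
    url = base_url.rstrip("/") + path
    data = None
    headers = {}
    if method == "POST":
        # Mirror the validator's curl call, which posts an empty JSON object.
        data = json.dumps({}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=data, method=method, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses still carry a status code worth reporting.
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


def verify_space(base_url: str) -> Dict[str, int]:
    """Probe the two endpoints named in the deployment checklist."""
    return {
        "GET /health": check_endpoint(base_url, "/health"),
        "POST /reset": check_endpoint(base_url, "/reset", method="POST"),
    }
```

Calling `verify_space("http://127.0.0.1:8000")` against a locally running container should report `200` for both keys when the server is healthy; any other value points at the endpoint to inspect.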
README.md CHANGED
@@ -9,4 +9,120 @@ license: mit
  short_description: Cloud SRE/DevOps RL environment
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Cloud DevOps RLEnv
+
+ Cloud DevOps RLEnv is an OpenEnv-compatible environment for training and evaluating agents on realistic cloud SRE and DevOps incident-response tasks.
+
+ ## Environment Description And Motivation
+
+ Production incidents are often multi-step: triage, inspect resources, check logs, apply a safe remediation, and then verify the fix. This environment simulates that loop with deterministic scenarios and shaped rewards.
+
+ Goals:
+ - Benchmark planning and tool-use behavior for cloud operations agents.
+ - Reward correct diagnosis over blind action execution.
+ - Provide repeatable task outcomes for fair grading and comparison.
+
+ ## Action Space
+
+ Action model: `CloudAction`
+
+ Fields:
+ - `command` (required): one of `list_resources`, `describe_resource`, `view_logs`, `update_security_group`, `restart_service`, `submit_solution`.
+ - `resource_id` (optional): target resource identifier (required for most non-list actions).
+ - `parameters` (optional): structured key/value arguments used by mutating actions.
+
+ Notes:
+ - `update_security_group` expects `parameters.port` and usually `parameters.action`.
+ - `restart_service` targets a single instance by `resource_id`.
+
+ ## Observation And State Space
+
+ Observation model: `CloudObservation`
+
+ Primary observation fields:
+ - `output`: command result payload.
+ - `error`: command error, when present.
+ - `system_health_status`: `CRITICAL`, `DEGRADED`, or `HEALTHY`.
+ - `done`: terminal flag.
+ - `reward`: scalar step reward.
+ - `metadata`: includes task name, resolution status, step count, and other diagnostics.
+
+ Hidden state model: `CloudState`
+ - `task_difficulty`: `easy`, `medium`, or `hard`.
+ - `resources`: underlying resource graph and logs.
+ - `step_count`: total actions issued.
+ - `is_resolved`: whether the incident's root cause has been remediated.
+
+ ## Task Definitions And Expected Difficulty
+
+ - `easy`:
+   Open port `80` on `sg-web` so web traffic can flow.
+   Expected difficulty: low.
+ - `medium`:
+   Inspect API logs to identify the DB connectivity failure, then open port `5432` on `sg-db`.
+   Expected difficulty: medium (requires diagnosis before remediation).
+ - `hard`:
+   Trace the load balancer timeout to `i-web2`, inspect the target, then restart the correct service.
+   Expected difficulty: high (multi-hop diagnosis and anti-shortcut checks).
+
+ ## Setup And Usage
+
+ From repository root:
+
+ ```bash
+ # Validate OpenEnv package structure and manifest
+ # (use the virtualenv's openenv binary if it is not on PATH)
+ openenv validate
+
+ # Run pre-submission validator (skip live inference)
+ bash scripts/pre_submit_validate.sh --skip-inference
+
+ # Build local submission image
+ docker build -t cloud-devops-env:phase1 -f Dockerfile .
+ ```
+
+ Optional local server run:
+
+ ```bash
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
+ ```
+
+ ## Inference Contract
+
+ `inference.py` uses the OpenAI client and reads the following environment variables:
+ - `API_BASE_URL`
+ - `MODEL_NAME`
+ - `HF_TOKEN`
+
+ It emits strict structured logs:
+ - `[START] { ... }` per task
+ - `[STEP] { ... }` per environment action
+ - `[END] { ... }` per task summary
+
+ ## Baseline Scores
+
+ Representative deterministic scripted-policy targets:
+
+ | Task | Baseline Score (0-1) | Notes |
+ | --- | --- | --- |
+ | easy | 1.0 | Includes identifying and fixing the security group rule |
+ | medium | 0.8-1.0 | Depends on whether the optional diagnostic reward is collected |
+ | hard | 1.0 | Requires the correct root-cause path before restart |
+
+ Validation expectation:
+ - Aggregate scores are clamped to `[0.0, 1.0]`.
+ - `SUCCESS_SCORE_THRESHOLD` for inference summaries is `0.8`.
+
+ ## Hugging Face Space Deployment
+
+ 1. Push this repository to your Space (Docker SDK).
+ 2. Ensure `README.md` front matter (above) is present.
+ 3. Set Space secrets/variables:
+    - `HF_TOKEN` (secret)
+    - `API_BASE_URL` (for example `https://router.huggingface.co/v1`)
+    - `MODEL_NAME` (chosen model slug)
+ 4. Wait for the Space build to complete.
+ 5. Verify endpoints:
+    - `GET /health` returns `200`
+    - `POST /reset` returns `200`
+
+ Reference: https://huggingface.co/docs/hub/spaces-config-reference
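The README's inference contract describes three tagged log lines, each followed by a JSON object. As a rough illustration of how a grader could consume that output, the sketch below parses such lines into `(tag, payload)` pairs. The function name `parse_log_lines` is hypothetical, and the parser assumes each tagged record fits on one line, as the `print` calls in `inference.py` suggest.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

# Tags named in the README's "Inference Contract" section.
TAGS = ("[START]", "[STEP]", "[END]")


def parse_log_lines(lines: Iterable[str]) -> List[Tuple[str, Dict[str, Any]]]:
    """Collect (tag, payload) pairs from tagged lines; ignore everything else."""
    events: List[Tuple[str, Dict[str, Any]]] = []
    for line in lines:
        line = line.strip()
        for tag in TAGS:
            if line.startswith(tag + " "):
                # Drop the brackets from the tag and decode the JSON that follows it.
                events.append((tag[1:-1], json.loads(line[len(tag):])))
                break
    return events
```

Untagged lines, such as the `--- Running Task: ... ---` banners or `[DEBUG]` messages that `inference.py` also prints, are skipped rather than treated as errors.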
__init__.py ADDED
@@ -0,0 +1,25 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Devops Env Environment."""
+
+ from .client import CloudDevopsEnv
+ from .models import (
+     CloudAction,
+     CloudDevopsAction,
+     CloudDevopsObservation,
+     CloudObservation,
+     CloudState,
+ )
+
+ __all__ = [
+     "CloudAction",
+     "CloudObservation",
+     "CloudState",
+     "CloudDevopsAction",
+     "CloudDevopsObservation",
+     "CloudDevopsEnv",
+ ]
client.py ADDED
@@ -0,0 +1,107 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Devops Env Environment Client."""
+
+ from typing import Any, Dict
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ from .models import CloudAction, CloudObservation
+
+
+ class CloudDevopsEnv(
+     EnvClient[CloudAction, CloudObservation, State]
+ ):
+     """
+     Client for the Cloud Devops Env Environment.
+
+     This client maintains a persistent WebSocket connection to the environment server,
+     enabling efficient multi-step interactions with lower latency.
+     Each client instance has its own dedicated environment session on the server.
+
+     Example:
+         >>> # Connect to a running server
+         >>> with CloudDevopsEnv(base_url="http://localhost:8000") as client:
+         ...     result = client.reset()
+         ...     print(result.observation.system_health_status)
+         ...
+         ...     result = client.step(CloudAction(command="list_resources"))
+         ...     print(result.observation.output)
+
+     Example with Docker:
+         >>> # Automatically start container and connect
+         >>> client = CloudDevopsEnv.from_docker_image("cloud_devops_env-env:latest")
+         >>> try:
+         ...     result = client.reset()
+         ...     result = client.step(CloudAction(command="list_resources"))
+         ... finally:
+         ...     client.close()
+     """
+
+     def _step_payload(self, action: CloudAction) -> Dict[str, Any]:
+         """
+         Convert CloudAction to JSON payload for step message.
+
+         Args:
+             action: CloudAction instance
+
+         Returns:
+             Dictionary representation suitable for JSON encoding
+         """
+         payload: Dict[str, Any] = {
+             "command": action.command,
+             "resource_id": action.resource_id,
+             "parameters": action.parameters,
+         }
+         if action.message is not None:
+             payload["message"] = action.message
+         return payload
+
+     def _parse_result(self, payload: Dict[str, Any]) -> StepResult[CloudObservation]:
+         """
+         Parse server response into StepResult[CloudObservation].
+
+         Args:
+             payload: JSON response data from server
+
+         Returns:
+             StepResult with CloudObservation
+         """
+         obs_data = payload.get("observation", {})
+         observation = CloudObservation(
+             output=obs_data.get("output", ""),
+             error=obs_data.get("error"),
+             system_health_status=obs_data.get("system_health_status", "CRITICAL"),
+             message_length=obs_data.get("message_length", 0),
+             echoed_message=obs_data.get("echoed_message"),
+             done=payload.get("done", False),
+             reward=payload.get("reward"),
+             metadata=obs_data.get("metadata", {}),
+         )
+
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict[str, Any]) -> State:
+         """
+         Parse server response into State object.
+
+         Args:
+             payload: JSON response from state request
+
+         Returns:
+             State object with episode_id and step_count
+         """
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
env.py ADDED
@@ -0,0 +1,67 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Async entrypoint wrapper for external evaluators and custom graders."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from pydantic import BaseModel
+
+ try:
+     from .models import CloudAction, CloudObservation, CloudState
+     from .server.cloud_devops_env_environment import CloudDevopsEnvironment
+ except ImportError:
+     from models import CloudAction, CloudObservation, CloudState
+     from server.cloud_devops_env_environment import CloudDevopsEnvironment
+
+
+ class EnvResult(BaseModel):
+     """Canonical environment result payload for async evaluator loops."""
+
+     observation: CloudObservation
+     reward: float
+     done: bool
+     info: Dict[str, Any]
+
+
+ class CloudDevOpsEnv:
+     """Async-compatible facade over the OpenEnv server-side environment logic."""
+
+     def __init__(self, task_name: str = "easy"):
+         self._impl = CloudDevopsEnvironment(task_name=task_name)
+
+     @property
+     def achievements(self) -> set[str]:
+         """Expose completed shaped-reward checkpoints for debugging/evaluation."""
+         return set(self._impl._achievements)
+
+     async def reset(self) -> EnvResult:
+         """Reset the environment to the initial task state."""
+         obs = self._impl.reset()
+         return EnvResult(
+             observation=obs,
+             reward=float(obs.reward or 0.0),
+             done=bool(obs.done),
+             info=dict(obs.metadata or {}),
+         )
+
+     async def step(self, action: CloudAction) -> EnvResult:
+         """Execute an action and return a structured async result."""
+         obs = self._impl.step(action)
+         return EnvResult(
+             observation=obs,
+             reward=float(obs.reward or 0.0),
+             done=bool(obs.done),
+             info=dict(obs.metadata or {}),
+         )
+
+     async def state(self) -> CloudState:
+         """Return hidden environment state for deterministic evaluators."""
+         state = self._impl.state
+         assert isinstance(state, CloudState)
+         return state
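To illustrate how an external evaluator might drive the async `reset`/`step` surface above, here is a small self-contained sketch. `StubEnv` and `StubResult` are hypothetical stand-ins written for this example: they take plain dicts instead of `CloudAction`, and their reward values are invented, not taken from the real environment. Only the clamping to `[0.0, 1.0]` and the `0.8` success threshold come from this commit.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List

MAX_TOTAL_REWARD = 1.0
SUCCESS_SCORE_THRESHOLD = 0.8  # value stated in the README


@dataclass
class StubResult:
    """Hypothetical stand-in for EnvResult; carries only what the loop reads."""
    reward: float
    done: bool
    info: Dict[str, Any] = field(default_factory=dict)


class StubEnv:
    """Hypothetical environment with invented rewards, for illustration only."""

    def __init__(self) -> None:
        self._resolved = False

    async def reset(self) -> StubResult:
        self._resolved = False
        return StubResult(reward=0.0, done=False)

    async def step(self, action: Dict[str, Any]) -> StubResult:
        command = action.get("command")
        if command == "update_security_group" and action.get("resource_id") == "sg-web":
            self._resolved = True
            return StubResult(reward=0.5, done=False)
        if command == "submit_solution":
            return StubResult(reward=0.5 if self._resolved else 0.0, done=True)
        return StubResult(reward=0.0, done=False)


async def run_scripted_episode(env: Any, actions: List[Dict[str, Any]], max_steps: int = 15) -> Dict[str, Any]:
    """Replay a fixed action list and summarize the episode like inference.py does."""
    result = await env.reset()
    rewards: List[float] = []
    for action in actions[:max_steps]:
        if result.done:
            break
        result = await env.step(action)
        rewards.append(float(result.reward or 0.0))
    # Clamp the summed reward, then compare against the success threshold.
    score = min(max(sum(rewards), 0.0), MAX_TOTAL_REWARD)
    return {"score": score, "success": score >= SUCCESS_SCORE_THRESHOLD, "steps": len(rewards)}
```

A scripted policy can then be evaluated with `asyncio.run(run_scripted_episode(StubEnv(), actions))`. Swapping in the real `CloudDevOpsEnv` would additionally require building `CloudAction` objects rather than dicts.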
inference.py ADDED
@@ -0,0 +1,163 @@
+ import asyncio
+ import json
+ import os
+ from typing import Any, Dict, List, Tuple
+
+ from openai import OpenAI
+ from pydantic import ValidationError
+
+ from env import CloudDevOpsEnv
+ from models import CloudAction
+
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+ MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-31B-it")
+ HF_TOKEN = os.getenv("HF_TOKEN")
+
+ BENCHMARK = "CloudDevOpsEnv"
+ MAX_STEPS = 15
+ MAX_TOTAL_REWARD = 1.0
+ SUCCESS_SCORE_THRESHOLD = 0.8
+
+
+ def log_start(task: str, env: str, model: str) -> None:
+     log_data = {"task": task, "env": env, "model": model}
+     print(f"[START] {json.dumps(log_data)}", flush=True)
+
+
+ def log_step(step: int, action: Any, reward: float, done: bool, error: Any) -> None:
+     action_dict = action.model_dump() if hasattr(action, "model_dump") else str(action)
+     log_data = {
+         "step": step,
+         "action": action_dict,
+         "reward": reward,
+         "done": done,
+         "error": error,
+     }
+     print(f"[STEP] {json.dumps(log_data)}", flush=True)
+
+
+ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+     log_data = {"success": success, "steps": steps, "score": score, "rewards": rewards}
+     print(f"[END] {json.dumps(log_data)}", flush=True)
+
+
+ def get_model_action(
+     client: OpenAI,
+     step: int,
+     last_obs: str,
+     last_error: str,
+     history: List[Dict[str, str]],
+ ) -> Tuple[CloudAction, str]:
+     """Prompt the LLM and parse its response into a CloudAction."""
+     system_prompt = (
+         "You are an expert AI DevOps Engineer diagnosing a cloud infrastructure issue. "
+         "You must respond ONLY with a raw JSON object matching this schema:\n"
+         "{\n"
+         ' "command": "list_resources" | "describe_resource" | "view_logs" | "update_security_group" | "restart_service" | "submit_solution",\n'
+         ' "resource_id": "string (optional)",\n'
+         ' "parameters": {"key": "value"} (optional)\n'
+         "}\n"
+         "Do not include markdown blocks like ```json. Just output the JSON."
+     )
+
+     user_prompt = f"Step {step}.\nLast Observation:\n{last_obs}\n"
+     if last_error:
+         user_prompt += f"\nLast Error:\n{last_error}\n"
+     user_prompt += "\nWhat is your next action JSON?"
+
+     messages = [{"role": "system", "content": system_prompt}] + history + [
+         {"role": "user", "content": user_prompt}
+     ]
+
+     try:
+         response = client.chat.completions.create(
+             model=MODEL_NAME,
+             messages=messages,
+             temperature=0.1,
+             max_tokens=200,
+         )
+         raw_text = (response.choices[0].message.content or "").strip()
+
+         # Strip a Markdown code fence (with or without a "json" tag) if the model added one.
+         if raw_text.startswith("```"):
+             raw_text = raw_text.replace("```json", "").replace("```", "").strip()
+
+         action_dict = json.loads(raw_text)
+         return CloudAction(**action_dict), raw_text
+     except (json.JSONDecodeError, ValidationError) as exc:
+         # Fall back to a harmless read-only action when the reply cannot be parsed.
+         print(f"[DEBUG] Model parse failed: {exc}", flush=True)
+         return CloudAction(command="list_resources"), "failed_parse"
+     except Exception as exc:
+         print(f"[DEBUG] API request failed: {exc}", flush=True)
+         return CloudAction(command="list_resources"), "api_error"
+
+
+ async def run_task(task_name: str, client: OpenAI) -> None:
+     env = CloudDevOpsEnv(task_name=task_name)
+
+     history: List[Dict[str, str]] = []
+     rewards: List[float] = []
+     steps_taken = 0
+     score = 0.0
+     success = False
+
+     log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)
+
+     try:
+         result = await env.reset()
+         last_obs = result.observation.output
+         last_error = result.observation.error or ""
+
+         for step in range(1, MAX_STEPS + 1):
+             if result.done:
+                 break
+
+             action, raw_response = get_model_action(
+                 client, step, last_obs, last_error, history
+             )
+
+             result = await env.step(action)
+             obs = result.observation
+             reward = result.reward or 0.0
+             done = result.done
+             error = obs.error
+
+             rewards.append(reward)
+             steps_taken = step
+             last_obs = obs.output
+             last_error = error or ""
+
+             log_step(step=step, action=action, reward=reward, done=done, error=error)
+
+             history.append({"role": "assistant", "content": raw_response})
+             history.append(
+                 {
+                     "role": "user",
+                     "content": f"Observation: {last_obs}\nError: {last_error}",
+                 }
+             )
+
+             if done:
+                 break
+
+         # Clamp the summed step rewards to [0, MAX_TOTAL_REWARD] before thresholding.
+         score = sum(rewards)
+         score = min(max(score, 0.0), MAX_TOTAL_REWARD)
+         success = score >= SUCCESS_SCORE_THRESHOLD
+
+     finally:
+         log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+
+
+ async def main() -> None:
+     if not HF_TOKEN:
+         print("[WARNING] HF_TOKEN environment variable not set. API calls will likely fail.")
+
+     # The OpenAI client raises at construction time when no API key is available, so pass
+     # an empty string instead; failed requests are then handled in get_model_action.
+     client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN or "")
+
+     tasks = ["easy", "medium", "hard"]
+     for task in tasks:
+         print(f"\n--- Running Task: {task.upper()} ---")
+         await run_task(task, client)
+
+
+ if __name__ == "__main__":
+     asyncio.run(main())
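`get_model_action` expects the reply to be a bare JSON object and falls back to `list_resources` otherwise. Models often wrap the object in a code fence or a sentence of prose, so a more tolerant extraction step can help. The sketch below is one possible approach, not what the commit does: the helper name `extract_json_object` is hypothetical, and it scans for the first `{` that begins a decodable JSON object using the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from typing import Any, Dict, Optional


def extract_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object embedded in `text`, or None if none decodes."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not a valid object here; try the next opening brace.
        start = text.find("{", start + 1)
    return None
```

The returned dict could then be passed to `CloudAction(**...)` inside the existing `try` block, with a `None` result treated the same way as a parse failure.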
models.py ADDED
@@ -0,0 +1,113 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """
+ Data models for the Cloud Devops Env Environment.
+
+ The cloud_devops_env environment simulates cloud/devops incident response tasks.
+ """
+
+ import json
+ from typing import Any, Dict, Literal, Optional
+
+ from openenv.core.env_server.types import Action, Observation, State
+ from pydantic import Field, field_validator
+
+
+ class CloudAction(Action):
+     """Action space (what the agent can do)."""
+
+     command: Literal[
+         "list_resources",
+         "describe_resource",
+         "view_logs",
+         "update_security_group",
+         "restart_service",
+         "submit_solution",
+     ] = Field(..., description="The cloud API command to execute.")
+     resource_id: Optional[str] = Field(
+         default=None,
+         description=(
+             "The ID of the target resource (e.g., 'i-12345'). "
+             "Required for all commands except list_resources."
+         ),
+     )
+     parameters: Optional[Dict[str, Any]] = Field(
+         default=None,
+         description=(
+             "Key-value pairs for updates "
+             "(e.g., {'port': '80', 'action': 'allow'} for update_security_group)."
+         ),
+     )
+     message: Optional[str] = Field(
+         default=None,
+         description="Legacy field from template env; safe to remove after server/client migration.",
+     )
+
+     @field_validator("parameters", mode="before")
+     @classmethod
+     def _coerce_parameters(cls, value: Any) -> Any:
+         """Allow /web text input to pass JSON for dict parameters."""
+         if value is None or value == "":
+             return None
+         if isinstance(value, dict):
+             return value
+         if isinstance(value, str):
+             try:
+                 parsed = json.loads(value)
+             except json.JSONDecodeError as exc:
+                 raise ValueError(
+                     "parameters must be a JSON object string, e.g. {\"port\":80,\"action\":\"allow\"}"
+                 ) from exc
+             if not isinstance(parsed, dict):
+                 raise ValueError("parameters JSON must decode to an object/dictionary")
+             return parsed
+         raise ValueError("parameters must be a dictionary or JSON object string")
+
+
+ class CloudObservation(Observation):
+     """Observation space (what the agent sees)."""
+
+     output: str = Field(
+         ...,
+         description="The terminal/API response from the last command executed.",
+     )
+     error: Optional[str] = Field(
+         default=None,
+         description="Error message if the last command failed or was invalid.",
+     )
+     system_health_status: str = Field(
+         ...,
+         description="Current status of the system (e.g., 'CRITICAL', 'DEGRADED', 'HEALTHY').",
+     )
+     echoed_message: Optional[str] = Field(
+         default=None,
+         description="Legacy field from template env; safe to remove after server/client migration.",
+     )
+     message_length: int = Field(
+         default=0,
+         description="Legacy field from template env; safe to remove after server/client migration.",
+     )
+
+
+ class CloudState(State):
+     """State space (the hidden environment state)."""
+
+     task_difficulty: str = Field(..., description="Current task: easy, medium, or hard.")
+     resources: Dict[str, Dict[str, Any]] = Field(
+         ...,
+         description="The hidden JSON state of all mock cloud resources.",
+     )
+     step_count: int = Field(..., description="Number of actions taken so far.")
+     is_resolved: bool = Field(
+         ...,
+         description="Whether the root cause has been successfully fixed.",
+     )
+
+
+ # Backward-compatible aliases for scaffolded files that still use template names.
+ CloudDevopsAction = CloudAction
+ CloudDevopsObservation = CloudObservation
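The `_coerce_parameters` validator accepts `None`, an empty string, a dict, or a JSON object string. To make those cases easy to try without installing pydantic or openenv, the sketch below restates the same branching as a plain function. The standalone name `coerce_parameters` is hypothetical, and the first error message is shortened relative to the one in the diff.

```python
import json
from typing import Any, Dict, Optional


def coerce_parameters(value: Any) -> Optional[Dict[str, Any]]:
    """Standalone restatement of the validator's branches, for experimentation."""
    if value is None or value == "":
        # Empty web-form input means "no parameters".
        return None
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError("parameters must be a JSON object string") from exc
        if not isinstance(parsed, dict):
            # Valid JSON that is not an object (a list, number, ...) is rejected.
            raise ValueError("parameters JSON must decode to an object/dictionary")
        return parsed
    raise ValueError("parameters must be a dictionary or JSON object string")
```

For instance, `coerce_parameters('{"port": 80, "action": "allow"}')` yields a dict with an integer `port`, while a JSON list such as `'[80]'` raises `ValueError`.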
openenv.yaml ADDED
@@ -0,0 +1,19 @@
+ spec_version: 1
+ name: cloud_devops_env
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
+
+ metadata:
+   project: cloud-devops-env
+   description: A real-world Cloud SRE/DevOps simulation environment.
+   entrypoint:
+     file: env.py
+     class: CloudDevOpsEnv
+   models:
+     file: models.py
+     action: CloudAction
+     observation: CloudObservation
+     state: CloudState
+
pyproject.toml ADDED
@@ -0,0 +1,36 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ [build-system]
+ requires = ["setuptools>=45", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "openenv-cloud_devops_env"
+ version = "0.1.0"
+ description = "Cloud Devops Env environment for OpenEnv"
+ requires-python = ">=3.10"
+ dependencies = [
+     "openenv-core[core]>=0.2.2",
+     "pydantic>=2.0.0",
+     "openai>=1.0.0",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=8.0.0",
+     "pytest-cov>=4.0.0",
+ ]
+
+ [project.scripts]
+ # Server entry point - enables running via: uv run --project . server
+ # or: python -m cloud_devops_env.server.app
+ server = "cloud_devops_env.server.app:main"
+
+ [tool.setuptools]
+ include-package-data = true
+ packages = ["cloud_devops_env", "cloud_devops_env.server"]
+ package-dir = { "cloud_devops_env" = ".", "cloud_devops_env.server" = "server" }
scripts/pre_submit_validate.sh ADDED
@@ -0,0 +1,365 @@
1
+ #!/usr/bin/env bash
2
+ #
3
+ # pre_submit_validate.sh
4
+ #
5
+ # Extended pre-submission checks for OpenEnv hackathon submissions.
6
+ # This script complements scripts/validate-submission.sh by also checking
7
+ # inference contract requirements and baseline reproducibility.
8
+
9
+ set -euo pipefail
10
+
11
+ DOCKER_BUILD_TIMEOUT=600
12
+ INFERENCE_TIMEOUT=1200
13
+
14
+ PING_URL=""
15
+ REPO_DIR="."
16
+ SKIP_DOCKER=false
17
+ SKIP_INFERENCE=false
18
+ PYTHON_BIN=""
19
+ OPENENV_BIN=""
20
+ OPENENV_USE_MODULE=false
21
+ DOCKER_CONTAINER_ID=""
22
+
23
+ usage() {
24
+ cat <<'EOF'
25
+ Usage: scripts/pre_submit_validate.sh [options]
26
+
27
+ Options:
28
+ --ping-url <url> HF Space URL (e.g., https://team-space.hf.space)
29
+ --repo-dir <path> Repo root directory (default: current directory)
30
+ --skip-docker Skip docker build check
31
+ --skip-inference Skip inference baseline check
32
+ -h, --help Show this help message
33
+
34
+ Required environment variables for inference checks:
35
+ API_BASE_URL
36
+ MODEL_NAME
37
+ HF_TOKEN
38
+ EOF
39
+ }
40
+
41
+ run_with_timeout() {
42
+ local secs="$1"; shift
43
+ if command -v timeout >/dev/null 2>&1; then
44
+ timeout "$secs" "$@"
45
+ elif command -v gtimeout >/dev/null 2>&1; then
46
+ gtimeout "$secs" "$@"
47
+ else
48
+ "$@" &
49
+ local pid=$!
50
+ ( sleep "$secs" && kill "$pid" 2>/dev/null ) &
51
+ local watcher=$!
52
+ wait "$pid" 2>/dev/null
53
+ local rc=$?
54
+ kill "$watcher" 2>/dev/null || true
55
+ wait "$watcher" 2>/dev/null || true
56
+ return $rc
57
+ fi
58
+ }
59
+
60
+ log() {
61
+ printf "[%s] %s\n" "$(date -u +%H:%M:%S)" "$*"
62
+ }
63
+
64
+ die() {
65
+ log "FAILED -- $*"
66
+ exit 1
67
+ }
68
+
69
+ pass() {
70
+ log "PASSED -- $*"
71
+ }
72
+
73
+ cleanup() {
74
+ if [ -n "$DOCKER_CONTAINER_ID" ]; then
75
+ docker rm -f "$DOCKER_CONTAINER_ID" >/dev/null 2>&1 || true
76
+ fi
77
+ }
78
+
79
+ trap cleanup EXIT
80
+
81
+ resolve_python_bin() {
82
+ local candidates=(
83
+ "$REPO_DIR/.venv/bin/python"
84
+ "$REPO_DIR/.venv/Scripts/python.exe"
85
+ "$REPO_DIR/../.venv/bin/python"
86
+ "$REPO_DIR/../.venv/Scripts/python.exe"
87
+ )
88
+
89
+ for c in "${candidates[@]}"; do
90
+ if [ -x "$c" ]; then
91
+ PYTHON_BIN="$c"
92
+ return 0
93
+ fi
94
+ done
95
+
96
+ if command -v python >/dev/null 2>&1; then
97
+ PYTHON_BIN="$(command -v python)"
98
+ return 0
99
+ fi
100
+ if command -v python3 >/dev/null 2>&1; then
101
+ PYTHON_BIN="$(command -v python3)"
102
+ return 0
103
+ fi
104
+
105
+ return 1
106
+ }
107
+
108
+ resolve_openenv_cmd() {
109
+ local candidates=(
110
+ "$REPO_DIR/.venv/bin/openenv"
111
+ "$REPO_DIR/.venv/Scripts/openenv.exe"
112
+ "$REPO_DIR/../.venv/bin/openenv"
113
+ "$REPO_DIR/../.venv/Scripts/openenv.exe"
114
+ )
115
+
116
+ for c in "${candidates[@]}"; do
117
+ if [ -x "$c" ]; then
118
+ OPENENV_BIN="$c"
119
+ return 0
120
+ fi
121
+ done
122
+
123
+ if command -v openenv >/dev/null 2>&1; then
124
+ OPENENV_BIN="$(command -v openenv)"
125
+ return 0
126
+ fi
127
+
128
+ return 1
129
+ }
130
+
131
+ while [ "$#" -gt 0 ]; do
132
+ case "$1" in
133
+ --ping-url)
134
+ shift
135
+ [ "$#" -gt 0 ] || die "--ping-url requires a value"
136
+ PING_URL="$1"
137
+ ;;
138
+ --repo-dir)
139
+ shift
140
+ [ "$#" -gt 0 ] || die "--repo-dir requires a value"
141
+ REPO_DIR="$1"
142
+ ;;
143
+ --skip-docker)
144
+ SKIP_DOCKER=true
145
+ ;;
146
+ --skip-inference)
147
+ SKIP_INFERENCE=true
148
+ ;;
149
+ -h|--help)
150
+ usage
151
+ exit 0
152
+ ;;
153
+ *)
154
+ die "Unknown option: $1"
155
+ ;;
156
+ esac
157
+ shift
158
+ done
159
+
160
+ REPO_DIR="$(cd "$REPO_DIR" && pwd)"
161
+ cd "$REPO_DIR"
162
+
163
+ log "Repo: $REPO_DIR"
164
+
165
+ resolve_python_bin || die "No usable Python interpreter found"
166
+ log "Python: $PYTHON_BIN"
167
+
168
+ if resolve_openenv_cmd; then
169
+ log "OpenEnv CLI: $OPENENV_BIN"
170
+ else
171
+ OPENENV_USE_MODULE=true
172
+ log "OpenEnv CLI via module: $PYTHON_BIN -m openenv"
173
+ fi
174
+
175
+ log "Step 1/8: Checking OpenEnv standard file layout"
176
+ required_files=(
177
+ "openenv.yaml"
178
+ "models.py"
179
+ "env.py"
180
+ "inference.py"
181
+ "server/app.py"
182
+ "server/cloud_devops_env_environment.py"
183
+ )
184
+ for f in "${required_files[@]}"; do
185
+ [ -f "$f" ] || die "Missing required file: $f"
186
+ done
187
+ pass "Core OpenEnv file layout looks valid"
188
+
189
+ log "Step 2/8: Checking inference contract requirements"
190
+ [ -f "inference.py" ] || die "inference.py must exist in repo root"
191
+ grep -q "from openai import OpenAI" inference.py || die "inference.py must import OpenAI client"
192
+ grep -q "OpenAI(" inference.py || die "inference.py must instantiate OpenAI client"
193
+ grep -q "\[START\]" inference.py || die "inference.py must emit [START] logs"
194
+ grep -q "\[STEP\]" inference.py || die "inference.py must emit [STEP] logs"
195
+ grep -q "\[END\]" inference.py || die "inference.py must emit [END] logs"
196
+ pass "Inference script contract checks passed"
197
+
198
+ log "Step 3/8: Validating OpenEnv manifest and typed models"
199
+ if [ "$OPENENV_USE_MODULE" = true ]; then
200
+ "$PYTHON_BIN" -m openenv validate >/tmp/openenv-validate.out 2>&1 || {
201
+ cat /tmp/openenv-validate.out
202
+ die "openenv validate failed"
203
+ }
204
+ else
205
+ "$OPENENV_BIN" validate >/tmp/openenv-validate.out 2>&1 || {
206
+ cat /tmp/openenv-validate.out
207
+ die "openenv validate failed"
208
+ }
209
+ fi
210
+ pass "openenv validate passed"
211
+
212
+ log "Step 4/8: Optional HF Space ping check"
213
+ if [ -n "$PING_URL" ]; then
214
+ PING_URL="${PING_URL%/}"
215
+ code=$(curl -s -o /tmp/pre-submit-ping.out -w "%{http_code}" -X POST \
216
+ -H "Content-Type: application/json" -d '{}' \
217
+ "$PING_URL/reset" --max-time 30 || printf "000")
218
+ [ "$code" = "200" ] || die "HF Space /reset returned HTTP $code"
219
+ pass "HF Space responds to /reset (HTTP 200)"
220
+ else
221
+ log "SKIPPED -- no --ping-url provided"
222
+ fi
223
+
224
+ log "Step 5/8: Docker build + run check"
225
+ if [ "$SKIP_DOCKER" = true ]; then
226
+ log "SKIPPED -- --skip-docker enabled"
227
+ else
228
+ command -v docker >/dev/null 2>&1 || die "docker not found"
229
+ if [ -f "Dockerfile" ]; then
230
+ context="."
231
+ elif [ -f "server/Dockerfile" ]; then
232
+ context="server"
233
+ else
234
+ die "No Dockerfile found at root or server/"
235
+ fi
236
+ run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build "$context" >/tmp/pre-submit-docker.out 2>&1 || {
237
+ tail -n 40 /tmp/pre-submit-docker.out
238
+ die "docker build failed"
239
+ }
240
+ pass "Docker build succeeded"
241
+
242
+ IMAGE_TAG="openenv-pre-submit-local"
243
+ run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build -t "$IMAGE_TAG" "$context" >/tmp/pre-submit-docker-tagged.out 2>&1 || {
244
+ tail -n 40 /tmp/pre-submit-docker-tagged.out
245
+ die "docker build (tagged) failed"
246
+ }
247
+
248
+ DOCKER_CONTAINER_ID="$(docker run -d -p 127.0.0.1::8000 "$IMAGE_TAG" 2>/tmp/pre-submit-docker-run.err || true)"
249
+ [ -n "$DOCKER_CONTAINER_ID" ] || {
250
+ cat /tmp/pre-submit-docker-run.err
251
+ die "docker run failed"
252
+ }
253
+
254
+ HOST_PORT="$(docker port "$DOCKER_CONTAINER_ID" 8000/tcp | tail -n 1 | awk -F: '{print $NF}')"
255
+ [ -n "$HOST_PORT" ] || die "could not resolve mapped host port for container"
256
+
257
+ HEALTH_OK=false
258
+ for _ in $(seq 1 30); do
259
+ health_code=$(curl -s -o /tmp/pre-submit-health.out -w "%{http_code}" \
260
+ "http://127.0.0.1:${HOST_PORT}/health" --max-time 3 || printf "000")
261
+ if [ "$health_code" = "200" ]; then
262
+ HEALTH_OK=true
263
+ break
264
+ fi
265
+ sleep 1
266
+ done
267
+ [ "$HEALTH_OK" = true ] || {
268
+ docker logs "$DOCKER_CONTAINER_ID" | tail -n 50
269
+ die "container did not become healthy on /health"
270
+ }
271
+
272
+ # curl -w already prints 000 when the request fails, so do not append a second 000
+ reset_code=$(curl -s -o /tmp/pre-submit-reset.out -w "%{http_code}" -X POST \
+ -H "Content-Type: application/json" -d '{}' \
+ "http://127.0.0.1:${HOST_PORT}/reset" --max-time 10) || true
275
+ [ "$reset_code" = "200" ] || {
276
+ docker logs "$DOCKER_CONTAINER_ID" | tail -n 50
277
+ die "container /reset returned HTTP $reset_code"
278
+ }
279
+
280
+ pass "Containerized execution check passed (/health and /reset)"
281
+
282
+ docker rm -f "$DOCKER_CONTAINER_ID" >/dev/null 2>&1 || true
283
+ DOCKER_CONTAINER_ID=""
284
+ fi
285
+
286
+ log "Step 6/8: Environment variable checks"
287
+ if [ "$SKIP_INFERENCE" = true ]; then
288
+ log "SKIPPED -- --skip-inference enabled"
289
+ else
290
+ [ -n "${API_BASE_URL:-}" ] || die "API_BASE_URL is not set"
291
+ [ -n "${MODEL_NAME:-}" ] || die "MODEL_NAME is not set"
292
+ [ -n "${HF_TOKEN:-}" ] || die "HF_TOKEN is not set"
293
+ pass "Required API_BASE_URL / MODEL_NAME / HF_TOKEN are set"
294
+ fi
295
+
296
+ log "Step 7/8: Baseline reproducibility (inference.py)"
297
+ if [ "$SKIP_INFERENCE" = true ]; then
298
+ log "SKIPPED -- --skip-inference enabled"
299
+ else
300
+ run_with_timeout "$INFERENCE_TIMEOUT" "$PYTHON_BIN" inference.py >/tmp/pre-submit-inference.out 2>&1 || {
301
+ tail -n 80 /tmp/pre-submit-inference.out
302
+ die "inference.py failed or timed out"
303
+ }
304
+ pass "inference.py completed within timeout"
305
+ fi
306
+
307
+ log "Step 8/8: Structured logs + task/grader checks"
308
+ if [ "$SKIP_INFERENCE" = true ]; then
309
+ log "SKIPPED -- --skip-inference enabled"
310
+ else
311
+ "$PYTHON_BIN" - <<'PY'
312
+ import json
313
+ import sys
314
+ from pathlib import Path
315
+
316
+ path = Path('/tmp/pre-submit-inference.out')
317
+ text = path.read_text(encoding='utf-8', errors='replace').splitlines()
318
+
319
+ starts = []
320
+ ends = []
321
+ step_count = 0
322
+
323
+ for line in text:
324
+ line = line.strip()
325
+ if line.startswith('[START] '):
326
+ payload = json.loads(line[len('[START] '):])
327
+ starts.append(payload)
328
+ elif line.startswith('[STEP] '):
329
+ json.loads(line[len('[STEP] '):])
330
+ step_count += 1
331
+ elif line.startswith('[END] '):
332
+ payload = json.loads(line[len('[END] '):])
333
+ ends.append(payload)
334
+
335
+ if len(starts) < 3:
336
+ raise SystemExit('Expected at least 3 [START] task logs')
337
+
338
+ unique_tasks = {str(s.get('task', '')) for s in starts if s.get('task')}
339
+ if len(unique_tasks) < 3:
340
+ raise SystemExit('Expected at least 3 unique tasks in [START] logs')
341
+
342
+ if len(ends) != len(starts):
343
+ raise SystemExit('Mismatch between [START] and [END] log counts')
344
+
345
+ if step_count == 0:
346
+ raise SystemExit('No [STEP] logs found')
347
+
348
+ for i, end in enumerate(ends, start=1):
349
+ score = float(end.get('score', -1.0))
350
+ rewards = end.get('rewards', [])
351
+ if not (0.0 <= score <= 1.0):
352
+ raise SystemExit(f'END #{i} score out of range [0,1]: {score}')
353
+ if not isinstance(rewards, list):
354
+ raise SystemExit(f'END #{i} rewards must be a list')
355
+ for r in rewards:
356
+ rv = float(r)
357
+ if not (-1.0 <= rv <= 1.0):
358
+ raise SystemExit(f'END #{i} step reward out of sanity range [-1,1]: {rv}')
359
+
360
+ print('Structured logs and task/grader checks passed')
361
+ PY
362
+ pass "Structured [START]/[STEP]/[END] logs and score-range checks passed"
363
+ fi
364
+
365
+ log "All checks passed. Submission is ready."
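The Step 8 check above expects `inference.py` to print `[START]`, `[STEP]`, and `[END]` lines, each followed by a JSON payload. The following is a minimal sketch of an emitter and parser for that format. Only the `task`, `score`, and `rewards` fields come from the checker; the helper names, the `reward` field on `[STEP]` lines, and the rule that the score is the clamped sum of step rewards are assumptions for illustration, not the behavior of the real `inference.py`.

```python
import json


def format_log(tag: str, payload: dict) -> str:
    """Render one structured log line such as '[START] {"task": "easy"}'."""
    return f"[{tag}] {json.dumps(payload)}"


def parse_log(line: str):
    """Return (tag, payload) for a structured line, or None for any other output."""
    line = line.strip()
    for tag in ("START", "STEP", "END"):
        prefix = f"[{tag}] "
        if line.startswith(prefix):
            return tag, json.loads(line[len(prefix):])
    return None


def episode_lines(task: str, rewards: list) -> list:
    """Build the log lines for one episode.

    Assumption: the score is the sum of step rewards clamped to [0, 1].
    The checker only requires that the score lies in that range.
    """
    score = max(0.0, min(1.0, sum(rewards)))
    lines = [format_log("START", {"task": task})]
    lines += [format_log("STEP", {"reward": r}) for r in rewards]
    lines.append(format_log("END", {"task": task, "score": score, "rewards": rewards}))
    return lines


if __name__ == "__main__":
    for line in episode_lines("easy", [0.2, 0.8]):
        print(line)
```

Printing one such episode per task (three distinct `task` values) would satisfy the count checks in the validator, provided every step reward stays within [-1, 1].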
scripts/validate-submission.sh ADDED
@@ -0,0 +1,185 @@
1
+ #!/usr/bin/env bash
2
+ #
3
+ # validate-submission.sh — OpenEnv Submission Validator
4
+ #
5
+ # Checks that your HF Space is live, Docker image builds, and openenv validate passes.
6
+ #
7
+ # Prerequisites:
8
+ # - Docker: https://docs.docker.com/get-docker/
9
+ # - openenv-core: pip install openenv-core
10
+ # - curl (usually pre-installed)
11
+ #
12
+ # Run:
13
+ # curl -fsSL https://raw.githubusercontent.com/<owner>/<repo>/main/scripts/validate-submission.sh | bash -s -- <ping_url> [repo_dir]
14
+ #
15
+ # Or download and run locally:
16
+ # chmod +x validate-submission.sh
17
+ # ./validate-submission.sh <ping_url> [repo_dir]
18
+ #
19
+ # Arguments:
20
+ # ping_url Your Hugging Face Space URL (e.g. https://your-space.hf.space)
21
+ # repo_dir Path to your repo (default: current directory)
22
+ #
23
+ # Examples:
24
+ # ./validate-submission.sh https://my-team.hf.space
25
+ # ./validate-submission.sh https://my-team.hf.space ./my-repo
26
+ #
27
+
28
+ set -uo pipefail
29
+
30
+ DOCKER_BUILD_TIMEOUT=600
31
+ if [ -t 1 ]; then
32
+ RED='\033[0;31m'
33
+ GREEN='\033[0;32m'
34
+ YELLOW='\033[1;33m'
35
+ BOLD='\033[1m'
36
+ NC='\033[0m'
37
+ else
38
+ RED='' GREEN='' YELLOW='' BOLD='' NC=''
39
+ fi
40
+
41
+ run_with_timeout() {
42
+ local secs="$1"; shift
43
+ if command -v timeout &>/dev/null; then
44
+ timeout "$secs" "$@"
45
+ elif command -v gtimeout &>/dev/null; then
46
+ gtimeout "$secs" "$@"
47
+ else
48
+ "$@" &
49
+ local pid=$!
50
+ ( sleep "$secs" && kill "$pid" 2>/dev/null ) &
51
+ local watcher=$!
52
+ wait "$pid" 2>/dev/null
53
+ local rc=$?
54
+ kill "$watcher" 2>/dev/null
55
+ wait "$watcher" 2>/dev/null
56
+ return $rc
57
+ fi
58
+ }
59
+
60
+ portable_mktemp() {
61
+ local prefix="${1:-validate}"
62
+ mktemp "${TMPDIR:-/tmp}/${prefix}-XXXXXX" 2>/dev/null || mktemp
63
+ }
64
+
65
+ CLEANUP_FILES=()
66
+ cleanup() { rm -f "${CLEANUP_FILES[@]+"${CLEANUP_FILES[@]}"}"; }
67
+ trap cleanup EXIT
68
+
69
+ PING_URL="${1:-}"
70
+ REPO_DIR="${2:-.}"
71
+
72
+ if [ -z "$PING_URL" ]; then
73
+ printf "Usage: %s <ping_url> [repo_dir]\n" "$0"
74
+ printf "\n"
75
+ printf " ping_url Your Hugging Face Space URL (e.g. https://your-space.hf.space)\n"
76
+ printf " repo_dir Path to your repo (default: current directory)\n"
77
+ exit 1
78
+ fi
79
+
80
+ if ! REPO_DIR="$(cd "$REPO_DIR" 2>/dev/null && pwd)"; then
81
+ printf "Error: directory '%s' not found\n" "${2:-.}"
82
+ exit 1
83
+ fi
84
+ PING_URL="${PING_URL%/}"
85
+ export PING_URL
86
+ PASS=0
87
+
88
+ log() { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
89
+ pass() { log "${GREEN}PASSED${NC} -- $1"; PASS=$((PASS + 1)); }
90
+ fail() { log "${RED}FAILED${NC} -- $1"; }
91
+ hint() { printf " ${YELLOW}Hint:${NC} %b\n" "$1"; }
92
+ stop_at() {
93
+ printf "\n"
94
+ printf "${RED}${BOLD}Validation stopped at %s.${NC} Fix the above before continuing.\n" "$1"
95
+ exit 1
96
+ }
97
+
98
+ printf "\n"
99
+ printf "${BOLD}========================================${NC}\n"
100
+ printf "${BOLD} OpenEnv Submission Validator${NC}\n"
101
+ printf "${BOLD}========================================${NC}\n"
102
+ log "Repo: $REPO_DIR"
103
+ log "Ping URL: $PING_URL"
104
+ printf "\n"
105
+
106
+ log "${BOLD}Step 1/3: Pinging HF Space${NC} ($PING_URL/reset) ..."
107
+
108
+ CURL_OUTPUT=$(portable_mktemp "validate-curl")
109
+ CLEANUP_FILES+=("$CURL_OUTPUT")
110
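+ # curl -w already prints 000 on connection failure; appending another 000 would
+ # produce "000000" and skip the unreachable-Space branch below.
+ HTTP_CODE=$(curl -s -o "$CURL_OUTPUT" -w "%{http_code}" -X POST \
+ -H "Content-Type: application/json" -d '{}' \
+ "$PING_URL/reset" --max-time 30 2>/dev/null) || true
+ HTTP_CODE="${HTTP_CODE:-000}"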
113
+
114
+ if [ "$HTTP_CODE" = "200" ]; then
115
+ pass "HF Space is live and responds to /reset"
116
+ elif [ "$HTTP_CODE" = "000" ]; then
117
+ fail "HF Space not reachable (connection failed or timed out)"
118
+ hint "Check your network connection and that the Space is running."
119
+ hint "Try: curl -s -o /dev/null -w '%{http_code}' -X POST $PING_URL/reset"
120
+ stop_at "Step 1"
121
+ else
122
+ fail "HF Space /reset returned HTTP $HTTP_CODE (expected 200)"
123
+ hint "Make sure your Space is running and the URL is correct."
124
+ hint "Try opening $PING_URL in your browser first."
125
+ stop_at "Step 1"
126
+ fi
127
+
128
+ log "${BOLD}Step 2/3: Running docker build${NC} ..."
129
+
130
+ if ! command -v docker &>/dev/null; then
131
+ fail "docker command not found"
132
+ hint "Install Docker: https://docs.docker.com/get-docker/"
133
+ stop_at "Step 2"
134
+ fi
135
+
136
+ if [ -f "$REPO_DIR/Dockerfile" ]; then
137
+ DOCKER_CONTEXT="$REPO_DIR"
138
+ elif [ -f "$REPO_DIR/server/Dockerfile" ]; then
139
+ DOCKER_CONTEXT="$REPO_DIR/server"
140
+ else
141
+ fail "No Dockerfile found in repo root or server/ directory"
142
+ stop_at "Step 2"
143
+ fi
144
+
145
+ log " Found Dockerfile in $DOCKER_CONTEXT"
146
+
147
+ BUILD_OK=false
148
+ BUILD_OUTPUT=$(run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build "$DOCKER_CONTEXT" 2>&1) && BUILD_OK=true
149
+
150
+ if [ "$BUILD_OK" = true ]; then
151
+ pass "Docker build succeeded"
152
+ else
153
+ fail "Docker build failed (timeout=${DOCKER_BUILD_TIMEOUT}s)"
154
+ printf "%s\n" "$BUILD_OUTPUT" | tail -20
155
+ stop_at "Step 2"
156
+ fi
157
+
158
+ log "${BOLD}Step 3/3: Running openenv validate${NC} ..."
159
+
160
+ if ! command -v openenv &>/dev/null; then
161
+ fail "openenv command not found"
162
+ hint "Install it: pip install openenv-core"
163
+ stop_at "Step 3"
164
+ fi
165
+
166
+ VALIDATE_OK=false
167
+ VALIDATE_OUTPUT=$(cd "$REPO_DIR" && openenv validate 2>&1) && VALIDATE_OK=true
168
+
169
+ if [ "$VALIDATE_OK" = true ]; then
170
+ pass "openenv validate passed"
171
+ [ -n "$VALIDATE_OUTPUT" ] && log " $VALIDATE_OUTPUT"
172
+ else
173
+ fail "openenv validate failed"
174
+ printf "%s\n" "$VALIDATE_OUTPUT"
175
+ stop_at "Step 3"
176
+ fi
177
+
178
+ printf "\n"
179
+ printf "${BOLD}========================================${NC}\n"
180
+ printf "${GREEN}${BOLD} All 3/3 checks passed!${NC}\n"
181
+ printf "${GREEN}${BOLD} Your submission is ready to submit.${NC}\n"
182
+ printf "${BOLD}========================================${NC}\n"
183
+ printf "\n"
184
+
185
+ exit 0
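The `run_with_timeout` helper in the script above prefers `timeout` or `gtimeout` and otherwise falls back to a background watcher that kills the command. As a rough illustration of the same idea, here is a Python sketch built on `subprocess.run`. Returning 124 on expiry follows the GNU `timeout` convention and is an assumption of this sketch; the shell fallback in the script returns the killed process's own status instead.

```python
import subprocess
import sys


def run_with_timeout(secs: float, argv: list) -> int:
    """Run argv and return its exit code, or 124 if it exceeds secs seconds.

    subprocess.run kills the child process before raising TimeoutExpired.
    """
    try:
        return subprocess.run(argv, timeout=secs).returncode
    except subprocess.TimeoutExpired:
        return 124


if __name__ == "__main__":
    # A command that finishes in time reports its own exit code.
    print(run_with_timeout(5, [sys.executable, "-c", "raise SystemExit(3)"]))
    # A command that overruns the limit is killed and reported as 124.
    print(run_with_timeout(0.5, [sys.executable, "-c", "import time; time.sleep(10)"]))
```

A distinct code for the timeout case lets the caller tell a slow build from a failing one. The shell script does not make that distinction and reports both as a failed build.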
server/Dockerfile ADDED
@@ -0,0 +1,80 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ # Multi-stage build using openenv-base
8
+ # This Dockerfile is flexible and works for both:
9
+ # - In-repo environments (with local OpenEnv sources)
10
+ # - Standalone environments (with openenv from PyPI/Git)
11
+ # The build script (openenv build) handles context detection and sets appropriate build args.
12
+
13
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
14
+ FROM ${BASE_IMAGE} AS builder
15
+
16
+ WORKDIR /app
17
+
18
+ # Ensure git is available (required for installing dependencies from VCS)
19
+ RUN apt-get update && \
20
+ apt-get install -y --no-install-recommends git && \
21
+ rm -rf /var/lib/apt/lists/*
22
+
23
+ # Build argument to control whether we're building standalone or in-repo
24
+ ARG BUILD_MODE=in-repo
25
+ ARG ENV_NAME=cloud_devops_env
26
+
27
+ # Copy environment code (always at root of build context)
28
+ COPY . /app/env
29
+
30
+ # For in-repo builds, openenv is already vendored in the build context
31
+ # For standalone builds, openenv will be installed via pyproject.toml
32
+ WORKDIR /app/env
33
+
34
+ # Ensure uv is available (for local builds where base image lacks it)
35
+ RUN if ! command -v uv >/dev/null 2>&1; then \
36
+ curl -LsSf https://astral.sh/uv/install.sh | sh && \
37
+ mv /root/.local/bin/uv /usr/local/bin/uv && \
38
+ mv /root/.local/bin/uvx /usr/local/bin/uvx; \
39
+ fi
40
+
41
+ # Install dependencies using uv sync
42
+ # If uv.lock exists, use it; otherwise resolve on the fly
43
+ RUN --mount=type=cache,target=/root/.cache/uv \
44
+ if [ -f uv.lock ]; then \
45
+ uv sync --frozen --no-install-project --no-editable; \
46
+ else \
47
+ uv sync --no-install-project --no-editable; \
48
+ fi
49
+
50
+ RUN --mount=type=cache,target=/root/.cache/uv \
51
+ if [ -f uv.lock ]; then \
52
+ uv sync --frozen --no-editable; \
53
+ else \
54
+ uv sync --no-editable; \
55
+ fi
56
+
57
+ # Final runtime stage
58
+ FROM ${BASE_IMAGE}
59
+
60
+ WORKDIR /app
61
+
62
+ # Copy the virtual environment from builder
63
+ COPY --from=builder /app/env/.venv /app/.venv
64
+
65
+ # Copy the environment code
66
+ COPY --from=builder /app/env /app/env
67
+
68
+ # Set PATH to use the virtual environment
69
+ ENV PATH="/app/.venv/bin:$PATH"
70
+
71
+ # Set PYTHONPATH so imports work correctly
72
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
73
+
74
+ # Health check
75
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
76
+ CMD curl -f http://localhost:8000/health || exit 1
77
+
78
+ # Run the FastAPI server
79
+ # The module path is constructed to work with the /app/env structure
80
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
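Both the `HEALTHCHECK` above and the pre-submit script's 30-attempt loop poll `/health` until it returns HTTP 200. The sketch below shows that retry pattern in Python. The function names, the injectable `probe` and `sleep` parameters, and the defaults are assumptions chosen so the loop can be exercised without a running container.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status for a GET request, including non-2xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def wait_until_healthy(
    probe: Callable[[], int],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times and return True on the first 200."""
    for _ in range(attempts):
        try:
            if probe() == 200:
                return True
        except OSError:
            # Connection refused or timed out: the server may still be starting.
            pass
        sleep(delay)
    return False


if __name__ == "__main__":
    # Assumes a server is listening locally on port 8000.
    ok = wait_until_healthy(lambda: http_status("http://127.0.0.1:8000/health"), attempts=5)
    print("healthy" if ok else "not healthy")
```

Connection errors count as "not ready yet" rather than as failures, which matches how the shell loop maps a failed `curl` to a status other than 200.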
server/__init__.py ADDED
@@ -0,0 +1,11 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ """Cloud Devops Env environment server components."""
+
+ from .cloud_devops_env_environment import CloudDevopsEnvironment
+
+ __all__ = ["CloudDevopsEnvironment"]
server/app.py ADDED
@@ -0,0 +1,101 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ FastAPI application for the Cloud Devops Env Environment.
9
+
10
+ This module creates an HTTP server that exposes the CloudDevopsEnvironment
11
+ over HTTP and WebSocket endpoints, compatible with EnvClient.
12
+
13
+ Endpoints:
14
+ - POST /reset: Reset the environment
15
+ - POST /step: Execute an action
16
+ - GET /state: Get current environment state
17
+ - GET /schema: Get action/observation schemas
18
+ - WS /ws: WebSocket endpoint for persistent sessions
19
+
20
+ Usage:
21
+ # Development (with auto-reload):
22
+ uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
23
+
24
+ # Production:
25
+ uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
26
+
27
+ # Or run directly:
28
+ python -m server.app
29
+ """
30
+
31
+ import os
32
+ from pathlib import Path
33
+
34
+ # Default to enabling the OpenEnv web interface for local development.
35
+ # You can still disable it explicitly: ENABLE_WEB_INTERFACE=false
36
+ os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
37
+ os.environ.setdefault(
38
+ "ENV_README_PATH",
39
+ str((Path(__file__).resolve().parent.parent / "README.md")),
40
+ )
41
+
42
+ try:
43
+ from openenv.core.env_server.http_server import create_app
44
+ except Exception as e: # pragma: no cover
45
+ raise ImportError(
46
+ "openenv is required for the web interface. Install dependencies with '\n uv sync\n'"
47
+ ) from e
48
+
49
+ try:
50
+ from ..models import CloudDevopsAction, CloudDevopsObservation
51
+ from .cloud_devops_env_environment import CloudDevopsEnvironment
52
+ except (ModuleNotFoundError, ImportError):
53
+ from models import CloudDevopsAction, CloudDevopsObservation
54
+ from server.cloud_devops_env_environment import CloudDevopsEnvironment
55
+
56
+
57
+ # Create the app with web interface and README integration
58
+ app = create_app(
59
+ CloudDevopsEnvironment,
60
+ CloudDevopsAction,
61
+ CloudDevopsObservation,
62
+ env_name="cloud_devops_env",
63
+ max_concurrent_envs=1, # increase this number to allow more concurrent WebSocket sessions
64
+ )
65
+
66
+
67
+ def main(host: str | None = None, port: int | None = None):
68
+ """
69
+ Entry point for direct execution via uv run or python -m.
70
+
71
+ This function enables running the server without Docker:
72
+ uv run --project . server
73
+ uv run --project . server --port 8001
74
+ python -m cloud_devops_env.server.app
75
+
76
+ Args:
77
+ host: Host address to bind to. If not provided, CLI args are parsed.
78
+ port: Port number to listen on. If not provided, CLI args are parsed.
79
+
80
+ For production deployments, consider using uvicorn directly with
81
+ multiple workers:
82
+ uvicorn cloud_devops_env.server.app:app --workers 4
83
+ """
84
+ import argparse
85
+ import uvicorn
86
+
87
+ # Console-script entry points invoke main() with no parameters, so parse
88
+ # CLI flags here to make `server --host ... --port ...` work as expected.
89
+ if host is None and port is None:
90
+ parser = argparse.ArgumentParser(add_help=False)
91
+ parser.add_argument("--host", type=str, default="0.0.0.0")
92
+ parser.add_argument("--port", type=int, default=8000)
93
+ args, _ = parser.parse_known_args()
94
+ host = args.host
95
+ port = args.port
96
+
97
+ uvicorn.run(app, host=host or "0.0.0.0", port=port or 8000)
98
+
99
+
100
+ if __name__ == "__main__":
101
+ main()
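`main()` above parses `--host` and `--port` only when neither argument is passed in, and it uses `parse_known_args` so that unrelated flags are ignored. The sketch below isolates that resolution logic. `resolve_bind` is a hypothetical name, and it takes `argv` explicitly where `main()` reads `sys.argv`.

```python
import argparse


def resolve_bind(argv, host=None, port=None):
    """Return the (host, port) pair that main() would hand to uvicorn.run."""
    if host is None and port is None:
        parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument("--host", type=str, default="0.0.0.0")
        parser.add_argument("--port", type=int, default=8000)
        # parse_known_args tolerates flags this parser does not define.
        args, _unknown = parser.parse_known_args(argv)
        host, port = args.host, args.port
    return host or "0.0.0.0", port or 8000


if __name__ == "__main__":
    print(resolve_bind([]))
    print(resolve_bind(["--port", "8001", "--reload"]))
    # With an explicit host, the CLI is not parsed at all, so --port is ignored.
    print(resolve_bind(["--port", "9000"], host="127.0.0.1"))
```

The last call shows a consequence of the `host is None and port is None` guard: passing only one of the two arguments skips CLI parsing, and the other falls back to its default.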
server/cloud_devops_env_environment.py ADDED
@@ -0,0 +1,384 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Cloud Devops Env Environment Implementation.
9
+
10
+ A deterministic mock cloud/devops environment with reward shaping and
11
+ anti-farming guardrails for hackathon evaluation.
12
+ """
13
+
14
+ from __future__ import annotations
15
+
16
+ import copy
17
+ from uuid import uuid4
18
+
19
+ from openenv.core.env_server.interfaces import Environment
20
+ from openenv.core.env_server.types import State
21
+
22
+ try:
23
+ from ..models import CloudAction, CloudObservation, CloudState
24
+ except ImportError:
25
+ from models import CloudAction, CloudObservation, CloudState
26
+
27
+
28
+ class CloudDevopsEnvironment(Environment):
29
+ """
30
+ A deterministic mock cloud/devops environment.
31
+
32
+ Tasks:
33
+ - easy: open port 80 on sg-web
34
+ - medium: inspect noisy API logs, then open port 5432 on sg-db
35
+ - hard: trace 502 from lb-main to i-web2, then restart i-web2 (not i-web1)
36
+
37
+ Example:
38
+ >>> env = CloudDevopsEnvironment()
39
+ >>> obs = env.reset()
40
+ >>> print(obs.system_health_status) # "CRITICAL"
41
+ >>>
42
+ >>> obs = env.step(CloudAction(command="list_resources"))
43
+ >>> print(obs.output)
44
+ """
45
+
46
+ # Enable concurrent WebSocket sessions.
47
+ # Set to True if your environment isolates state between instances.
48
+ # When True, multiple WebSocket clients can connect simultaneously, each
49
+ # getting their own environment instance (when using factory mode in app.py).
50
+ SUPPORTS_CONCURRENT_SESSIONS: bool = True
51
+ MAX_STEPS: int = 20
52
+ VALID_TASKS = {"easy", "medium", "hard"}
53
+
54
+ def __init__(self, task_name: str = "easy"):
55
+ """Initialize the cloud_devops_env environment."""
56
+ normalized_task = (task_name or "easy").lower()
57
+ if normalized_task not in self.VALID_TASKS:
58
+ raise ValueError(f"Unknown task: {task_name}")
59
+
60
+ self.task_name = normalized_task
61
+ self._state_data: CloudState | None = None
62
+ self._achievements: set[str] = set()
63
+
64
+ def _build_noise_resources(self) -> dict[str, dict[str, object]]:
65
+ """Generate deterministic decoy resources to force retrieval and filtering."""
66
+ resources: dict[str, dict[str, object]] = {}
67
+ for i in range(1, 21):
68
+ suffix = f"{i:02d}"
69
+ resources[f"i-backend-{suffix}"] = {
70
+ "type": "Instance",
71
+ "status": "running",
72
+ "logs": (
73
+ "[2026-04-06 17:00:00] INFO node-exporter: "
74
+ "standard metrics reported successfully"
75
+ ),
76
+ }
77
+ resources[f"sg-backend-{suffix}"] = {
78
+ "type": "SecurityGroup",
79
+ "rules": [{"port": 443, "action": "allow"}],
80
+ }
81
+ return resources
82
+
83
+ def _build_task_resources(self) -> dict[str, dict[str, object]]:
84
+ resources = self._build_noise_resources()
85
+
86
+ if self.task_name == "easy":
87
+ resources.update(
88
+ {
89
+ "i-web": {"type": "Instance", "status": "running"},
90
+ "sg-web": {
91
+ "type": "SecurityGroup",
92
+ "rules": [{"port": 22, "action": "allow"}],
93
+ },
94
+ }
95
+ )
96
+ return resources
97
+
98
+ if self.task_name == "medium":
99
+ resources.update(
100
+ {
101
+ "i-api": {
102
+ "type": "Instance",
103
+ "status": "running",
104
+ "logs": (
105
+ "[2026-04-06 17:01:22] [CRITICAL] "
106
+ "sqlalchemy.exc.OperationalError: "
107
+ "(psycopg2.OperationalError) connection to server at "
108
+ "'10.0.4.5' (i-db), port 5432 failed: Connection timed out. "
109
+ "Is the server running and accepting TCP/IP connections?"
110
+ ),
111
+ },
112
+ "i-db": {"type": "Instance", "status": "running"},
113
+ "sg-db": {
114
+ "type": "SecurityGroup",
115
+ "rules": [{"port": 22, "action": "allow"}],
116
+ },
117
+ }
118
+ )
119
+ return resources
120
+
121
+ resources.update(
122
+ {
123
+ "lb-main": {
124
+ "type": "LoadBalancer",
125
+ "logs": (
126
+ "2026/04/06 17:02:09 [error] 3197#3197: *4189 upstream timed out "
127
+ "(110: Connection timed out) while reading response header from upstream, "
128
+ "client: 10.0.2.14, server: api.prod.local, request: \"GET /checkout HTTP/1.1\", "
129
+ "upstream: \"http://i-web2:8080/checkout\", host: \"api.prod.local\"\n"
130
+ "2026/04/06 17:02:10 [error] 3197#3197: *4190 no live upstreams while "
131
+ "connecting to upstream \"i-web2\""
132
+ ),
133
+ },
134
+ "i-web1": {
135
+ "type": "Instance",
136
+ "status": "running",
137
+ "logs": (
138
+ "[2026-04-06 17:02:11] INFO web-service: readiness probe passed\n"
139
+ "[2026-04-06 17:02:12] INFO jvm: heap usage stable at 42%"
140
+ ),
141
+ },
142
+ "i-web2": {
143
+ "type": "Instance",
144
+ "status": "degraded",
145
+ "logs": (
146
+ "kernel: Out of memory: Killed process 12345 (java) total-vm:4194304kB, "
147
+ "anon-rss:3145728kB\n"
148
+ "systemd[1]: web-service.service: Main process exited, code=killed, "
149
+ "status=9/KILL"
150
+ ),
151
+ },
152
+ "sg-web": {
153
+ "type": "SecurityGroup",
154
+ "rules": [{"port": 80, "action": "allow"}],
155
+ },
156
+ }
157
+ )
158
+ return resources
159
+
160
+ def _reward_once(self, achievement: str, points: float) -> float:
161
+ if achievement in self._achievements:
162
+ return 0.0
163
+ self._achievements.add(achievement)
164
+ return points
165
+
166
+ def reset(self) -> CloudObservation: # type: ignore[override]
167
+ """Reset the environment to the initial state for the selected task."""
168
+ self._achievements.clear()
169
+ self._state_data = CloudState(
170
+ episode_id=str(uuid4()),
171
+ task_difficulty=self.task_name,
172
+ resources=copy.deepcopy(self._build_task_resources()),
173
+ step_count=0,
174
+ is_resolved=False,
175
+ )
176
+
177
+ return CloudObservation(
178
+ output=(
179
+ "Environment initialized. System status is currently CRITICAL. "
180
+ "Use 'list_resources' to begin triage."
181
+ ),
182
+ error=None,
183
+ system_health_status="CRITICAL",
184
+ done=False,
185
+ reward=0.0,
186
+ metadata={
187
+ "step_count": 0,
188
+ "resolved": False,
189
+ "task": self.task_name,
190
+ "total_resources": len(self._state_data.resources),
191
+ },
192
+ echoed_message="Cloud Devops Env environment ready!",
193
+ message_length=0,
194
+ )
195
+
196
+ def step(self, action: CloudAction) -> CloudObservation: # type: ignore[override]
197
+ """Execute the agent action and return the next observation."""
198
+ if self._state_data is None:
199
+ self.reset()
200
+
201
+ assert self._state_data is not None
202
+ state = self._state_data
203
+
204
+ state.step_count += 1
205
+ reward = 0.0
206
+ done = False
207
+ output = ""
208
+ error = None
209
+
210
+ try:
211
+ if action.command == "list_resources":
212
+ res_list = [
213
+ f"{resource_id} ({data['type']})"
214
+ for resource_id, data in sorted(state.resources.items())
215
+ ]
216
+ output = "Available Resources:\n" + "\n".join(res_list)
217
+
218
+ elif action.command == "describe_resource":
219
+ if not action.resource_id or action.resource_id not in state.resources:
220
+ raise ValueError(f"Resource {action.resource_id} not found.")
221
+
222
+ output = str(state.resources[action.resource_id])
223
+
224
+ if self.task_name == "easy" and action.resource_id == "sg-web":
225
+ reward += self._reward_once("read_sg", 0.2)
226
+ elif self.task_name == "medium" and action.resource_id == "sg-db":
227
+ reward += self._reward_once("read_sg", 0.2)
228
+ elif self.task_name == "hard" and action.resource_id == "i-web2":
229
+ reward += self._reward_once("inspect_target", 0.2)
230
+
231
+ elif action.command == "view_logs":
232
+ if not action.resource_id:
233
+ raise ValueError("resource_id is required for view_logs.")
234
+
235
+ res = state.resources.get(action.resource_id)
236
+ if not res:
237
+ raise ValueError(f"Resource {action.resource_id} not found.")
238
+
239
+ output = str(res.get("logs", "No logs available for this resource."))
240
+
241
+ if self.task_name == "medium" and action.resource_id == "i-api":
242
+ reward += self._reward_once("read_logs", 0.2)
243
+ elif self.task_name == "hard" and action.resource_id == "lb-main":
244
+ reward += self._reward_once("inspect_lb", 0.2)
245
+ elif self.task_name == "hard" and action.resource_id == "i-web2":
246
+ reward += self._reward_once("inspect_target", 0.2)
247
+
248
+ elif action.command == "update_security_group":
249
+ if not action.resource_id:
250
+ raise ValueError("resource_id is required for update_security_group.")
251
+
252
+ res = state.resources.get(action.resource_id)
253
+ if not res or res.get("type") != "SecurityGroup":
254
+ raise ValueError(f"Invalid Security Group ID: {action.resource_id}")
255
+ if not action.parameters or "port" not in action.parameters:
256
+ raise ValueError("Missing 'port' in parameters.")
257
+
258
+ rule = copy.deepcopy(action.parameters)
259
+ rules = res.get("rules")
260
+ if not isinstance(rules, list):
261
+ raise ValueError(f"Security group {action.resource_id} has invalid rules.")
262
+ rules.append(rule)
263
+ output = f"Successfully updated {action.resource_id} with rule: {rule}"
264
+
265
+ port = int(rule["port"])
266
+ if (
267
+ self.task_name == "easy"
268
+ and action.resource_id == "sg-web"
269
+ and port == 80
270
+ ):
271
+ state.is_resolved = True
272
+ reward += 0.8
273
+ done = True
274
+ output += "\nSUCCESS: Web server is now accessible!"
275
+ elif (
276
+ self.task_name == "medium"
277
+ and action.resource_id == "sg-db"
278
+ and port == 5432
279
+ ):
280
+ if "read_logs" in self._achievements:
281
+ state.is_resolved = True
282
+ reward += 0.6
283
+ done = True
284
+ output += "\nSUCCESS: Database connection restored!"
285
+ else:
286
+ reward -= 0.1
287
+ output += (
288
+ "\nWARNING: Change applied without incident triage. "
289
+ "Inspect API logs before closing the incident."
290
+ )
291
+
292
+ elif action.command == "restart_service":
293
+ if not action.resource_id:
294
+ raise ValueError("resource_id is required for restart_service.")
295
+ if action.resource_id not in state.resources:
296
+ raise ValueError(f"Resource {action.resource_id} not found.")
297
+
298
+ output = f"Service on {action.resource_id} restarted."
299
+
300
+ if self.task_name == "hard":
301
+ if action.resource_id == "i-web2":
302
+ investigated_root_cause = (
303
+ "inspect_lb" in self._achievements
304
+ and "inspect_target" in self._achievements
305
+ )
306
+ if investigated_root_cause:
307
+ state.resources["i-web2"]["status"] = "running"
308
+ state.resources["i-web2"][
309
+ "logs"
310
+ ] = "INFO: Restart successful. Memory cleared."
311
+ state.is_resolved = True
312
+ reward += 0.8
313
+ done = True
314
+ output += "\nSUCCESS: OutOfMemory loop broken. System stable."
315
+ else:
316
+ reward -= 0.1
317
+ output += (
318
+ "\nWARNING: Restart denied by change policy. "
319
+ "Find failing upstream from lb-main and inspect i-web2 first."
320
+ )
321
+ elif action.resource_id == "i-web1":
322
+ reward -= 0.2
323
+ output += (
324
+ "\nWARNING: You restarted a healthy production server! "
325
+ "Users dropped."
326
+ )
327
+
328
+ elif action.command == "submit_solution":
329
+ if state.is_resolved:
330
+ done = True
331
+ output = "Solution verified. System is HEALTHY."
332
+ else:
333
+ if self.task_name == "hard":
334
+ # In hard mode, unresolved submission should not abort the run.
335
+ done = False
336
+ reward -= 0.1
337
+ output = (
338
+ "Solution incorrect. Incident is still CRITICAL. "
339
+ "Continue triage and remediation before submitting."
340
+ )
341
+ else:
342
+ done = True
343
+ output = "Solution incorrect. System is still CRITICAL."
344
+
345
+ else:
346
+ raise ValueError(f"Unsupported command: {action.command}")
347
+
348
+ except Exception as exc:
349
+ error = str(exc)
350
+ output = f"Command Failed: {error}"
351
+
352
+ if state.step_count >= self.MAX_STEPS and not done:
353
+ done = True
354
+ timeout_suffix = "\nTIMEOUT: Max steps reached."
355
+ output = f"{output}{timeout_suffix}" if output else timeout_suffix.strip()
356
+
357
+ reward = max(-1.0, min(1.0, reward))
358
+ status = "HEALTHY" if state.is_resolved else "CRITICAL"
359
+ info = {
360
+ "step_count": state.step_count,
361
+ "resolved": state.is_resolved,
362
+ "task": self.task_name,
363
+ "achievements": sorted(self._achievements),
364
+ "total_resources": len(state.resources),
365
+ }
366
+
367
+ return CloudObservation(
368
+ output=output,
369
+ error=error,
370
+ system_health_status=status,
371
+ done=done,
372
+ reward=reward,
373
+ metadata=info,
374
+ echoed_message=output,
375
+ message_length=len(output),
376
+ )
377
+
378
+ @property
379
+ def state(self) -> State:
380
+ """Return hidden environment state for evaluators/debugging."""
381
+ if self._state_data is None:
382
+ self.reset()
383
+ assert self._state_data is not None
384
+ return self._state_data
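The anti-farming guardrail in the environment above comes from `_reward_once`: each named achievement pays out a single time, and the hard-task restart only succeeds once both `inspect_lb` and `inspect_target` have been earned. The following standalone sketch replays those hard-task reward rules without the OpenEnv dependencies. `AchievementLedger` and `hard_task_rewards` are hypothetical names, and the sketch leaves out episode termination, errors, and the step limit.

```python
class AchievementLedger:
    """Hypothetical helper mirroring the _reward_once logic."""

    def __init__(self) -> None:
        self._achievements = set()

    def reward_once(self, achievement: str, points: float) -> float:
        """Pay `points` the first time an achievement is seen, then 0.0."""
        if achievement in self._achievements:
            return 0.0
        self._achievements.add(achievement)
        return points

    def has(self, *names: str) -> bool:
        return all(name in self._achievements for name in names)


def hard_task_rewards(actions):
    """Replay (command, resource_id) pairs and return the per-step rewards."""
    ledger = AchievementLedger()
    rewards = []
    for command, resource_id in actions:
        reward = 0.0
        if command == "view_logs" and resource_id == "lb-main":
            reward += ledger.reward_once("inspect_lb", 0.2)
        elif command in ("view_logs", "describe_resource") and resource_id == "i-web2":
            reward += ledger.reward_once("inspect_target", 0.2)
        elif command == "restart_service" and resource_id == "i-web2":
            # The restart only pays off after both investigation steps.
            reward += 0.8 if ledger.has("inspect_lb", "inspect_target") else -0.1
        elif command == "restart_service" and resource_id == "i-web1":
            reward -= 0.2
        rewards.append(reward)
    return rewards


if __name__ == "__main__":
    print(hard_task_rewards([
        ("view_logs", "lb-main"),
        ("view_logs", "i-web2"),
        ("describe_resource", "i-web2"),  # same achievement, so no second payout
        ("restart_service", "i-web2"),
    ]))
```

Repeating an inspection earns nothing after the first time, and restarting `i-web2` before investigating costs 0.1, so an agent cannot score by guessing the remediation or by looping on a rewarded command.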
server/requirements.txt ADDED
@@ -0,0 +1,6 @@
+ openenv[core]>=0.2.0
+ fastapi>=0.115.0
+ uvicorn>=0.24.0
+
+
+