Prepare HF Space submission validation and compliance.
Files changed:
- Dockerfile +23 -0
- README.md +117 -1
- __init__.py +25 -0
- client.py +107 -0
- env.py +67 -0
- inference.py +163 -0
- models.py +113 -0
- openenv.yaml +19 -0
- pyproject.toml +36 -0
- scripts/pre_submit_validate.sh +365 -0
- scripts/validate-submission.sh +185 -0
- server/Dockerfile +80 -0
- server/__init__.py +11 -0
- server/app.py +101 -0
- server/cloud_devops_env_environment.py +384 -0
- server/requirements.txt +6 -0
Dockerfile
ADDED

```dockerfile
# Use a lightweight, stable Python image
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy project files
COPY pyproject.toml .
COPY openenv.yaml .
COPY models.py .
COPY env.py .
COPY __init__.py .
COPY client.py .
COPY server ./server

# Install dependencies (no-cache to save space)
RUN pip install --no-cache-dir .

# Expose the standard OpenEnv port
EXPOSE 8000

# Start the FastAPI/OpenEnv app directly (openenv serve is not implemented in v0.2.3)
CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
```
README.md
CHANGED

````markdown
short_description: Cloud SRE/DevOps RL environment
---

# Cloud DevOps RLEnv

Cloud DevOps RLEnv is an OpenEnv-compatible environment for training and evaluating agents on realistic cloud SRE and DevOps incident-response tasks.

## Environment Description And Motivation

Production incidents are often multi-step: triage, inspect resources, check logs, apply a safe remediation, and then verify the fix. This environment simulates that loop with deterministic scenarios and shaped rewards.

Goals:
- Benchmark planning and tool-use behavior for cloud operations agents.
- Reward correct diagnosis over blind action execution.
- Provide repeatable task outcomes for fair grading and comparison.

## Action Space

Action model: `CloudAction`

Fields:
- `command` (required): one of `list_resources`, `describe_resource`, `view_logs`, `update_security_group`, `restart_service`, `submit_solution`.
- `resource_id` (optional): target resource identifier (required for most non-list actions).
- `parameters` (optional): structured key/value arguments used by mutating actions.

Notes:
- `update_security_group` expects `parameters.port` and usually `parameters.action`.
- `restart_service` targets a single instance by `resource_id`.

## Observation And State Space

Observation model: `CloudObservation`

Primary observation fields:
- `output`: command result payload.
- `error`: command error, when present.
- `system_health_status`: `CRITICAL`, `DEGRADED`, or `HEALTHY`.
- `done`: terminal flag.
- `reward`: scalar step reward.
- `metadata`: includes task name, resolution status, step count, and other diagnostics.

Hidden state model: `CloudState`
- `task_difficulty`: `easy`, `medium`, or `hard`.
- `resources`: underlying resource graph and logs.
- `step_count`: total actions issued.
- `is_resolved`: whether the incident root cause is remediated.

## Task Definitions And Expected Difficulty

- `easy`:
  Open port `80` on `sg-web` so web traffic can flow.
  Expected difficulty: low.
- `medium`:
  Inspect API logs to identify a DB connectivity failure, then open port `5432` on `sg-db`.
  Expected difficulty: medium (requires diagnosis before remediation).
- `hard`:
  Trace a load balancer timeout to `i-web2`, inspect the target, then restart the correct service.
  Expected difficulty: high (multi-hop diagnosis and anti-shortcut checks).

## Setup And Usage

From the repository root:

```bash
# Validate OpenEnv package structure and manifest
..\\.venv\\Scripts\\openenv validate

# Run the pre-submission validator (skip live inference)
bash scripts/pre_submit_validate.sh --skip-inference

# Build the local submission image
docker build -t cloud-devops-env:phase1 -f Dockerfile .
```

Optional local server run:

```bash
uvicorn server.app:app --host 0.0.0.0 --port 8000
```

## Inference Contract

`inference.py` uses the OpenAI client and reads the following environment variables:
- `API_BASE_URL`
- `MODEL_NAME`
- `HF_TOKEN`

It emits strict structured logs:
- `[START] { ... }` per task
- `[STEP] { ... }` per environment action
- `[END] { ... }` per task summary

## Baseline Scores

Representative deterministic scripted-policy targets:

| Task | Baseline Score (0-1) | Notes |
| --- | --- | --- |
| easy | 1.0 | Includes identifying and fixing the security group rule |
| medium | 0.8-1.0 | Depends on whether the optional diagnostic reward is collected |
| hard | 1.0 | Requires the correct root-cause path before restart |

Validation expectations:
- Aggregate scores are clamped to `[0.0, 1.0]`.
- `SUCCESS_SCORE_THRESHOLD` for inference summaries is `0.8`.

## Hugging Face Space Deployment

1. Push this repository to your Space (Docker SDK).
2. Ensure the `README.md` front matter (above) is present.
3. Set Space secrets/variables:
   - `HF_TOKEN` (secret)
   - `API_BASE_URL` (for example `https://router.huggingface.co/v1`)
   - `MODEL_NAME` (chosen model slug)
4. Wait for the Space build to complete.
5. Verify endpoints:
   - `GET /health` returns `200`
   - `POST /reset` returns `200`

Reference: https://huggingface.co/docs/hub/spaces-config-reference
````
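The clamp-and-threshold scoring rule described in the Baseline Scores section can be sketched in plain Python. The `summarize` function name is illustrative, not part of the shipped code; the constants mirror the values stated above.

```python
# Illustrative sketch (not the shipped implementation): per-step rewards
# are summed, clamped to [0.0, 1.0], and compared against the 0.8
# success threshold used by the inference summaries.
MAX_TOTAL_REWARD = 1.0
SUCCESS_SCORE_THRESHOLD = 0.8


def summarize(rewards):
    """Return (score, success) for a list of per-step rewards."""
    score = min(max(sum(rewards), 0.0), MAX_TOTAL_REWARD)
    return score, score >= SUCCESS_SCORE_THRESHOLD


print(summarize([0.25, 0.25, 0.5]))  # full-credit trajectory
print(summarize([0.25, 0.25]))       # partial credit, below threshold
```

Note the clamping: a trajectory that over-collects shaped rewards still caps at 1.0, and a negative sum floors at 0.0.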
__init__.py
ADDED

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Cloud Devops Env Environment."""

from .client import CloudDevopsEnv
from .models import (
    CloudAction,
    CloudDevopsAction,
    CloudDevopsObservation,
    CloudObservation,
    CloudState,
)

__all__ = [
    "CloudAction",
    "CloudObservation",
    "CloudState",
    "CloudDevopsAction",
    "CloudDevopsObservation",
    "CloudDevopsEnv",
]
```
client.py
ADDED

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Cloud Devops Env Environment Client."""

from typing import Any, Dict

from openenv.core import EnvClient
from openenv.core.client_types import StepResult
from openenv.core.env_server.types import State

from .models import CloudAction, CloudObservation


class CloudDevopsEnv(EnvClient[CloudAction, CloudObservation, State]):
    """
    Client for the Cloud Devops Env Environment.

    This client maintains a persistent WebSocket connection to the environment server,
    enabling efficient multi-step interactions with lower latency.
    Each client instance has its own dedicated environment session on the server.

    Example:
        >>> # Connect to a running server
        >>> with CloudDevopsEnv(base_url="http://localhost:8000") as client:
        ...     result = client.reset()
        ...     print(result.observation.system_health_status)
        ...
        ...     result = client.step(CloudAction(command="list_resources"))
        ...     print(result.observation.output)

    Example with Docker:
        >>> # Automatically start container and connect
        >>> client = CloudDevopsEnv.from_docker_image("cloud_devops_env-env:latest")
        >>> try:
        ...     result = client.reset()
        ...     result = client.step(CloudAction(command="list_resources"))
        ... finally:
        ...     client.close()
    """

    def _step_payload(self, action: CloudAction) -> Dict[str, Any]:
        """
        Convert CloudAction to JSON payload for the step message.

        Args:
            action: CloudAction instance

        Returns:
            Dictionary representation suitable for JSON encoding
        """
        payload: Dict[str, Any] = {
            "command": action.command,
            "resource_id": action.resource_id,
            "parameters": action.parameters,
        }
        if action.message is not None:
            payload["message"] = action.message
        return payload

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[CloudObservation]:
        """
        Parse server response into StepResult[CloudObservation].

        Args:
            payload: JSON response data from server

        Returns:
            StepResult with CloudObservation
        """
        obs_data = payload.get("observation", {})
        observation = CloudObservation(
            output=obs_data.get("output", ""),
            error=obs_data.get("error"),
            system_health_status=obs_data.get("system_health_status", "CRITICAL"),
            message_length=obs_data.get("message_length", 0),
            echoed_message=obs_data.get("echoed_message"),
            done=payload.get("done", False),
            reward=payload.get("reward"),
            metadata=obs_data.get("metadata", {}),
        )

        return StepResult(
            observation=observation,
            reward=payload.get("reward"),
            done=payload.get("done", False),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> State:
        """
        Parse server response into State object.

        Args:
            payload: JSON response from state request

        Returns:
            State object with episode_id and step_count
        """
        return State(
            episode_id=payload.get("episode_id"),
            step_count=payload.get("step_count", 0),
        )
```
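To make the parsing contract concrete, here is a dependency-free sketch of the server payload shape `_parse_result` consumes, with the same defaulting applied for missing fields. The payload values are hypothetical examples; only the field names come from the client code above.

```python
# Hypothetical server payload, shaped like what _parse_result consumes.
payload = {
    "observation": {
        "output": "2 resources: i-web1, sg-web",
        "error": None,
        "system_health_status": "DEGRADED",
        "metadata": {"task": "easy", "step_count": 1},
    },
    "reward": 0.1,
    "done": False,
}

# The same defaulting the client applies when fields are absent:
# empty output, CRITICAL health, done=False.
obs = payload.get("observation", {})
parsed = {
    "output": obs.get("output", ""),
    "system_health_status": obs.get("system_health_status", "CRITICAL"),
    "reward": payload.get("reward"),
    "done": payload.get("done", False),
}
print(parsed["system_health_status"])
```

Note that `reward` and `done` live at the top level of the payload while the rest is nested under `observation`.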
env.py
ADDED

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Async entrypoint wrapper for external evaluators and custom graders."""

from __future__ import annotations

from typing import Any, Dict

from pydantic import BaseModel

try:
    from .models import CloudAction, CloudObservation, CloudState
    from .server.cloud_devops_env_environment import CloudDevopsEnvironment
except ImportError:
    from models import CloudAction, CloudObservation, CloudState
    from server.cloud_devops_env_environment import CloudDevopsEnvironment


class EnvResult(BaseModel):
    """Canonical environment result payload for async evaluator loops."""

    observation: CloudObservation
    reward: float
    done: bool
    info: Dict[str, Any]


class CloudDevOpsEnv:
    """Async-compatible facade over the OpenEnv server-side environment logic."""

    def __init__(self, task_name: str = "easy"):
        self._impl = CloudDevopsEnvironment(task_name=task_name)

    @property
    def achievements(self) -> set[str]:
        """Expose completed shaped-reward checkpoints for debugging/evaluation."""
        return set(self._impl._achievements)

    async def reset(self) -> EnvResult:
        """Reset the environment to the initial task state."""
        obs = self._impl.reset()
        return EnvResult(
            observation=obs,
            reward=float(obs.reward or 0.0),
            done=bool(obs.done),
            info=dict(obs.metadata or {}),
        )

    async def step(self, action: CloudAction) -> EnvResult:
        """Execute an action and return a structured async result."""
        obs = self._impl.step(action)
        return EnvResult(
            observation=obs,
            reward=float(obs.reward or 0.0),
            done=bool(obs.done),
            info=dict(obs.metadata or {}),
        )

    async def state(self) -> CloudState:
        """Return hidden environment state for deterministic evaluators."""
        state = self._impl.state
        assert isinstance(state, CloudState)
        return state
```
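The facade pattern used here (synchronous environment logic wrapped in `async` methods so evaluator loops can `await` it) can be illustrated with a stand-in implementation. `StubImpl`, `AsyncFacade`, and `demo` are invented names for this sketch; the real logic lives in `server/cloud_devops_env_environment.py`.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict


# Stand-in for the server-side environment class, used only to
# illustrate the async facade shape; not the shipped implementation.
@dataclass
class StubImpl:
    step_count: int = 0

    def reset(self) -> Dict[str, Any]:
        self.step_count = 0
        return {"reward": 0.0, "done": False}

    def step(self, action: str) -> Dict[str, Any]:
        self.step_count += 1
        return {"reward": 0.1, "done": False}


class AsyncFacade:
    """Same shape as CloudDevOpsEnv: a sync impl behind async methods."""

    def __init__(self) -> None:
        self._impl = StubImpl()

    async def reset(self) -> Dict[str, Any]:
        return self._impl.reset()

    async def step(self, action: str) -> Dict[str, Any]:
        return self._impl.step(action)


async def demo() -> int:
    env = AsyncFacade()
    await env.reset()
    for _ in range(3):
        await env.step("list_resources")
    return env._impl.step_count


print(asyncio.run(demo()))  # prints 3
```

The real facade adds typed `EnvResult` payloads and state introspection, but the control flow is the same.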
inference.py
ADDED

````python
import asyncio
import json
import os
from typing import Any, Dict, List, Tuple

from openai import OpenAI
from pydantic import ValidationError

from env import CloudDevOpsEnv
from models import CloudAction

API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.getenv("MODEL_NAME", "google/gemma-4-31B-it")
HF_TOKEN = os.getenv("HF_TOKEN")

BENCHMARK = "CloudDevOpsEnv"
MAX_STEPS = 15
MAX_TOTAL_REWARD = 1.0
SUCCESS_SCORE_THRESHOLD = 0.8


def log_start(task: str, env: str, model: str) -> None:
    log_data = {"task": task, "env": env, "model": model}
    print(f"[START] {json.dumps(log_data)}", flush=True)


def log_step(step: int, action: Any, reward: float, done: bool, error: Any) -> None:
    action_dict = action.model_dump() if hasattr(action, "model_dump") else str(action)
    log_data = {
        "step": step,
        "action": action_dict,
        "reward": reward,
        "done": done,
        "error": error,
    }
    print(f"[STEP] {json.dumps(log_data)}", flush=True)


def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
    log_data = {"success": success, "steps": steps, "score": score, "rewards": rewards}
    print(f"[END] {json.dumps(log_data)}", flush=True)


def get_model_action(
    client: OpenAI,
    step: int,
    last_obs: str,
    last_error: str,
    history: List[Dict[str, str]],
) -> Tuple[CloudAction, str]:
    """Prompt the LLM and parse its response into a CloudAction."""
    system_prompt = (
        "You are an expert AI DevOps Engineer diagnosing a cloud infrastructure issue. "
        "You must respond ONLY with a raw JSON object matching this schema:\n"
        "{\n"
        '  "command": "list_resources" | "describe_resource" | "view_logs" | "update_security_group" | "restart_service" | "submit_solution",\n'
        '  "resource_id": "string (optional)",\n'
        '  "parameters": {"key": "value"} (optional)\n'
        "}\n"
        "Do not include markdown blocks like ```json. Just output the JSON."
    )

    user_prompt = f"Step {step}.\nLast Observation:\n{last_obs}\n"
    if last_error:
        user_prompt += f"\nLast Error:\n{last_error}\n"
    user_prompt += "\nWhat is your next action JSON?"

    messages = [{"role": "system", "content": system_prompt}] + history + [
        {"role": "user", "content": user_prompt}
    ]

    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=messages,
            temperature=0.1,
            max_tokens=200,
        )
        raw_text = (response.choices[0].message.content or "").strip()

        if raw_text.startswith("```json"):
            raw_text = raw_text.replace("```json", "").replace("```", "").strip()

        action_dict = json.loads(raw_text)
        return CloudAction(**action_dict), raw_text
    except (json.JSONDecodeError, ValidationError) as exc:
        print(f"[DEBUG] Model parse failed: {exc}", flush=True)
        return CloudAction(command="list_resources"), "failed_parse"
    except Exception as exc:
        print(f"[DEBUG] API request failed: {exc}", flush=True)
        return CloudAction(command="list_resources"), "api_error"


async def run_task(task_name: str, client: OpenAI) -> None:
    env = CloudDevOpsEnv(task_name=task_name)

    history: List[Dict[str, str]] = []
    rewards: List[float] = []
    steps_taken = 0
    score = 0.0
    success = False

    log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)

    try:
        result = await env.reset()
        last_obs = result.observation.output
        last_error = result.observation.error or ""

        for step in range(1, MAX_STEPS + 1):
            if result.done:
                break

            action, raw_response = get_model_action(
                client, step, last_obs, last_error, history
            )

            result = await env.step(action)
            obs = result.observation
            reward = result.reward or 0.0
            done = result.done
            error = obs.error

            rewards.append(reward)
            steps_taken = step
            last_obs = obs.output
            last_error = error or ""

            log_step(step=step, action=action, reward=reward, done=done, error=error)

            history.append({"role": "assistant", "content": raw_response})
            history.append(
                {
                    "role": "user",
                    "content": f"Observation: {last_obs}\nError: {last_error}",
                }
            )

            if done:
                break

        score = sum(rewards)
        score = min(max(score, 0.0), MAX_TOTAL_REWARD)
        success = score >= SUCCESS_SCORE_THRESHOLD

    finally:
        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)


async def main() -> None:
    if not HF_TOKEN:
        print("[WARNING] HF_TOKEN environment variable not set. API calls will likely fail.")

    client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)

    tasks = ["easy", "medium", "hard"]
    for task in tasks:
        print(f"\n--- Running Task: {task.upper()} ---")
        await run_task(task, client)


if __name__ == "__main__":
    asyncio.run(main())
````
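The fence-stripping and parse-fallback logic inside `get_model_action` can be exercised without an API call. This sketch extracts that logic into a standalone function (`extract_action_dict` is an illustrative name, not part of the file above) so the fallback behavior is easy to verify.

```python
import json


def extract_action_dict(raw_text):
    """Replicate the fence-stripping and parse-fallback logic of
    get_model_action, minus the OpenAI call (illustrative sketch)."""
    text = raw_text.strip()
    # Strip a markdown fence the model was told not to emit but sometimes does.
    if text.startswith("```json"):
        text = text.replace("```json", "").replace("```", "").strip()
    try:
        return json.loads(text), text
    except json.JSONDecodeError:
        # Mirrors the safe default action used on parse failure.
        return {"command": "list_resources"}, "failed_parse"


fenced = '```json\n{"command": "view_logs", "resource_id": "i-web2"}\n```'
print(extract_action_dict(fenced)[0]["command"])        # view_logs
print(extract_action_dict("Sure! My plan is...")[0])    # fallback action
```

Falling back to `list_resources` keeps the episode alive on a malformed response instead of crashing the run, at the cost of one wasted step.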
models.py
ADDED

```python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
Data models for the Cloud Devops Env Environment.

The cloud_devops_env environment simulates cloud/devops incident response tasks.
"""

import json
from typing import Any, Dict, Literal, Optional

from openenv.core.env_server.types import Action, Observation, State
from pydantic import Field, field_validator


class CloudAction(Action):
    """Action space (what the agent can do)."""

    command: Literal[
        "list_resources",
        "describe_resource",
        "view_logs",
        "update_security_group",
        "restart_service",
        "submit_solution",
    ] = Field(..., description="The cloud API command to execute.")
    resource_id: Optional[str] = Field(
        default=None,
        description=(
            "The ID of the target resource (e.g., 'i-12345'). "
            "Required for all commands except list_resources."
        ),
    )
    parameters: Optional[Dict[str, Any]] = Field(
        default=None,
        description=(
            "Key-value pairs for updates "
            "(e.g., {'port': '80', 'action': 'allow'} for update_security_group)."
        ),
    )
    message: Optional[str] = Field(
        default=None,
        description="Legacy field from template env; safe to remove after server/client migration.",
    )

    @field_validator("parameters", mode="before")
    @classmethod
    def _coerce_parameters(cls, value: Any) -> Any:
        """Allow /web text input to pass JSON for dict parameters."""
        if value is None or value == "":
            return None
        if isinstance(value, dict):
            return value
        if isinstance(value, str):
            try:
                parsed = json.loads(value)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    "parameters must be a JSON object string, e.g. {\"port\":80,\"action\":\"allow\"}"
                ) from exc
            if not isinstance(parsed, dict):
                raise ValueError("parameters JSON must decode to an object/dictionary")
            return parsed
        raise ValueError("parameters must be a dictionary or JSON object string")


class CloudObservation(Observation):
    """Observation space (what the agent sees)."""

    output: str = Field(
        ...,
        description="The terminal/API response from the last command executed.",
    )
    error: Optional[str] = Field(
        default=None,
        description="Error message if the last command failed or was invalid.",
    )
    system_health_status: str = Field(
        ...,
        description="Current status of the system (e.g., 'CRITICAL', 'DEGRADED', 'HEALTHY').",
    )
    echoed_message: Optional[str] = Field(
        default=None,
        description="Legacy field from template env; safe to remove after server/client migration.",
    )
    message_length: int = Field(
        default=0,
        description="Legacy field from template env; safe to remove after server/client migration.",
    )


class CloudState(State):
    """State space (the hidden environment state)."""

    task_difficulty: str = Field(..., description="Current task: easy, medium, or hard.")
    resources: Dict[str, Dict[str, Any]] = Field(
        ...,
        description="The hidden JSON state of all mock cloud resources.",
    )
    step_count: int = Field(..., description="Number of actions taken so far.")
    is_resolved: bool = Field(
        ...,
        description="Whether the root cause has been successfully fixed.",
    )


# Backward-compatible aliases for scaffolded files that still use template names.
CloudDevopsAction = CloudAction
CloudDevopsObservation = CloudObservation
```
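The `_coerce_parameters` validator accepts a dict, a JSON object string, or empty input. Its behavior can be demonstrated without pydantic by lifting the same logic into a plain function (`coerce_parameters` is an illustrative name for this sketch):

```python
import json


def coerce_parameters(value):
    """Standalone copy of the _coerce_parameters logic above, for
    illustration outside pydantic."""
    if value is None or value == "":
        return None
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        # json.JSONDecodeError is a ValueError subclass, so callers can
        # catch ValueError for both malformed JSON and non-object input.
        parsed = json.loads(value)
        if not isinstance(parsed, dict):
            raise ValueError("parameters JSON must decode to an object/dictionary")
        return parsed
    raise ValueError("parameters must be a dictionary or JSON object string")


print(coerce_parameters('{"port": 80, "action": "allow"}'))
print(coerce_parameters(""))  # empty text input becomes None
```

This is what lets the `/web` text form submit `{"port": 80}` as a string while programmatic clients pass a dict directly.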
openenv.yaml
ADDED

```yaml
spec_version: 1
name: cloud_devops_env
type: space
runtime: fastapi
app: server.app:app
port: 8000

metadata:
  project: cloud-devops-env
  description: A real-world Cloud SRE/DevOps simulation environment.
entrypoint:
  file: env.py
  class: CloudDevOpsEnv
models:
  file: models.py
  action: CloudAction
  observation: CloudObservation
  state: CloudState
```
pyproject.toml
ADDED

```toml
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "openenv-cloud_devops_env"
version = "0.1.0"
description = "Cloud Devops Env environment for OpenEnv"
requires-python = ">=3.10"
dependencies = [
    "openenv-core[core]>=0.2.2",
    "pydantic>=2.0.0",
    "openai>=1.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-cov>=4.0.0",
]

[project.scripts]
# Server entry point - enables running via: uv run --project . server
# or: python -m cloud_devops_env.server.app
server = "cloud_devops_env.server.app:main"

[tool.setuptools]
include-package-data = true
packages = ["cloud_devops_env", "cloud_devops_env.server"]
package-dir = { "cloud_devops_env" = ".", "cloud_devops_env.server" = "server" }
```
scripts/pre_submit_validate.sh
ADDED
|
@@ -0,0 +1,365 @@
#!/usr/bin/env bash
#
# pre_submit_validate.sh
#
# Extended pre-submission checks for OpenEnv hackathon submissions.
# This script complements scripts/validate-submission.sh by also checking
# inference contract requirements and baseline reproducibility.

set -euo pipefail

DOCKER_BUILD_TIMEOUT=600
INFERENCE_TIMEOUT=1200

PING_URL=""
REPO_DIR="."
SKIP_DOCKER=false
SKIP_INFERENCE=false
PYTHON_BIN=""
OPENENV_BIN=""
OPENENV_USE_MODULE=false
DOCKER_CONTAINER_ID=""

usage() {
  cat <<'EOF'
Usage: scripts/pre_submit_validate.sh [options]

Options:
  --ping-url <url>    HF Space URL (e.g., https://team-space.hf.space)
  --repo-dir <path>   Repo root directory (default: current directory)
  --skip-docker       Skip docker build check
  --skip-inference    Skip inference baseline check
  -h, --help          Show this help message

Required environment variables for inference checks:
  API_BASE_URL
  MODEL_NAME
  HF_TOKEN
EOF
}

run_with_timeout() {
  local secs="$1"; shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"
  elif command -v gtimeout >/dev/null 2>&1; then
    gtimeout "$secs" "$@"
  else
    "$@" &
    local pid=$!
    ( sleep "$secs" && kill "$pid" 2>/dev/null ) &
    local watcher=$!
    wait "$pid" 2>/dev/null
    local rc=$?
    kill "$watcher" 2>/dev/null || true
    wait "$watcher" 2>/dev/null || true
    return $rc
  fi
}

log() {
  printf "[%s] %s\n" "$(date -u +%H:%M:%S)" "$*"
}

die() {
  log "FAILED -- $*"
  exit 1
}

pass() {
  log "PASSED -- $*"
}

cleanup() {
  if [ -n "$DOCKER_CONTAINER_ID" ]; then
    docker rm -f "$DOCKER_CONTAINER_ID" >/dev/null 2>&1 || true
  fi
}

trap cleanup EXIT

resolve_python_bin() {
  local candidates=(
    "$REPO_DIR/.venv/bin/python"
    "$REPO_DIR/.venv/Scripts/python.exe"
    "$REPO_DIR/../.venv/bin/python"
    "$REPO_DIR/../.venv/Scripts/python.exe"
  )

  for c in "${candidates[@]}"; do
    if [ -x "$c" ]; then
      PYTHON_BIN="$c"
      return 0
    fi
  done

  if command -v python >/dev/null 2>&1; then
    PYTHON_BIN="$(command -v python)"
    return 0
  fi
  if command -v python3 >/dev/null 2>&1; then
    PYTHON_BIN="$(command -v python3)"
    return 0
  fi

  return 1
}

resolve_openenv_cmd() {
  local candidates=(
    "$REPO_DIR/.venv/bin/openenv"
    "$REPO_DIR/.venv/Scripts/openenv.exe"
    "$REPO_DIR/../.venv/bin/openenv"
    "$REPO_DIR/../.venv/Scripts/openenv.exe"
  )

  for c in "${candidates[@]}"; do
    if [ -x "$c" ]; then
      OPENENV_BIN="$c"
      return 0
    fi
  done

  if command -v openenv >/dev/null 2>&1; then
    OPENENV_BIN="$(command -v openenv)"
    return 0
  fi

  return 1
}

while [ "$#" -gt 0 ]; do
  case "$1" in
    --ping-url)
      shift
      [ "$#" -gt 0 ] || die "--ping-url requires a value"
      PING_URL="$1"
      ;;
    --repo-dir)
      shift
      [ "$#" -gt 0 ] || die "--repo-dir requires a value"
      REPO_DIR="$1"
      ;;
    --skip-docker)
      SKIP_DOCKER=true
      ;;
    --skip-inference)
      SKIP_INFERENCE=true
      ;;
    -h|--help)
      usage
      exit 0
      ;;
    *)
      die "Unknown option: $1"
      ;;
  esac
  shift
done

REPO_DIR="$(cd "$REPO_DIR" && pwd)"
cd "$REPO_DIR"

log "Repo: $REPO_DIR"

resolve_python_bin || die "No usable Python interpreter found"
log "Python: $PYTHON_BIN"

if resolve_openenv_cmd; then
  log "OpenEnv CLI: $OPENENV_BIN"
else
  OPENENV_USE_MODULE=true
  log "OpenEnv CLI via module: $PYTHON_BIN -m openenv"
fi

log "Step 1/8: Checking OpenEnv standard file layout"
required_files=(
  "openenv.yaml"
  "models.py"
  "env.py"
  "inference.py"
  "server/app.py"
  "server/cloud_devops_env_environment.py"
)
for f in "${required_files[@]}"; do
  [ -f "$f" ] || die "Missing required file: $f"
done
pass "Core OpenEnv file layout looks valid"

log "Step 2/8: Checking inference contract requirements"
[ -f "inference.py" ] || die "inference.py must exist in repo root"
grep -q "from openai import OpenAI" inference.py || die "inference.py must import OpenAI client"
grep -q "OpenAI(" inference.py || die "inference.py must instantiate OpenAI client"
grep -q "\[START\]" inference.py || die "inference.py must emit [START] logs"
grep -q "\[STEP\]" inference.py || die "inference.py must emit [STEP] logs"
grep -q "\[END\]" inference.py || die "inference.py must emit [END] logs"
pass "Inference script contract checks passed"

log "Step 3/8: Validating OpenEnv manifest and typed models"
if [ "$OPENENV_USE_MODULE" = true ]; then
  "$PYTHON_BIN" -m openenv validate >/tmp/openenv-validate.out 2>&1 || {
    cat /tmp/openenv-validate.out
    die "openenv validate failed"
  }
else
  "$OPENENV_BIN" validate >/tmp/openenv-validate.out 2>&1 || {
    cat /tmp/openenv-validate.out
    die "openenv validate failed"
  }
fi
pass "openenv validate passed"

log "Step 4/8: Optional HF Space ping check"
if [ -n "$PING_URL" ]; then
  PING_URL="${PING_URL%/}"
  code=$(curl -s -o /tmp/pre-submit-ping.out -w "%{http_code}" -X POST \
    -H "Content-Type: application/json" -d '{}' \
    "$PING_URL/reset" --max-time 30 || printf "000")
  [ "$code" = "200" ] || die "HF Space /reset returned HTTP $code"
  pass "HF Space responds to /reset (HTTP 200)"
else
  log "SKIPPED -- no --ping-url provided"
fi

log "Step 5/8: Docker build + run check"
if [ "$SKIP_DOCKER" = true ]; then
  log "SKIPPED -- --skip-docker enabled"
else
  command -v docker >/dev/null 2>&1 || die "docker not found"
  if [ -f "Dockerfile" ]; then
    context="."
  elif [ -f "server/Dockerfile" ]; then
    context="server"
  else
    die "No Dockerfile found at root or server/"
  fi
  run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build "$context" >/tmp/pre-submit-docker.out 2>&1 || {
    tail -n 40 /tmp/pre-submit-docker.out
    die "docker build failed"
  }
  pass "Docker build succeeded"

  IMAGE_TAG="openenv-pre-submit-local"
  run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build -t "$IMAGE_TAG" "$context" >/tmp/pre-submit-docker-tagged.out 2>&1 || {
    tail -n 40 /tmp/pre-submit-docker-tagged.out
    die "docker build (tagged) failed"
  }

  DOCKER_CONTAINER_ID="$(docker run -d -p 127.0.0.1::8000 "$IMAGE_TAG" 2>/tmp/pre-submit-docker-run.err || true)"
  [ -n "$DOCKER_CONTAINER_ID" ] || {
    cat /tmp/pre-submit-docker-run.err
    die "docker run failed"
  }

  HOST_PORT="$(docker port "$DOCKER_CONTAINER_ID" 8000/tcp | tail -n 1 | awk -F: '{print $NF}')"
  [ -n "$HOST_PORT" ] || die "could not resolve mapped host port for container"

  HEALTH_OK=false
  for _ in $(seq 1 30); do
    health_code=$(curl -s -o /tmp/pre-submit-health.out -w "%{http_code}" \
      "http://127.0.0.1:${HOST_PORT}/health" --max-time 3 || printf "000")
    if [ "$health_code" = "200" ]; then
      HEALTH_OK=true
      break
    fi
    sleep 1
  done
  [ "$HEALTH_OK" = true ] || {
    docker logs "$DOCKER_CONTAINER_ID" | tail -n 50
    die "container did not become healthy on /health"
  }

  reset_code=$(curl -s -o /tmp/pre-submit-reset.out -w "%{http_code}" -X POST \
    -H "Content-Type: application/json" -d '{}' \
    "http://127.0.0.1:${HOST_PORT}/reset" --max-time 10 || printf "000")
  [ "$reset_code" = "200" ] || {
    docker logs "$DOCKER_CONTAINER_ID" | tail -n 50
    die "container /reset returned HTTP $reset_code"
  }

  pass "Containerized execution check passed (/health and /reset)"

  docker rm -f "$DOCKER_CONTAINER_ID" >/dev/null 2>&1 || true
  DOCKER_CONTAINER_ID=""
fi

log "Step 6/8: Environment variable checks"
if [ "$SKIP_INFERENCE" = true ]; then
  log "SKIPPED -- --skip-inference enabled"
else
  [ -n "${API_BASE_URL:-}" ] || die "API_BASE_URL is not set"
  [ -n "${MODEL_NAME:-}" ] || die "MODEL_NAME is not set"
  [ -n "${HF_TOKEN:-}" ] || die "HF_TOKEN is not set"
  pass "Required API_BASE_URL / MODEL_NAME / HF_TOKEN are set"
fi

log "Step 7/8: Baseline reproducibility (inference.py)"
if [ "$SKIP_INFERENCE" = true ]; then
  log "SKIPPED -- --skip-inference enabled"
else
  run_with_timeout "$INFERENCE_TIMEOUT" "$PYTHON_BIN" inference.py >/tmp/pre-submit-inference.out 2>&1 || {
    tail -n 80 /tmp/pre-submit-inference.out
    die "inference.py failed or timed out"
  }
  pass "inference.py completed within timeout"
fi

log "Step 8/8: Structured logs + task/grader checks"
if [ "$SKIP_INFERENCE" = true ]; then
  log "SKIPPED -- --skip-inference enabled"
else
  "$PYTHON_BIN" - <<'PY'
import json
from pathlib import Path

path = Path('/tmp/pre-submit-inference.out')
text = path.read_text(encoding='utf-8', errors='replace').splitlines()

starts = []
ends = []
step_count = 0

for line in text:
    line = line.strip()
    if line.startswith('[START] '):
        payload = json.loads(line[len('[START] '):])
        starts.append(payload)
    elif line.startswith('[STEP] '):
        json.loads(line[len('[STEP] '):])
        step_count += 1
    elif line.startswith('[END] '):
        payload = json.loads(line[len('[END] '):])
        ends.append(payload)

if len(starts) < 3:
    raise SystemExit('Expected at least 3 [START] task logs')

unique_tasks = {str(s.get('task', '')) for s in starts if s.get('task')}
if len(unique_tasks) < 3:
    raise SystemExit('Expected at least 3 unique tasks in [START] logs')

if len(ends) != len(starts):
    raise SystemExit('Mismatch between [START] and [END] log counts')

if step_count == 0:
    raise SystemExit('No [STEP] logs found')

for i, end in enumerate(ends, start=1):
    score = float(end.get('score', -1.0))
    rewards = end.get('rewards', [])
    if not (0.0 <= score <= 1.0):
        raise SystemExit(f'END #{i} score out of range [0,1]: {score}')
    if not isinstance(rewards, list):
        raise SystemExit(f'END #{i} rewards must be a list')
    for r in rewards:
        rv = float(r)
        if not (-1.0 <= rv <= 1.0):
            raise SystemExit(f'END #{i} step reward out of sanity range [-1,1]: {rv}')

print('Structured logs and task/grader checks passed')
PY
  pass "Structured [START]/[STEP]/[END] logs and score-range checks passed"
fi

log "All checks passed. Submission is ready."
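Step 8 above expects inference.py to print one JSON payload per `[START]`/`[STEP]`/`[END]` line. A minimal sketch of a log that would satisfy the checker, and the same prefix-stripping parse it performs (the task name and action strings here are hypothetical placeholders, not part of the contract; only the `task`, `score`, and `rewards` fields are actually validated):

```python
# Sketch of the structured-log contract: one JSON object per tagged line.
import json

lines = [
    '[START] {"task": "restart-crashed-pod"}',   # hypothetical task name
    '[STEP] {"action": "inspect", "reward": 0.1}',
    '[END] {"task": "restart-crashed-pod", "score": 0.8, "rewards": [0.1, 0.7]}',
]

# Same parse as the Step 8 checker: strip the tag prefix, json-decode the rest.
starts = [json.loads(l[len('[START] '):]) for l in lines if l.startswith('[START] ')]
ends = [json.loads(l[len('[END] '):]) for l in lines if l.startswith('[END] ')]

assert len(starts) == len(ends)
assert all(0.0 <= e["score"] <= 1.0 for e in ends)          # score in [0, 1]
assert all(-1.0 <= r <= 1.0 for e in ends for r in e["rewards"])  # rewards sane
print("log contract ok")
```

Note the checker also requires at least three distinct `task` values across `[START]` lines, so a real run needs more than one episode.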
scripts/validate-submission.sh
ADDED
|
@@ -0,0 +1,185 @@
#!/usr/bin/env bash
#
# validate-submission.sh — OpenEnv Submission Validator
#
# Checks that your HF Space is live, Docker image builds, and openenv validate passes.
#
# Prerequisites:
#   - Docker: https://docs.docker.com/get-docker/
#   - openenv-core: pip install openenv-core
#   - curl (usually pre-installed)
#
# Run:
#   curl -fsSL https://raw.githubusercontent.com/<owner>/<repo>/main/scripts/validate-submission.sh | bash -s -- <ping_url> [repo_dir]
#
# Or download and run locally:
#   chmod +x validate-submission.sh
#   ./validate-submission.sh <ping_url> [repo_dir]
#
# Arguments:
#   ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)
#   repo_dir   Path to your repo (default: current directory)
#
# Examples:
#   ./validate-submission.sh https://my-team.hf.space
#   ./validate-submission.sh https://my-team.hf.space ./my-repo
#

set -uo pipefail

DOCKER_BUILD_TIMEOUT=600
if [ -t 1 ]; then
  RED='\033[0;31m'
  GREEN='\033[0;32m'
  YELLOW='\033[1;33m'
  BOLD='\033[1m'
  NC='\033[0m'
else
  RED='' GREEN='' YELLOW='' BOLD='' NC=''
fi

run_with_timeout() {
  local secs="$1"; shift
  if command -v timeout &>/dev/null; then
    timeout "$secs" "$@"
  elif command -v gtimeout &>/dev/null; then
    gtimeout "$secs" "$@"
  else
    "$@" &
    local pid=$!
    ( sleep "$secs" && kill "$pid" 2>/dev/null ) &
    local watcher=$!
    wait "$pid" 2>/dev/null
    local rc=$?
    kill "$watcher" 2>/dev/null
    wait "$watcher" 2>/dev/null
    return $rc
  fi
}

portable_mktemp() {
  local prefix="${1:-validate}"
  mktemp "${TMPDIR:-/tmp}/${prefix}-XXXXXX" 2>/dev/null || mktemp
}

CLEANUP_FILES=()
cleanup() { rm -f "${CLEANUP_FILES[@]+"${CLEANUP_FILES[@]}"}"; }
trap cleanup EXIT

PING_URL="${1:-}"
REPO_DIR="${2:-.}"

if [ -z "$PING_URL" ]; then
  printf "Usage: %s <ping_url> [repo_dir]\n" "$0"
  printf "\n"
  printf "  ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)\n"
  printf "  repo_dir   Path to your repo (default: current directory)\n"
  exit 1
fi

if ! REPO_DIR="$(cd "$REPO_DIR" 2>/dev/null && pwd)"; then
  printf "Error: directory '%s' not found\n" "${2:-.}"
  exit 1
fi
PING_URL="${PING_URL%/}"
export PING_URL
PASS=0

log()  { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
pass() { log "${GREEN}PASSED${NC} -- $1"; PASS=$((PASS + 1)); }
fail() { log "${RED}FAILED${NC} -- $1"; }
hint() { printf "    ${YELLOW}Hint:${NC} %b\n" "$1"; }
stop_at() {
  printf "\n"
  printf "${RED}${BOLD}Validation stopped at %s.${NC} Fix the above before continuing.\n" "$1"
  exit 1
}

printf "\n"
printf "${BOLD}========================================${NC}\n"
printf "${BOLD}  OpenEnv Submission Validator${NC}\n"
printf "${BOLD}========================================${NC}\n"
log "Repo: $REPO_DIR"
log "Ping URL: $PING_URL"
printf "\n"

log "${BOLD}Step 1/3: Pinging HF Space${NC} ($PING_URL/reset) ..."

CURL_OUTPUT=$(portable_mktemp "validate-curl")
CLEANUP_FILES+=("$CURL_OUTPUT")
HTTP_CODE=$(curl -s -o "$CURL_OUTPUT" -w "%{http_code}" -X POST \
  -H "Content-Type: application/json" -d '{}' \
  "$PING_URL/reset" --max-time 30 2>"$CURL_OUTPUT" || printf "000")

if [ "$HTTP_CODE" = "200" ]; then
  pass "HF Space is live and responds to /reset"
elif [ "$HTTP_CODE" = "000" ]; then
  fail "HF Space not reachable (connection failed or timed out)"
  hint "Check your network connection and that the Space is running."
  hint "Try: curl -s -o /dev/null -w '%%{http_code}' -X POST $PING_URL/reset"
  stop_at "Step 1"
else
  fail "HF Space /reset returned HTTP $HTTP_CODE (expected 200)"
  hint "Make sure your Space is running and the URL is correct."
  hint "Try opening $PING_URL in your browser first."
  stop_at "Step 1"
fi

log "${BOLD}Step 2/3: Running docker build${NC} ..."

if ! command -v docker &>/dev/null; then
  fail "docker command not found"
  hint "Install Docker: https://docs.docker.com/get-docker/"
  stop_at "Step 2"
fi

if [ -f "$REPO_DIR/Dockerfile" ]; then
  DOCKER_CONTEXT="$REPO_DIR"
elif [ -f "$REPO_DIR/server/Dockerfile" ]; then
  DOCKER_CONTEXT="$REPO_DIR/server"
else
  fail "No Dockerfile found in repo root or server/ directory"
  stop_at "Step 2"
fi

log "  Found Dockerfile in $DOCKER_CONTEXT"

BUILD_OK=false
BUILD_OUTPUT=$(run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build "$DOCKER_CONTEXT" 2>&1) && BUILD_OK=true

if [ "$BUILD_OK" = true ]; then
  pass "Docker build succeeded"
else
  fail "Docker build failed (timeout=${DOCKER_BUILD_TIMEOUT}s)"
  printf "%s\n" "$BUILD_OUTPUT" | tail -20
  stop_at "Step 2"
fi

log "${BOLD}Step 3/3: Running openenv validate${NC} ..."

if ! command -v openenv &>/dev/null; then
  fail "openenv command not found"
  hint "Install it: pip install openenv-core"
  stop_at "Step 3"
fi

VALIDATE_OK=false
VALIDATE_OUTPUT=$(cd "$REPO_DIR" && openenv validate 2>&1) && VALIDATE_OK=true

if [ "$VALIDATE_OK" = true ]; then
  pass "openenv validate passed"
  [ -n "$VALIDATE_OUTPUT" ] && log "  $VALIDATE_OUTPUT"
else
  fail "openenv validate failed"
  printf "%s\n" "$VALIDATE_OUTPUT"
  stop_at "Step 3"
fi

printf "\n"
printf "${BOLD}========================================${NC}\n"
printf "${GREEN}${BOLD}  All 3/3 checks passed!${NC}\n"
printf "${GREEN}${BOLD}  Your submission is ready to submit.${NC}\n"
printf "${BOLD}========================================${NC}\n"
printf "\n"

exit 0
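Both validator scripts fall back to a hand-rolled background-job-plus-watcher pattern when neither `timeout` nor `gtimeout` is installed. The same behavior has a direct stdlib analogue in Python; a quick sketch of the two outcomes (completion within the limit vs. the process being killed once the limit elapses), assuming a POSIX `sleep` binary is available:

```python
# Stdlib analogue of run_with_timeout: subprocess.run kills the child
# and raises TimeoutExpired when the deadline passes.
import subprocess

# Fast command finishes well within the limit.
ok = subprocess.run(["sleep", "0"], timeout=5)
assert ok.returncode == 0

# Slow command is terminated once the limit elapses.
try:
    subprocess.run(["sleep", "10"], timeout=1)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True
assert timed_out
print("timeout behavior ok")
```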
server/Dockerfile
ADDED
|
@@ -0,0 +1,80 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# Multi-stage build using openenv-base
# This Dockerfile is flexible and works for both:
# - In-repo environments (with local OpenEnv sources)
# - Standalone environments (with openenv from PyPI/Git)
# The build script (openenv build) handles context detection and sets appropriate build args.

ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
FROM ${BASE_IMAGE} AS builder

WORKDIR /app

# Ensure git is available (required for installing dependencies from VCS)
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*

# Build argument to control whether we're building standalone or in-repo
ARG BUILD_MODE=in-repo
ARG ENV_NAME=cloud_devops_env

# Copy environment code (always at root of build context)
COPY . /app/env

# For in-repo builds, openenv is already vendored in the build context
# For standalone builds, openenv will be installed via pyproject.toml
WORKDIR /app/env

# Ensure uv is available (for local builds where base image lacks it)
RUN if ! command -v uv >/dev/null 2>&1; then \
      curl -LsSf https://astral.sh/uv/install.sh | sh && \
      mv /root/.local/bin/uv /usr/local/bin/uv && \
      mv /root/.local/bin/uvx /usr/local/bin/uvx; \
    fi

# Install dependencies using uv sync
# If uv.lock exists, use it; otherwise resolve on the fly
RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
      uv sync --frozen --no-install-project --no-editable; \
    else \
      uv sync --no-install-project --no-editable; \
    fi

RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
      uv sync --frozen --no-editable; \
    else \
      uv sync --no-editable; \
    fi

# Final runtime stage
FROM ${BASE_IMAGE}

WORKDIR /app

# Copy the virtual environment from builder
COPY --from=builder /app/env/.venv /app/.venv

# Copy the environment code
COPY --from=builder /app/env /app/env

# Set PATH to use the virtual environment
ENV PATH="/app/.venv/bin:$PATH"

# Set PYTHONPATH so imports work correctly
ENV PYTHONPATH="/app/env:$PYTHONPATH"

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the FastAPI server
# The module path is constructed to work with the /app/env structure
CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
server/__init__.py
ADDED
|
@@ -0,0 +1,11 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""Cloud Devops Env environment server components."""

from .cloud_devops_env_environment import CloudDevopsEnvironment

__all__ = ["CloudDevopsEnvironment"]
server/app.py
ADDED
|
@@ -0,0 +1,101 @@
| 1 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
| 2 |
+
# All rights reserved.
|
| 3 |
+
#
|
| 4 |
+
# This source code is licensed under the BSD-style license found in the
|
| 5 |
+
# LICENSE file in the root directory of this source tree.
|
| 6 |
+
|
| 7 |
+
"""
|
| 8 |
+
FastAPI application for the Cloud Devops Env Environment.
|
| 9 |
+
|
| 10 |
+
This module creates an HTTP server that exposes the CloudDevopsEnvironment
|
| 11 |
+
over HTTP and WebSocket endpoints, compatible with EnvClient.
|
| 12 |
+
|
| 13 |
+
Endpoints:
|
| 14 |
+
- POST /reset: Reset the environment
|
| 15 |
+
- POST /step: Execute an action
|
| 16 |
+
- GET /state: Get current environment state
|
| 17 |
+
- GET /schema: Get action/observation schemas
|
| 18 |
+
- WS /ws: WebSocket endpoint for persistent sessions
|
| 19 |
+
|
| 20 |
+
Usage:
|
| 21 |
+
# Development (with auto-reload):
|
| 22 |
+
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
|
| 23 |
+
|
| 24 |
+
# Production:
|
| 25 |
+
uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
|
| 26 |
+
|
| 27 |
+
# Or run directly:
|
| 28 |
+
python -m server.app
|
| 29 |
+
"""
|
| 30 |
+
|
| 31 |
+
import os
|
| 32 |
+
from pathlib import Path
|
| 33 |
+
|
| 34 |
+
# Default to enabling the OpenEnv web interface for local development.
|
| 35 |
+
# You can still disable it explicitly: ENABLE_WEB_INTERFACE=false
|
| 36 |
+
os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
|
| 37 |
+
os.environ.setdefault(
    "ENV_README_PATH",
    str(Path(__file__).resolve().parent.parent / "README.md"),
)

try:
    from openenv.core.env_server.http_server import create_app
except Exception as e:  # pragma: no cover
    raise ImportError(
        "openenv is required for the web interface. Install dependencies with 'uv sync'."
    ) from e

try:
    from ..models import CloudDevopsAction, CloudDevopsObservation
    from .cloud_devops_env_environment import CloudDevopsEnvironment
except (ModuleNotFoundError, ImportError):
    from models import CloudDevopsAction, CloudDevopsObservation
    from server.cloud_devops_env_environment import CloudDevopsEnvironment


# Create the app with web interface and README integration
app = create_app(
    CloudDevopsEnvironment,
    CloudDevopsAction,
    CloudDevopsObservation,
    env_name="cloud_devops_env",
    max_concurrent_envs=1,  # increase this number to allow more concurrent WebSocket sessions
)


def main(host: str | None = None, port: int | None = None):
    """
    Entry point for direct execution via uv run or python -m.

    This function enables running the server without Docker:
        uv run --project . server
        uv run --project . server --port 8001
        python -m cloud_devops_env.server.app

    Args:
        host: Host address to bind to. If not provided, CLI args are parsed.
        port: Port number to listen on. If not provided, CLI args are parsed.

    For production deployments, consider using uvicorn directly with
    multiple workers:
        uvicorn cloud_devops_env.server.app:app --workers 4
    """
    import argparse

    import uvicorn

    # Console-script entry points invoke main() with no parameters, so parse
    # CLI flags here to make `server --host ... --port ...` work as expected.
    if host is None and port is None:
        parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument("--host", type=str, default="0.0.0.0")
        parser.add_argument("--port", type=int, default=8000)
        args, _ = parser.parse_known_args()
        host = args.host
        port = args.port

    uvicorn.run(app, host=host or "0.0.0.0", port=port or 8000)


if __name__ == "__main__":
    main()
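The console-script entry point above relies on `parse_known_args()` rather than `parse_args()` so that flags the server does not own pass through without crashing the launcher. A minimal self-contained sketch of that behavior (the `--reload` flag below is a hypothetical extra argument, not an option this server defines):

```python
import argparse

# Parse only the flags the server owns; collect anything unrecognized
# instead of erroring out, as parse_args() would.
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--host", type=str, default="0.0.0.0")
parser.add_argument("--port", type=int, default=8000)
args, unknown = parser.parse_known_args(["--port", "8001", "--reload"])
print(args.host)   # 0.0.0.0
print(args.port)   # 8001
print(unknown)     # ['--reload']
```

This is why `server --port 8001` works even when the surrounding tooling appends its own flags to the command line.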
server/cloud_devops_env_environment.py
ADDED
@@ -0,0 +1,384 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
Cloud Devops Env Environment Implementation.

A deterministic mock cloud/devops environment with reward shaping and
anti-farming guardrails for hackathon evaluation.
"""

from __future__ import annotations

import copy
from uuid import uuid4

from openenv.core.env_server.interfaces import Environment
from openenv.core.env_server.types import State

try:
    from ..models import CloudAction, CloudObservation, CloudState
except ImportError:
    from models import CloudAction, CloudObservation, CloudState


class CloudDevopsEnvironment(Environment):
    """
    A deterministic mock cloud/devops environment.

    Tasks:
    - easy: open port 80 on sg-web
    - medium: inspect noisy API logs, then open port 5432 on sg-db
    - hard: trace 502 from lb-main to i-web2, then restart i-web2 (not i-web1)

    Example:
        >>> env = CloudDevopsEnvironment()
        >>> obs = env.reset()
        >>> print(obs.system_health_status)  # "CRITICAL"
        >>>
        >>> obs = env.step(CloudAction(command="list_resources"))
        >>> print(obs.output)
    """

    # Enable concurrent WebSocket sessions.
    # Set to True if your environment isolates state between instances.
    # When True, multiple WebSocket clients can connect simultaneously, each
    # getting their own environment instance (when using factory mode in app.py).
    SUPPORTS_CONCURRENT_SESSIONS: bool = True
    MAX_STEPS: int = 20
    VALID_TASKS = {"easy", "medium", "hard"}

    def __init__(self, task_name: str = "easy"):
        """Initialize the cloud_devops_env environment."""
        normalized_task = (task_name or "easy").lower()
        if normalized_task not in self.VALID_TASKS:
            raise ValueError(f"Unknown task: {task_name}")

        self.task_name = normalized_task
        self._state_data: CloudState | None = None
        self._achievements: set[str] = set()

    def _build_noise_resources(self) -> dict[str, dict[str, object]]:
        """Generate deterministic decoy resources to force retrieval and filtering."""
        resources: dict[str, dict[str, object]] = {}
        for i in range(1, 21):
            suffix = f"{i:02d}"
            resources[f"i-backend-{suffix}"] = {
                "type": "Instance",
                "status": "running",
                "logs": (
                    "[2026-04-06 17:00:00] INFO node-exporter: "
                    "standard metrics reported successfully"
                ),
            }
            resources[f"sg-backend-{suffix}"] = {
                "type": "SecurityGroup",
                "rules": [{"port": 443, "action": "allow"}],
            }
        return resources

    def _build_task_resources(self) -> dict[str, dict[str, object]]:
        resources = self._build_noise_resources()

        if self.task_name == "easy":
            resources.update(
                {
                    "i-web": {"type": "Instance", "status": "running"},
                    "sg-web": {
                        "type": "SecurityGroup",
                        "rules": [{"port": 22, "action": "allow"}],
                    },
                }
            )
            return resources

        if self.task_name == "medium":
            resources.update(
                {
                    "i-api": {
                        "type": "Instance",
                        "status": "running",
                        "logs": (
                            "[2026-04-06 17:01:22] [CRITICAL] "
                            "sqlalchemy.exc.OperationalError: "
                            "(psycopg2.OperationalError) connection to server at "
                            "'10.0.4.5' (i-db), port 5432 failed: Connection timed out. "
                            "Is the server running and accepting TCP/IP connections?"
                        ),
                    },
                    "i-db": {"type": "Instance", "status": "running"},
                    "sg-db": {
                        "type": "SecurityGroup",
                        "rules": [{"port": 22, "action": "allow"}],
                    },
                }
            )
            return resources

        resources.update(
            {
                "lb-main": {
                    "type": "LoadBalancer",
                    "logs": (
                        "2026/04/06 17:02:09 [error] 3197#3197: *4189 upstream timed out "
                        "(110: Connection timed out) while reading response header from upstream, "
                        "client: 10.0.2.14, server: api.prod.local, request: \"GET /checkout HTTP/1.1\", "
                        "upstream: \"http://i-web2:8080/checkout\", host: \"api.prod.local\"\n"
                        "2026/04/06 17:02:10 [error] 3197#3197: *4190 no live upstreams while "
                        "connecting to upstream \"i-web2\""
                    ),
                },
                "i-web1": {
                    "type": "Instance",
                    "status": "running",
                    "logs": (
                        "[2026-04-06 17:02:11] INFO web-service: readiness probe passed\n"
                        "[2026-04-06 17:02:12] INFO jvm: heap usage stable at 42%"
                    ),
                },
                "i-web2": {
                    "type": "Instance",
                    "status": "degraded",
                    "logs": (
                        "kernel: Out of memory: Killed process 12345 (java) total-vm:4194304kB, "
                        "anon-rss:3145728kB\n"
                        "systemd[1]: web-service.service: Main process exited, code=killed, "
                        "status=9/KILL"
                    ),
                },
                "sg-web": {
                    "type": "SecurityGroup",
                    "rules": [{"port": 80, "action": "allow"}],
                },
            }
        )
        return resources

    def _reward_once(self, achievement: str, points: float) -> float:
        if achievement in self._achievements:
            return 0.0
        self._achievements.add(achievement)
        return points

    def reset(self) -> CloudObservation:  # type: ignore[override]
        """Reset the environment to the initial state for the selected task."""
        self._achievements.clear()
        self._state_data = CloudState(
            episode_id=str(uuid4()),
            task_difficulty=self.task_name,
            resources=copy.deepcopy(self._build_task_resources()),
            step_count=0,
            is_resolved=False,
        )

        return CloudObservation(
            output=(
                "Environment initialized. System status is currently CRITICAL. "
                "Use 'list_resources' to begin triage."
            ),
            error=None,
            system_health_status="CRITICAL",
            done=False,
            reward=0.0,
            metadata={
                "step_count": 0,
                "resolved": False,
                "task": self.task_name,
                "total_resources": len(self._state_data.resources),
            },
            echoed_message="Cloud Devops Env environment ready!",
            message_length=0,
        )

    def step(self, action: CloudAction) -> CloudObservation:  # type: ignore[override]
        """Execute the agent action and return the next observation."""
        if self._state_data is None:
            self.reset()

        assert self._state_data is not None
        state = self._state_data

        state.step_count += 1
        reward = 0.0
        done = False
        output = ""
        error = None

        try:
            if action.command == "list_resources":
                res_list = [
                    f"{resource_id} ({data['type']})"
                    for resource_id, data in sorted(state.resources.items())
                ]
                output = "Available Resources:\n" + "\n".join(res_list)

            elif action.command == "describe_resource":
                if not action.resource_id or action.resource_id not in state.resources:
                    raise ValueError(f"Resource {action.resource_id} not found.")

                output = str(state.resources[action.resource_id])

                if self.task_name == "easy" and action.resource_id == "sg-web":
                    reward += self._reward_once("read_sg", 0.2)
                elif self.task_name == "medium" and action.resource_id == "sg-db":
                    reward += self._reward_once("read_sg", 0.2)
                elif self.task_name == "hard" and action.resource_id == "i-web2":
                    reward += self._reward_once("inspect_target", 0.2)

            elif action.command == "view_logs":
                if not action.resource_id:
                    raise ValueError("resource_id is required for view_logs.")

                res = state.resources.get(action.resource_id)
                if not res:
                    raise ValueError(f"Resource {action.resource_id} not found.")

                output = str(res.get("logs", "No logs available for this resource."))

                if self.task_name == "medium" and action.resource_id == "i-api":
                    reward += self._reward_once("read_logs", 0.2)
                elif self.task_name == "hard" and action.resource_id == "lb-main":
                    reward += self._reward_once("inspect_lb", 0.2)
                elif self.task_name == "hard" and action.resource_id == "i-web2":
                    reward += self._reward_once("inspect_target", 0.2)

            elif action.command == "update_security_group":
                if not action.resource_id:
                    raise ValueError("resource_id is required for update_security_group.")

                res = state.resources.get(action.resource_id)
                if not res or res.get("type") != "SecurityGroup":
                    raise ValueError(f"Invalid Security Group ID: {action.resource_id}")
                if not action.parameters or "port" not in action.parameters:
                    raise ValueError("Missing 'port' in parameters.")

                rule = copy.deepcopy(action.parameters)
                rules = res.get("rules")
                if not isinstance(rules, list):
                    raise ValueError(f"Security group {action.resource_id} has invalid rules.")
                rules.append(rule)
                output = f"Successfully updated {action.resource_id} with rule: {rule}"

                port = int(rule["port"])
                if (
                    self.task_name == "easy"
                    and action.resource_id == "sg-web"
                    and port == 80
                ):
                    state.is_resolved = True
                    reward += 0.8
                    done = True
                    output += "\nSUCCESS: Web server is now accessible!"
                elif (
                    self.task_name == "medium"
                    and action.resource_id == "sg-db"
                    and port == 5432
                ):
                    if "read_logs" in self._achievements:
                        state.is_resolved = True
                        reward += 0.6
                        done = True
                        output += "\nSUCCESS: Database connection restored!"
                    else:
                        reward -= 0.1
                        output += (
                            "\nWARNING: Change applied without incident triage. "
                            "Inspect API logs before closing the incident."
                        )

            elif action.command == "restart_service":
                if not action.resource_id:
                    raise ValueError("resource_id is required for restart_service.")
                if action.resource_id not in state.resources:
                    raise ValueError(f"Resource {action.resource_id} not found.")

                output = f"Service on {action.resource_id} restarted."

                if self.task_name == "hard":
                    if action.resource_id == "i-web2":
                        investigated_root_cause = (
                            "inspect_lb" in self._achievements
                            and "inspect_target" in self._achievements
                        )
                        if investigated_root_cause:
                            state.resources["i-web2"]["status"] = "running"
                            state.resources["i-web2"][
                                "logs"
                            ] = "INFO: Restart successful. Memory cleared."
                            state.is_resolved = True
                            reward += 0.8
                            done = True
                            output += "\nSUCCESS: OutOfMemory loop broken. System stable."
                        else:
                            reward -= 0.1
                            output += (
                                "\nWARNING: Restart denied by change policy. "
                                "Find failing upstream from lb-main and inspect i-web2 first."
                            )
                    elif action.resource_id == "i-web1":
                        reward -= 0.2
                        output += (
                            "\nWARNING: You restarted a healthy production server! "
                            "Users dropped."
                        )

            elif action.command == "submit_solution":
                if state.is_resolved:
                    done = True
                    output = "Solution verified. System is HEALTHY."
                else:
                    if self.task_name == "hard":
                        # In hard mode, unresolved submission should not abort the run.
                        done = False
                        reward -= 0.1
                        output = (
                            "Solution incorrect. Incident is still CRITICAL. "
                            "Continue triage and remediation before submitting."
                        )
                    else:
                        done = True
                        output = "Solution incorrect. System is still CRITICAL."

            else:
                raise ValueError(f"Unsupported command: {action.command}")

        except Exception as exc:
            error = str(exc)
            output = f"Command Failed: {error}"

        if state.step_count >= self.MAX_STEPS and not done:
            done = True
            timeout_suffix = "\nTIMEOUT: Max steps reached."
            output = f"{output}{timeout_suffix}" if output else timeout_suffix.strip()

        reward = max(-1.0, min(1.0, reward))
        status = "HEALTHY" if state.is_resolved else "CRITICAL"
        info = {
            "step_count": state.step_count,
            "resolved": state.is_resolved,
            "task": self.task_name,
            "achievements": sorted(self._achievements),
            "total_resources": len(state.resources),
        }

        return CloudObservation(
            output=output,
            error=error,
            system_health_status=status,
            done=done,
            reward=reward,
            metadata=info,
            echoed_message=output,
            message_length=len(output),
        )

    @property
    def state(self) -> State:
        """Return hidden environment state for evaluators/debugging."""
        if self._state_data is None:
            self.reset()
        assert self._state_data is not None
        return self._state_data
server/requirements.txt
ADDED
@@ -0,0 +1,6 @@
openenv[core]>=0.2.0
fastapi>=0.115.0
uvicorn>=0.24.0