
ncncomplete/code-review-env



ncncomplete committed 19 days ago

Commit 4ded5ed · verified · 1 parent: cf4ce1e

Upload folder using huggingface_hub


Files changed (14)

Dockerfile +22 -61
README.md +135 -56
__init__.py +4 -8
client.py +31 -74
models.py +27 -39
openenv.yaml +5 -9
pyproject.toml +15 -25
server/Dockerfile.backup +25 -0
server/README.md +51 -0
server/__init__.py +3 -3
server/app.py +25 -163
server/python_codeact_env.py +117 -0
server/python_executor.py +157 -0
server/transforms.py +94 -0

Dockerfile CHANGED

@@ -1,74 +1,35 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree.
-# Multi-stage build using openenv-base
-# This Dockerfile is flexible and works for both:
-# - In-repo environments (with local OpenEnv sources)
-# - Standalone environments (with openenv from PyPI/Git)
-# The build script (openenv build) handles context detection and sets appropriate build args.
-ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
-FROM ${BASE_IMAGE} AS builder
 WORKDIR /app
-# Ensure git is available (required for installing dependencies from VCS)
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends git && \
-    rm -rf /var/lib/apt/lists/*
-# Build argument to control whether we're building standalone or in-repo
-ARG BUILD_MODE=in-repo
-ARG ENV_NAME=code_review_env
-# Copy environment code (always at root of build context)
-COPY . /app/env
-# For in-repo builds, openenv is already vendored in the build context
-# For standalone builds, openenv will be installed via pyproject.toml
-WORKDIR /app/env
-# Ensure uv is available (for local builds where base image lacks it)
-RUN if ! command -v uv >/dev/null 2>&1; then \
-        curl -LsSf https://astral.sh/uv/install.sh | sh && \
-        mv /root/.local/bin/uv /usr/local/bin/uv && \
-        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
-    fi
-# Install dependencies using uv sync
-# If uv.lock exists, use it; otherwise resolve on the fly
-RUN --mount=type=cache,target=/root/.cache/uv \
-    if [ -f uv.lock ]; then \
-        uv sync --frozen --no-editable; \
-    else \
-        uv sync --no-editable; \
-    fi
-# Final runtime stage
-FROM ${BASE_IMAGE}
-WORKDIR /app
-# Copy the virtual environment from builder
-COPY --from=builder /app/env/.venv /app/.venv
-# Copy the environment code
-COPY --from=builder /app/env /app/env
-# Set PATH to use the virtual environment
-ENV PATH="/app/.venv/bin:$PATH"
-# Set PYTHONPATH so imports work correctly
-ENV PYTHONPATH="/app/env:$PYTHONPATH"
 # Health check
 HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
     CMD curl -f http://localhost:8000/health || exit 1
-# Run the FastAPI server
-# The module path is constructed to work with the /app/env structure
-ENV ENABLE_WEB_INTERFACE=true
-CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]

+# Dockerfile for Coding Environment
+# Build from repo root:
+#   docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
+FROM python:3.11-slim
+# Set working directory
 WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    git \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+# Copy coding_env package
+COPY envs/coding_env/ ./envs/coding_env/
+# Install openenv-core first from PyPI, then coding_env
+RUN pip install --no-cache-dir "openenv-core[core]>=0.2.2" && \
+    pip install --no-cache-dir ./envs/coding_env/
+# Environment variables
+ENV PYTHONUNBUFFERED=1
+ENV ENABLE_WEB_INTERFACE=true
+# Expose port
+EXPOSE 8000
 # Health check
 HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
     CMD curl -f http://localhost:8000/health || exit 1
+# Run the server
+CMD ["uvicorn", "coding_env.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
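The `HEALTHCHECK` above treats the container as healthy once `GET /health` on port 8000 answers with a success status. When scripting against the container (for example, before running the integration tests), a client-side wait can follow the same rule. The sketch below is a stdlib-only illustration; `wait_for_health` is a hypothetical helper, not part of `coding_env` or `openenv-core`, and its 3-second per-request timeout simply copies `--timeout=3s`.

```python
import time
import urllib.request


def wait_for_health(base_url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or `timeout` seconds pass.

    Hypothetical helper mirroring the Dockerfile HEALTHCHECK
    (`curl -f http://localhost:8000/health`): an error status or a
    connection failure both count as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/health"
    while time.monotonic() < deadline:
        try:
            # 3-second per-request timeout, copied from --timeout=3s
            with urllib.request.urlopen(url, timeout=3) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (e.g. 503) and socket timeouts are all OSError subclasses
            pass
        time.sleep(interval)
    return False
```

For example, after `docker run -p 8000:8000 coding-env:latest`, calling `wait_for_health("http://localhost:8000")` returns `True` once the server answers, or `False` after 30 seconds.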

README.md CHANGED

@@ -1,84 +1,163 @@
 ---
-title: Code Review Environment
-emoji: 🎯
-colorFrom: pink
-colorTo: pink
 sdk: docker
 pinned: false
 app_port: 8000
 tags:
   - openenv
-base_path: /web
 ---
-# Code Review Environment
-An OpenEnv environment where an AI agent reviews Python code snippets to identify bugs across three difficulty levels.
-🤗 **Space:** https://huggingface.co/spaces/ncncomplete/code-review-env
-## Environment Description
-The agent receives a Python code snippet and must identify the bug type, line number, and provide an explanation. The environment simulates real-world code review tasks that developers perform daily.
-## Tasks
-| Task | Difficulty | Description |
-|------|-----------|-------------|
-| easy | Easy | Identify syntax/runtime errors |
-| medium | Medium | Identify logic bugs in code that runs but produces wrong output |
-| hard | Hard | Identify security vulnerabilities |
-## Action Space
-| Field | Type | Description |
-|-------|------|-------------|
-| review | str | Written analysis of the code |
-| bug_type | str | One of: syntax, logic, security, none |
-| line_number | int | Line number where bug occurs (-1 if unknown) |
-| confidence | float | Agent confidence 0.0–1.0 |
-## Observation Space
-| Field | Type | Description |
-|-------|------|-------------|
-| code_snippet | str | Python code to review |
-| task_description | str | What the agent is asked to do |
-| task_id | str | easy, medium, or hard |
-| attempt_number | int | Steps taken so far |
-| previous_feedback | str | Feedback from last step |
-| done | bool | Whether episode is complete |
-## Reward Function
-- **+1.0** correct bug type identified
-- **+0.5** correct line number identified
-- **+0.5** quality explanation (key concepts present)
-- **-0.3** wrong bug category confidently stated
-- **-0.1** per retry after first attempt
-- Normalized to 0.0–1.0 range
-## Baseline Scores
-| Task | Score |
-|------|-------|
-| easy | 1.0 |
-| medium | 1.0 |
-| hard | 1.0 |
-| **average** | **1.0** |
-## Setup
 ```bash
-pip install openenv-core fastapi uvicorn pydantic openai
-uvicorn server.app:app --host 0.0.0.0 --port 8000
 ```
-## API Endpoints
-- `POST /reset` — Start new episode with `{"task_id": "easy|medium|hard"}`
-- `POST /step` — Submit action with `{"action": {...}}`
-- `GET /state` — Get current environment state
-- `GET /tasks` — List all tasks and action schema
-- `GET /grader` — Get grader score for a task
-- `GET /baseline` — Run baseline inference on all tasks

 ---
+title: Coding Environment Server
+emoji: 💻
+colorFrom: blue
+colorTo: blue
 sdk: docker
 pinned: false
 app_port: 8000
+base_path: /web
 tags:
   - openenv
 ---
+# Coding Environment
+A Python code execution environment that runs arbitrary Python code and returns results. Perfect for testing code execution infrastructure and demonstrating environment usage patterns.
+## Quick Start
+The simplest way to use the Coding environment is through the `CodingEnv` class. The client is **async by default**:
+```python
+import asyncio
+from coding_env import CodeAction, CodingEnv
+async def main():
+    # Create environment from Docker image
+    client = await CodingEnv.from_docker_image("coding-env:latest")
+    async with client:
+        # Reset
+        result = await client.reset()
+        print(f"Reset complete: exit_code={result.observation.exit_code}")
+        # Execute Python code
+        code_samples = [
+            "print('Hello, World!')",
+            "x = 5 + 3\nprint(f'Result: {x}')",
+            "import math\nprint(math.pi)"
+        ]
+        for code in code_samples:
+            result = await client.step(CodeAction(code=code))
+            print(f"Code: {code}")
+            print(f"  → stdout: {result.observation.stdout.strip()}")
+            print(f"  → exit_code: {result.observation.exit_code}")
+asyncio.run(main())
+```
+For **synchronous usage**, use the `.sync()` wrapper:
+```python
+from coding_env import CodeAction, CodingEnv
+with CodingEnv(base_url="http://localhost:8000").sync() as client:
+    result = client.reset()
+    result = client.step(CodeAction(code="print('Hello!')"))
+    print(result.observation.stdout)
+```
+The `CodingEnv.from_docker_image()` method handles:
+- Starting the Docker container
+- Waiting for the server to be ready
+- Connecting to the environment
+- Container cleanup when the context manager exits
+## Building the Docker Image
+Before using the environment, you need to build the Docker image:
+```bash
+# From project root
+docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
+```
+## Environment Details
+### Action
+**CodeAction**: Contains a single field
+- `code` (str) - The Python code to execute
+### Observation
+**CodeObservation**: Contains the execution results
+- `stdout` (str) - Standard output from code execution
+- `stderr` (str) - Standard error from code execution
+- `exit_code` (int) - Exit code (0 for success, non-zero for errors)
+### State
+**CodeState**: Tracks execution state
+- `episode_id` (str) - Unique identifier for the episode
+- `step_count` (int) - Number of steps taken
+- `last_exit_code` (int) - Exit code from the last execution
+## Advanced Usage
+### Connecting to an Existing Server
+If you already have a Coding environment server running, you can connect directly:
+```python
+from coding_env import CodeAction, CodingEnv
+# Async usage
+async with CodingEnv(base_url="http://localhost:8000") as client:
+    result = await client.reset()
+    result = await client.step(CodeAction(code="print('Hello!')"))
+# Sync usage
+with CodingEnv(base_url="http://localhost:8000").sync() as client:
+    result = client.reset()
+    result = client.step(CodeAction(code="print('Hello!')"))
+```
+Note: When connecting to an existing server, closing the client will NOT stop the server.
+## Development & Testing
+### Running Tests
+Install the coding_env package with dev dependencies and run the tests from the repo root:
+```bash
+# Install coding_env with dev dependencies (includes smolagents and pytest)
+uv pip install -e "envs/coding_env[dev]"
+# Run unit tests (no Docker required)
+uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v
+# Run integration tests (requires Docker image to be built)
+docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
+SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
+```
+### Running the Full Example
+Run the complete example that demonstrates the full workflow:
 ```bash
+python3 envs/coding_env/client/example_usage.py
 ```
+This example shows:
+- Creating an environment from a Docker image
+- Resetting and executing code through the environment
+- Automatic cleanup with `close()`
+## Project Structure
+```
+coding_env/
+├── README.md              # This file
+├── models.py              # Action, Observation, and State models
+├── client/
+│   ├── coding_env_client.py  # CodingEnv client implementation
+│   └── example_usage.py      # Usage examples
+└── server/
+    ├── python_codeact_env.py  # Core environment logic
+    ├── app.py                 # FastAPI application
+    ├── transforms.py          # Observation transforms
+    ├── Dockerfile             # Container image definition
+    └── README.md              # Server-specific documentation
+```
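The README describes `CodeObservation` as carrying `stdout`, `stderr`, and `exit_code`. The executor added in this commit (`PyExecutor` in `server/python_executor.py`) is not shown in this view, so the following is only a hypothetical sketch of how those three fields can be produced. It runs each snippet in a fresh `python -c` subprocess, which means that, unlike the persistent execution context mentioned in `CodeState`'s docstring, variables do not carry over between calls; a subprocess is also not a security sandbox. `run_python`, `ExecResult`, and the exit code of 1 on timeout are assumptions for illustration.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ExecResult:
    """Hypothetical stand-in mirroring CodeObservation's fields."""
    stdout: str = ""
    stderr: str = ""
    exit_code: int = 0


def run_python(code: str, timeout_s: float = 15.0) -> ExecResult:
    """Run `code` in a fresh Python subprocess and capture its output.

    Each call is isolated: no variables persist between calls.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Assumed convention for this sketch: report timeouts as exit code 1
        return ExecResult(stderr=f"Timed out after {timeout_s}s", exit_code=1)
    return ExecResult(stdout=proc.stdout, stderr=proc.stderr, exit_code=proc.returncode)
```

Running the README's samples through it, `run_python("x = 5 + 3\nprint(f'Result: {x}')")` yields `stdout == "Result: 8\n"` and `exit_code == 0`, while an uncaught exception produces a traceback in `stderr` and a non-zero exit code.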

__init__.py CHANGED

@@ -4,13 +4,9 @@
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.
-"""Code Review Env Environment."""
-from .client import CodeReviewEnv
-from .models import CodeReviewAction, CodeReviewObservation
-__all__ = [
-    "CodeReviewAction",
-    "CodeReviewObservation",
-    "CodeReviewEnv",
-]

 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.
+"""Coding Environment - A Python code execution environment."""
+from .client import CodingEnv
+from .models import CodeAction, CodeObservation, CodeState
+__all__ = ["CodingEnv", "CodeAction", "CodeObservation", "CodeState"]

client.py CHANGED

@@ -1,99 +1,56 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree.
-"""Code Review Env Environment Client."""
-from typing import Dict
-from openenv.core import EnvClient
-from openenv.core.client_types import StepResult
-from openenv.core.env_server.types import State
-from .models import CodeReviewAction, CodeReviewObservation
-class CodeReviewEnv(
-    EnvClient[CodeReviewAction, CodeReviewObservation, State]
-):
-    """
-    Client for the Code Review Env Environment.
-    This client maintains a persistent WebSocket connection to the environment server,
-    enabling efficient multi-step interactions with lower latency.
-    Each client instance has its own dedicated environment session on the server.
-    Example:
-        >>> # Connect to a running server
-        >>> with CodeReviewEnv(base_url="http://localhost:8000") as client:
-        ...     result = client.reset()
-        ...     print(result.observation.echoed_message)
-        ...
-        ...     result = client.step(CodeReviewAction(message="Hello!"))
-        ...     print(result.observation.echoed_message)
-    Example with Docker:
-        >>> # Automatically start container and connect
-        >>> client = CodeReviewEnv.from_docker_image("code_review_env-env:latest")
-        >>> try:
-        ...     result = client.reset()
-        ...     result = client.step(CodeReviewAction(message="Test"))
-        ... finally:
-        ...     client.close()
-    """
-    def _step_payload(self, action: CodeReviewAction) -> Dict:
-        """
-        Convert CodeReviewAction to JSON payload for step message.
-        Args:
-            action: CodeReviewAction instance
-        Returns:
-            Dictionary representation suitable for JSON encoding
-        """
         return {
-            "message": action.message,
         }
-    def _parse_result(self, payload: Dict) -> StepResult[CodeReviewObservation]:
-        """
-        Parse server response into StepResult[CodeReviewObservation].
-        Args:
-            payload: JSON response data from server
-        Returns:
-            StepResult with CodeReviewObservation
-        """
-        obs_data = payload.get("observation", {})
-        observation = CodeReviewObservation(
-            echoed_message=obs_data.get("echoed_message", ""),
-            message_length=obs_data.get("message_length", 0),
-            done=payload.get("done", False),
-            reward=payload.get("reward"),
-            metadata=obs_data.get("metadata", {}),
-        )
         return StepResult(
-            observation=observation,
             reward=payload.get("reward"),
-            done=payload.get("done", False),
         )
-    def _parse_state(self, payload: Dict) -> State:
         """
-        Parse server response into State object.
         Args:
-            payload: JSON response from state request
         Returns:
-            State object with episode_id and step_count
         """
-        return State(
             episode_id=payload.get("episode_id"),
             step_count=payload.get("step_count", 0),
         )

+"""
+CodingEnv
+---------
+Client-side wrapper for the Coding environment server.
+This client maintains a persistent WebSocket connection to the environment
+server, enabling efficient multi-step interactions with lower latency.
+- users instantiate CodingEnv with a base_url provided by the higher-level
+  vector/orchestration layer.
+- Environment authors ship the Docker image that serves the API.
+(Seeds, episode IDs, request IDs, capabilities can be added later in the payloads.)
+"""
+from __future__ import annotations
+from openenv.core.client_types import StepResult
+from openenv.core.env_client import EnvClient
+from .models import CodeAction, CodeObservation, CodeState
+class CodingEnv(EnvClient[CodeAction, CodeObservation, CodeState]):
+    # --- HTTPEnvClient abstract hooks ---
+    def _step_payload(self, action: CodeAction) -> dict:
+        # Shape expected by the server's /step endpoint under "action"
         return {
+            "code": action.code,
         }
+    def _parse_result(self, payload: dict) -> StepResult[CodeObservation]:
+        # Expecting: { "observation": {...}, "reward": <float|null>, "done": <bool>, "info": {...} }
+        obs = CodeObservation(**payload["observation"])
         return StepResult(
+            observation=obs,
             reward=payload.get("reward"),
+            done=bool(payload.get("done", False)),
         )
+    def _parse_state(self, payload: dict) -> CodeState:
         """
+        Parse server response into CodeState object.
         Args:
+            payload: JSON response from /state endpoint
         Returns:
+            CodeState object with episode_id, step_count, and last_exit_code
         """
+        return CodeState(
             episode_id=payload.get("episode_id"),
             step_count=payload.get("step_count", 0),
+            last_exit_code=payload.get("last_exit_code", 0),
         )
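`CodingEnv` only has to translate between typed objects and JSON: `_step_payload` turns a `CodeAction` into `{"code": ...}`, and `_parse_result` rebuilds a `CodeObservation` from the `"observation"` dict. To show that round trip without a running server or `openenv-core` installed, the sketch below mirrors the two hooks with plain dataclasses. These stand-ins are assumptions, not the real Pydantic-based models, and `CodeObservation(**...)` here accepts only the three fields listed in the README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeAction:
    """Stand-in for coding_env.models.CodeAction."""
    code: str


@dataclass
class CodeObservation:
    """Stand-in for coding_env.models.CodeObservation (defaults match models.py)."""
    stdout: str = ""
    stderr: str = ""
    exit_code: int = 0


@dataclass
class StepResult:
    """Stand-in for openenv.core.client_types.StepResult."""
    observation: CodeObservation
    reward: Optional[float]
    done: bool


def step_payload(action: CodeAction) -> dict:
    # Same shape as CodingEnv._step_payload; sent under "action"
    return {"code": action.code}


def parse_result(payload: dict) -> StepResult:
    # Same logic as CodingEnv._parse_result
    obs = CodeObservation(**payload["observation"])
    return StepResult(
        observation=obs,
        reward=payload.get("reward"),
        done=bool(payload.get("done", False)),
    )
```

Missing observation keys fall back to the defaults, so a bare `{"observation": {}}` parses to empty output with `exit_code == 0`; a missing `"observation"` key raises `KeyError`, exactly as the original hook would.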

models.py CHANGED

@@ -1,46 +1,34 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree.
 """
-Data models for the Code Review Environment.
-Agent receives Python code snippets and must identify bugs.
 """
 from __future__ import annotations
-from typing import Optional
 from openenv.core.env_server.interfaces import Action, Observation, State
-class ReviewAction(Action):
-    """Action taken by the agent to review a code snippet."""
-    review: str                    # agent's written analysis
-    bug_type: str                  # "syntax" | "logic" | "security" | "none"
-    line_number: int               # which line has the issue, -1 if unknown
-    confidence: float              # agent's confidence 0.0-1.0
-class ReviewObservation(Observation):
-    """What the agent sees at each step."""
-    code_snippet: str              # the Python code to review
-    task_description: str          # what the agent is asked to do
-    task_id: str                   # "easy" | "medium" | "hard"
-    attempt_number: int            # how many steps taken so far
-    previous_feedback: str         # feedback from last step, empty on reset
-    done: bool                     # whether episode is complete
-    hint: Optional[str] = None     # optional hint for the agent
-class ReviewState(State):
-    """Internal environment state."""
-    current_task_id: str = "easy"
-    current_snippet: str = ""
-    correct_bug_type: str = ""
-    correct_line_number: int = -1
-    correct_keywords: list = []
-    step_count: int = 0
-    task_episode_id: str = ""
-    cumulative_reward: float = 0.0
-    total_snippets: int = 4

 """
+envs/coding_env/models.py
+--------------------------------
+Action/Observation types for the Coding environment.
 """
 from __future__ import annotations
 from openenv.core.env_server.interfaces import Action, Observation, State
+class CodeAction(Action):
+    """
+    Represents a single code execution request.
+    """
+    code: str
+    # Optional: future fields like 'lint': bool, 'timeout_s': float, etc.
+class CodeObservation(Observation):
+    """
+    Result of executing code in the environment.
+    """
+    stdout: str = ""
+    stderr: str = ""
+    exit_code: int = 0
+class CodeState(State):
+    """State for CodeAct environment with persistent execution context."""
+    last_exit_code: int = 0

openenv.yaml CHANGED

@@ -1,9 +1,5 @@
-spec_version: 1
-name: code_review_env
-type: space
-runtime: fastapi
-app: server.app:app
-port: 8000
-version: "1.0.0"
-description: "AI agent environment for Python code review across syntax, logic, and security bug detection"

+name: coding_env
+version: "0.1.0"
+description: "Coding environment for OpenEnv"
+action: CodeAction
+observation: CodeObservation

pyproject.toml CHANGED

@@ -1,45 +1,35 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree.
 [build-system]
 requires = ["setuptools>=45", "wheel"]
 build-backend = "setuptools.build_meta"
 [project]
-name = "openenv-code_review_env"
 version = "0.1.0"
-description = "Code Review Env environment for OpenEnv"
 requires-python = ">=3.10"
 dependencies = [
-    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
-    # install from github
-    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
     "openenv-core[core]>=0.2.2",
-    # Environment-specific dependencies
-    # Add all dependencies needed for your environment here
-    # Examples:
-    # "numpy>=1.19.0",
-    # "torch>=2.0.0",
-    # "gymnasium>=0.29.0",
-    # "openspiel>=1.0.0",
-    # "smolagents>=1.22.0,<2",
 ]
 [project.optional-dependencies]
 dev = [
     "pytest>=8.0.0",
     "pytest-cov>=4.0.0",
 ]
 [project.scripts]
-# Server entry point - enables running via: uv run --project . server
-# or: python -m code_review_env.server.app
-server = "code_review_env.server.app:main"
 [tool.setuptools]
-include-package-data = true
-packages = ["code_review_env", "code_review_env.server"]
-package-dir = { "code_review_env" = ".", "code_review_env.server" = "server" }

 [build-system]
 requires = ["setuptools>=45", "wheel"]
 build-backend = "setuptools.build_meta"
 [project]
+name = "openenv-coding_env"
 version = "0.1.0"
+description = "Coding Environment for OpenEnv"
 requires-python = ">=3.10"
 dependencies = [
     "openenv-core[core]>=0.2.2",
+    "fastapi>=0.115.0",
+    "pydantic>=2.0.0",
+    "uvicorn[standard]>=0.24.0",
+    "requests>=2.31.0",
+    "smolagents>=1.22.0,<2",
 ]
 [project.optional-dependencies]
 dev = [
     "pytest>=8.0.0",
     "pytest-cov>=4.0.0",
+    "ipykernel>=6.29.5",
 ]
 [project.scripts]
+server = "coding_env.server.app:main"
 [tool.setuptools]
+packages = ["coding_env", "coding_env.server"]
+package-dir = { "coding_env" = ".", "coding_env.server" = "server" }
+[tool.setuptools.package-data]
+coding_env = ["**/*.yaml", "**/*.yml"]

server/Dockerfile.backup ADDED

@@ -0,0 +1,25 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Use the standard openenv base image
+# Built from: docker build -t openenv-base:latest -f src/openenv/core/containers/images/Dockerfile .
+# In GitHub Actions, this is overridden to use the GHCR base image
+ARG BASE_IMAGE=openenv-base:latest
+FROM ${BASE_IMAGE}
+# Copy only what's needed for this environment
+COPY src/core/ /app/src/core/
+COPY envs/coding_env/ /app/envs/coding_env/
+# Copy README for web interface documentation
+COPY envs/coding_env/README.md /app/README.md
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+# Run the FastAPI server
+CMD ["uvicorn", "envs.coding_env.server.app:app", "--host", "0.0.0.0", "--port", "8000"]

server/README.md ADDED

@@ -0,0 +1,51 @@

+# CodingEnv HTTP Server
+This directory contains the HTTP server implementation for the CodingEnvironment.
+## Running Locally
+### Prerequisites
+```bash
+pip install fastapi uvicorn
+```
+### Start the server
+```bash
+# From the project root (/Users/pankit/git/envtorch)
+cd src
+uvicorn envs.coding_env.server.app:app --reload --host 0.0.0.0 --port 8000
+```
+The server will be available at `http://localhost:8000`
+### API Endpoints
+- `POST /reset` - Reset the environment
+- `POST /step` - Execute a code action
+- `GET /state` - Get current environment state
+- `GET /health` - Health check
+### Test with curl
+```bash
+# Health check
+curl http://localhost:8000/health
+# Reset
+curl -X POST http://localhost:8000/reset \
+  -H "Content-Type: application/json" \
+  -d '{}'
+# Execute code
+curl -X POST http://localhost:8000/step \
+  -H "Content-Type: application/json" \
+  -d '{
+    "action": {
+      "code": "print(\"Hello from HTTP!\")"
+    },
+    "timeout_s": 15
+  }'
+# Get state
+curl http://localhost:8000/state
+```
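The same calls can be made from Python with only the standard library. In the sketch below, `reset_request`, `step_request`, and `send` are hypothetical helpers; the request bodies copy the curl examples above, including the `timeout_s` field, whose handling on the server side is not shown in this view.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same address as the curl examples


def _post_json(path: str, body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request (equivalent to curl -X POST -H ... -d ...)."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset_request(base_url: str = BASE_URL) -> urllib.request.Request:
    return _post_json("/reset", {}, base_url)


def step_request(code: str, timeout_s: float = 15, base_url: str = BASE_URL) -> urllib.request.Request:
    return _post_json("/step", {"action": {"code": code}, "timeout_s": timeout_s}, base_url)


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(reset_request())` followed by `send(step_request("print(1)"))` should return the same JSON that the curl commands print.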

server/__init__.py CHANGED

@@ -4,8 +4,8 @@
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.
-"""Code Review Env environment server components."""
-from .code_review_env_environment import CodeReviewEnvironment
-__all__ = ["CodeReviewEnvironment"]

 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.
+"""Coding environment server components."""
+from .python_codeact_env import PythonCodeActEnv
+__all__ = ["PythonCodeActEnv"]

server/app.py CHANGED

@@ -5,181 +5,43 @@
 # LICENSE file in the root directory of this source tree.
 """
-FastAPI server for the Code Review Environment.
 """
-from models import ReviewAction, ReviewObservation
-from server.code_review_env_environment import CodeReviewEnvironment
 from openenv.core.env_server import create_app
-from fastapi import FastAPI, Query
-from fastapi.routing import APIRouter
-app = create_app(
-    CodeReviewEnvironment,
-    ReviewAction,
-    ReviewObservation,
-    env_name="code_review_env",
-)
-@app.get("/tasks")
-def list_tasks():
-    return {
-        "tasks": [
-            {
-                "task_id": "easy",
-                "description": "Identify syntax/runtime errors in Python code",
-                "difficulty": "easy",
-                "action_schema": {
-                    "review": "string - your analysis",
-                    "bug_type": "string - syntax | logic | security | none",
-                    "line_number": "int - line with the bug, -1 if unknown",
-                    "confidence": "float - your confidence 0.0 to 1.0"
-                },
-                "example_action": {
-                    "review": "Line 1 is missing a colon after the function definition. This is a syntax error.",
-                    "bug_type": "syntax",
-                    "line_number": 1,
-                    "confidence": 0.95
-                }
-            },
-            {
-                "task_id": "medium",
-                "description": "Identify logic bugs in code that runs but produces wrong output",
-                "difficulty": "medium",
-                "action_schema": {
-                    "review": "string - your analysis",
-                    "bug_type": "string - syntax | logic | security | none",
-                    "line_number": "int - line with the bug, -1 if unknown",
-                    "confidence": "float - your confidence 0.0 to 1.0"
-                },
-                "example_action": {
-                    "review": "Line 5 has an index error: it should be max_val = numbers[i], not numbers[i - 1]. This is a logic bug.",
-                    "bug_type": "logic",
-                    "line_number": 5,
-                    "confidence": 0.95
-                }
-            },
-            {
-                "task_id": "hard",
-                "description": "Identify security vulnerabilities in Python code",
-                "difficulty": "hard",
-                "action_schema": {
-                    "review": "string - your analysis",
-                    "bug_type": "string - syntax | logic | security | none",
-                    "line_number": "int - line with the bug, -1 if unknown",
-                    "confidence": "float - your confidence 0.0 to 1.0"
-                },
-                "example_action": {
-                    "review": "Line 6 has a SQL injection vulnerability because the username is concatenated directly into the query without parameterized statements.",
-                    "bug_type": "security",
-                    "line_number": 6,
-                    "confidence": 0.95
-                }
-            }
-        ]
-    }
-@app.get("/info")
-def info():
-    """
-    Returns information about the Code Review Environment.
-    Returns: environment name, version, description, number of tasks, and supported difficulty levels
-    """
-    return {
-        "name": "code_review_env",
-        "version": "1.0.0",
-        "description": "AI agent environment for Python code review across syntax, logic, and security bug detection",
-        "num_tasks": 3,
-        "difficulty_levels": ["easy", "medium", "hard"]
-    }
-@app.get("/grader")
-def grader(task_id: str = Query("easy"), episode_id: str = Query(None)):
-    """
-    Run a single task with a perfect answer.
-    Query params: task_id (str), episode_id (str, optional)
-    Returns: {"task_id": str, "score": float, "feedback": str}
-    """
-    env = CodeReviewEnvironment()
-    env.reset(task_id)
-    # Create perfect answer based on task_id
-    if task_id == "easy":
-        action = ReviewAction(
-            review="Line 1 is missing a colon after the function definition. This is a syntax error.",
-            bug_type="syntax",
-            line_number=1,
-            confidence=0.95
-        )
-    elif task_id == "medium":
-        action = ReviewAction(
-            review="Line 5 has an index error: it should be max_val = numbers[i], not numbers[i - 1]. This is a logic bug.",
-            bug_type="logic",
-            line_number=5,
-            confidence=0.95
-        )
-    else:  # hard
-        action = ReviewAction(
-            review="Line 6 has a SQL injection vulnerability because the username is concatenated directly into the query without parameterized statements.",
-            bug_type="security",
-            line_number=6,
-            confidence=0.95
-        )
-    obs = env.step(action)
-    return {
-        "task_id": task_id,
-        "score": env.state.cumulative_reward,
-        "feedback": obs.previous_feedback
-    }
-@app.get("/baseline")
-def baseline():
-    """
-    Run all 3 tasks (easy, medium, hard) with perfect hardcoded answers.
-    Returns: {"scores": {"easy": float, "medium": float, "hard": float}, "average": float}
-    """
-    scores = {}
-    for task_id in ["easy", "medium", "hard"]:
-        env = CodeReviewEnvironment()
-        env.reset(task_id)
-        # Create perfect answer based on task_id
-        if task_id == "easy":
-            action = ReviewAction(
-                review="Line 1 is missing a colon after the function definition. This is a syntax error.",
-                bug_type="syntax",
-                line_number=1,
-                confidence=0.95
-            )
-        elif task_id == "medium":
-            action = ReviewAction(
-                review="Line 5 has an index error: it should be max_val = numbers[i], not numbers[i - 1]. This is a logic bug.",
-                bug_type="logic",
-                line_number=5,
-                confidence=0.95
-            )
-        else:  # hard
-            action = ReviewAction(
-                review="Line 6 has a SQL injection vulnerability because the username is concatenated directly into the query without parameterized statements.",
-                bug_type="security",
-                line_number=6,
-                confidence=0.95
-            )
-        obs = env.step(action)
-        scores[task_id] = env.state.cumulative_reward
-    average = sum(scores.values()) / len(scores)
-    return {
-        "scores": scores,
-        "average": round(average, 4)
-    }
 def main():
     import uvicorn
     uvicorn.run(app, host="0.0.0.0", port=8000)
 if __name__ == "__main__":
     main()

 # LICENSE file in the root directory of this source tree.
 """
+FastAPI application for the Coding Environment.
+This module creates an HTTP server that exposes the PythonCodeActEnv
+over HTTP and WebSocket endpoints, compatible with EnvClient.
+Usage:
+    # Development (with auto-reload):
+    uvicorn envs.coding_env.server.app:app --reload --host 0.0.0.0 --port 8000
+    # Production:
+    uvicorn envs.coding_env.server.app:app --host 0.0.0.0 --port 8000 --workers 4
+    # Or run directly:
+    python -m envs.coding_env.server.app
 """
+from coding_env.models import CodeAction, CodeObservation
+from coding_env.server.python_codeact_env import PythonCodeActEnv
 from openenv.core.env_server import create_app
+# Create the app with web interface and README integration
+# Pass the class (factory) instead of an instance for WebSocket session support
+app = create_app(PythonCodeActEnv, CodeAction, CodeObservation, env_name="coding_env")
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
 def main():
+    """Main entry point for running the server."""
     import uvicorn
     uvicorn.run(app, host="0.0.0.0", port=8000)
 if __name__ == "__main__":
     main()
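
The comment above `create_app` says the environment class is passed as a factory, rather than as an instance, so that WebSocket sessions work. One plausible reading, not verified against OpenEnv's source, is that the server calls the factory once per session, so concurrent clients do not share one executor or one `CodeState`. The minimal sketch below shows only that general pattern. `SessionRegistry` and `CounterEnv` are hypothetical names and do not correspond to OpenEnv APIs.

```python
import uuid


class CounterEnv:
    """Hypothetical stand-in for an environment class such as PythonCodeActEnv."""

    def __init__(self):
        self.step_count = 0

    def step(self):
        self.step_count += 1
        return self.step_count


class SessionRegistry:
    """Hypothetical registry holding one environment instance per session."""

    def __init__(self, env_factory):
        # A class or any zero-argument callable that builds a fresh environment.
        self._env_factory = env_factory
        self._sessions = {}

    def open_session(self):
        session_id = str(uuid.uuid4())
        # Calling the factory here gives every session its own state.
        self._sessions[session_id] = self._env_factory()
        return session_id

    def step(self, session_id):
        return self._sessions[session_id].step()

    def close_session(self, session_id):
        self._sessions.pop(session_id, None)


registry = SessionRegistry(CounterEnv)
a = registry.open_session()
b = registry.open_session()
registry.step(a)
registry.step(a)
print(registry.step(a), registry.step(b))  # 3 1
```

If a single instance were shared instead, every session would step the same object. One client's `step_count` would then leak into another's, and for `PythonCodeActEnv` so would its defined variables.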

server/python_codeact_env.py ADDED

	@@ -0,0 +1,117 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Python Code Action Environment.
+This module provides a server-side environment implementation for executing
+Python code actions using PyExecutor.
+"""
+import uuid
+from openenv.core.env_server.interfaces import Action, Environment, Observation
+from ..models import CodeAction, CodeObservation, CodeState
+from .python_executor import PyExecutor
+from .transforms import create_safe_coding_transform
+class PythonCodeActEnv(Environment):
+    """
+    Python Code Action Environment for executing code and tracking state.
+    This environment executes Python code submitted as CodeAction during step,
+    maintains the last exit code in its state, and returns results wrapped
+    in CodeObservation.
+    The constructor takes no arguments: it always installs the transform
+    returned by create_safe_coding_transform() and a PyExecutor with no
+    additional authorized imports.
+    Example:
+        >>> env = PythonCodeActEnv()
+        >>> obs = env.reset()
+        >>> action = CodeAction(code="print('Hello, World!')")
+        >>> obs = env.step(action)
+        >>> print(obs.stdout)  # "Hello, World!\n"
+        >>> print(obs.exit_code)  # 0
+        >>> print(env.state.last_exit_code)  # 0
+    """
+    def __init__(
+        self,
+    ):
+        self.transform = create_safe_coding_transform()
+        self._executor = PyExecutor()
+        self._state = CodeState()
+    def reset(self) -> Observation:
+        """
+        Reset environment and start fresh execution session.
+        Returns:
+            Initial observation with empty stdout/stderr and exit_code=0
+        """
+        # Initialize fresh state
+        self._state = CodeState(episode_id=str(uuid.uuid4()), step_count=0)
+        # Add last_exit_code to state
+        self._state.last_exit_code = 0
+        # Reset executor to clear any previously defined variables/functions
+        self._executor = PyExecutor()
+        # Reset transform to clear any accumulated state
+        self.transform = create_safe_coding_transform()
+        # Return initial observation
+        observation = CodeObservation(
+            stdout="",
+            stderr="",
+            exit_code=0,
+        )
+        return self._apply_transform(observation)
+    def step(self, action: Action) -> Observation:
+        """
+        Execute code action and return observation.
+        Args:
+            action: CodeAction containing the code to execute
+        Returns:
+            CodeObservation with execution results (stdout, stderr, exit_code)
+        Raises:
+            ValueError: If action is not a CodeAction instance
+        """
+        if not isinstance(action, CodeAction):
+            raise ValueError(f"Expected CodeAction, got {type(action)}")
+        # Execute the code using PyExecutor
+        result = self._executor.run(action.code)
+        # Update state
+        self._state.step_count += 1
+        self._state.last_exit_code = result.exit_code
+        # Create observation from execution result
+        # Include code in metadata for transform reward calculation
+        observation = CodeObservation(
+            stdout=result.stdout,
+            stderr=result.stderr,
+            exit_code=result.exit_code,
+            metadata={"last_code": action.code},
+        )
+        return self._apply_transform(observation)
+    @property
+    def state(self) -> CodeState:
+        """Get current environment state including last exit code."""
+        return self._state
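
`PythonCodeActEnv` tracks three pieces of state. `reset()` assigns a fresh `episode_id` and zeroes `step_count`, each `step()` increments `step_count`, and `last_exit_code` records the most recent exit code. The reset comment about clearing previously defined variables implies that definitions persist across steps within an episode. The sketch below reproduces that lifecycle with stdlib stand-ins. `SketchCodeEnv`, `SketchState`, and `SketchObservation` are hypothetical names. The sketch also replaces smolagents' `LocalPythonExecutor` with the built-in `exec()`, which provides no sandboxing and must not be used on untrusted code.

```python
import contextlib
import io
import traceback
import uuid
from dataclasses import dataclass


@dataclass
class SketchState:
    episode_id: str = ""
    step_count: int = 0
    last_exit_code: int = 0


@dataclass
class SketchObservation:
    stdout: str
    stderr: str
    exit_code: int


class SketchCodeEnv:
    """Hypothetical stand-in mirroring PythonCodeActEnv's reset/step bookkeeping.

    WARNING: exec() is NOT a sandbox; this is for illustrating state tracking only.
    """

    def __init__(self):
        self.state = SketchState()
        self._namespace = {}

    def reset(self):
        # New episode id, zeroed counters, and a fresh namespace so that
        # variables defined in the previous episode are gone.
        self.state = SketchState(episode_id=str(uuid.uuid4()))
        self._namespace = {}
        return SketchObservation(stdout="", stderr="", exit_code=0)

    def step(self, code):
        out = io.StringIO()
        stderr, exit_code = "", 0
        try:
            with contextlib.redirect_stdout(out):
                # The same namespace is reused, so definitions persist between steps.
                exec(code, self._namespace)
        except Exception:
            stderr, exit_code = traceback.format_exc(), 1
        self.state.step_count += 1
        self.state.last_exit_code = exit_code
        return SketchObservation(stdout=out.getvalue(), stderr=stderr, exit_code=exit_code)


env = SketchCodeEnv()
env.reset()
env.step("x = 21")
obs = env.step("print(x * 2)")
print(repr(obs.stdout), obs.exit_code, env.state.step_count)  # '42\n' 0 2
```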

server/python_executor.py ADDED

	@@ -0,0 +1,157 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Local Python Executor (enhanced).
+This module provides a safer wrapper around smolagents.LocalPythonExecutor
+with improved exception handling and a few helpful tools registered with
+the executor to make debugging executed code easier.
+Key improvements:
+- Register a few helper utilities via send_tools so user code can use
+  them for reporting (e.g. `format_exc`).
+- More robust extraction of stdout/stderr/exit codes from the executor
+  result object, tolerant to different versions of smolagents.
+- Detailed stderr on unexpected exceptions including full traceback.
+- Structured logging for operational visibility.
+"""
+from __future__ import annotations
+import json
+import logging
+import traceback
+from openenv.core.env_server.types import CodeExecResult
+from smolagents import LocalPythonExecutor
+logger = logging.getLogger(__name__)
+logger.addHandler(logging.NullHandler())
+class PyExecutor:
+    """Wrapper around smolagents LocalPythonExecutor.
+    The wrapper registers a few non-privileged helper tools to the
+    LocalPythonExecutor that can be used by the executed code to
+    format exceptions and to safely stringify results for improved
+    error reporting.
+    """
+    def __init__(self, additional_imports: list[str] | None = None):
+        if additional_imports is None:
+            additional_imports = []
+        self._executor = LocalPythonExecutor(
+            additional_authorized_imports=additional_imports
+        )
+        # Register helpful utilities exposed to the execution environment.
+        # These are intentionally small, read-only helpers.
+        tools = {
+            # Provide a small helper to format the current exception in the
+            # executed context. This is a *string formatting* helper only.
+            "format_exc": traceback.format_exc,
+            # Safe JSON dumps with a fallback for non-serializable objects.
+            "safe_json_dumps": lambda obj: json.dumps(obj, default=lambda o: repr(o)),
+        }
+        # `send_tools` is the public API on LocalPythonExecutor to make
+        # helper callables available to the sandboxed runtime. We don't
+        # provide any builtins that could change the environment.
+        try:
+            self._executor.send_tools(tools)
+        except Exception:
+            # If the LocalPythonExecutor implementation doesn't support
+            # send_tools or fails, log and continue — the executor is still usable.
+            logger.debug(
+                "LocalPythonExecutor.send_tools failed; continuing without extra tools",
+                exc_info=True,
+            )
+    def run(self, code: str) -> CodeExecResult:
+        """Execute Python code and return a CodeExecResult.
+        This method is intentionally defensive: it attempts to extract
+        meaningful stdout/stderr/exit_code information from a variety of
+        possible return shapes that different versions of smolagents
+        may provide.
+        """
+        try:
+            exec_result = self._executor(code)
+            # Default values
+            stdout_parts: list[str] = []
+            stderr_parts: list[str] = []
+            exit_code = 0
+            # Extract logs/prints
+            try:
+                logs = getattr(exec_result, "logs", None)
+                if logs:
+                    stdout_parts.append(str(logs))
+            except Exception:
+                logger.debug("Failed to read exec_result.logs", exc_info=True)
+            # Extract the result / output value
+            try:
+                if hasattr(exec_result, "output"):
+                    out_val = exec_result.output
+                    # If the output is not None, stringify it in a safe way
+                    if out_val is not None:
+                        # Prefer JSON if possible, otherwise repr
+                        try:
+                            stdout_parts.append(json.dumps(out_val))
+                        except Exception:
+                            stdout_parts.append(repr(out_val))
+            except Exception:
+                logger.debug("Failed to read exec_result.output", exc_info=True)
+            # Some runtime implementations may put errors on `error` or `exception`
+            try:
+                err = getattr(exec_result, "error", None)
+                if err:
+                    stderr_parts.append(str(err))
+            except Exception:
+                logger.debug("Failed to read exec_result.error", exc_info=True)
+            try:
+                ex = getattr(exec_result, "exception", None)
+                if ex:
+                    stderr_parts.append(str(ex))
+            except Exception:
+                logger.debug("Failed to read exec_result.exception", exc_info=True)
+            # Determine exit code if provided
+            try:
+                if hasattr(exec_result, "exit_code"):
+                    exit_code = (
+                        int(exec_result.exit_code)
+                        if exec_result.exit_code is not None
+                        else 0
+                    )
+                elif hasattr(exec_result, "success"):
+                    # Some versions use `success` boolean
+                    exit_code = 0 if exec_result.success else 1
+                else:
+                    # Fallback: if there were any stderr parts, treat as non-zero
+                    exit_code = 1 if stderr_parts else 0
+            except Exception:
+                logger.debug("Failed to determine exec_result exit code", exc_info=True)
+                exit_code = 1 if stderr_parts else 0
+            # Compose the final stdout/stderr strings
+            stdout = "\n".join(part for part in stdout_parts if part is not None)
+            stderr = "\n".join(part for part in stderr_parts if part is not None)
+            return CodeExecResult(stdout=stdout, stderr=stderr, exit_code=exit_code)
+        except Exception as e:
+            # Any unexpected exception from the LocalPythonExecutor is
+            # returned with a full traceback to make debugging easier.
+            tb = traceback.format_exc()
+            logger.exception("LocalPythonExecutor raised an exception during run")
+            return CodeExecResult(stdout="", stderr=tb, exit_code=1)
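
The exit-code fallback order in `PyExecutor.run` is easy to misread across the nested `try` blocks. The sketch below condenses the same extraction steps into one function and drives it with `types.SimpleNamespace` stand-ins. The attribute names (`logs`, `output`, `error`, `exception`, `exit_code`, `success`) are the ones the wrapper probes for. Which of them a given smolagents version actually sets is not stated in the source. `summarize_result` is a hypothetical helper, not part of the module.

```python
import json
from types import SimpleNamespace


def summarize_result(exec_result):
    """Hypothetical condensed mirror of the field extraction in PyExecutor.run."""
    stdout_parts, stderr_parts = [], []
    logs = getattr(exec_result, "logs", None)
    if logs:
        stdout_parts.append(str(logs))
    if getattr(exec_result, "output", None) is not None:
        try:
            # Prefer JSON for the output value.
            stdout_parts.append(json.dumps(exec_result.output))
        except (TypeError, ValueError):
            # json.dumps raises TypeError for unserializable objects and
            # ValueError for circular references; fall back to repr().
            stdout_parts.append(repr(exec_result.output))
    for attr in ("error", "exception"):
        err = getattr(exec_result, attr, None)
        if err:
            stderr_parts.append(str(err))
    # Precedence: explicit exit_code, then a success flag, then "any stderr".
    if hasattr(exec_result, "exit_code"):
        exit_code = int(exec_result.exit_code) if exec_result.exit_code is not None else 0
    elif hasattr(exec_result, "success"):
        exit_code = 0 if exec_result.success else 1
    else:
        exit_code = 1 if stderr_parts else 0
    return "\n".join(stdout_parts), "\n".join(stderr_parts), exit_code


print(summarize_result(SimpleNamespace(logs="hi", output={"a": 1})))
# ('hi\n{"a": 1}', '', 0)
print(summarize_result(SimpleNamespace(output={1, 2}, error="boom")))
# ('{1, 2}', 'boom', 1)
```

An explicit `exit_code` attribute takes precedence. A result carrying `exit_code=None` and a non-empty `error` still yields exit code 0, because the stderr-based fallback runs only when neither `exit_code` nor `success` is present.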

server/transforms.py ADDED

	@@ -0,0 +1,94 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Transforms specific to coding environments."""
+import ast
+import re
+from openenv.core.env_server.base_transforms import CompositeTransform
+from openenv.core.env_server.interfaces import Transform
+from openenv.core.env_server.types import Observation
+from ..models import CodeObservation
+class CodeSafetyTransform(Transform):
+    """Evaluates code safety and assigns penalties for dangerous patterns."""
+    def __init__(self, penalty: float = -1.0):
+        self.penalty = penalty
+        self.dangerous_patterns = [
+            r"import\s+os",
+            r"import\s+subprocess",
+            r"eval\(",
+            r"exec\(",
+            r"__import__",
+            r"open\(",
+        ]
+    def __call__(self, observation: Observation) -> Observation:
+        if not isinstance(observation, CodeObservation):
+            return observation
+        if "last_code" in observation.metadata:
+            code = observation.metadata["last_code"]
+            for pattern in self.dangerous_patterns:
+                if re.search(pattern, code):
+                    observation.reward = self.penalty
+                    observation.metadata["safety_violation"] = pattern
+                    break
+            else:
+                if observation.reward is None:
+                    observation.reward = 0.0
+        return observation
+class CodeQualityTransform(Transform):
+    """Evaluates and rewards code quality metrics."""
+    def __init__(
+        self,
+        concise_bonus: float = 0.1,
+        max_length_threshold: int = 100,
+        syntax_penalty: float = -0.2,
+    ):
+        self.concise_bonus = concise_bonus
+        self.max_length_threshold = max_length_threshold
+        self.syntax_penalty = syntax_penalty
+    def __call__(self, observation: Observation) -> Observation:
+        if not isinstance(observation, CodeObservation):
+            return observation
+        quality_score = 0.0
+        if "last_code" in observation.metadata:
+            code = observation.metadata["last_code"]
+            # Reward concise code
+            if len(code.strip()) <= self.max_length_threshold:
+                quality_score += self.concise_bonus
+            # Check syntax (redundant but useful for quality assessment)
+            try:
+                ast.parse(code)
+            except SyntaxError:
+                quality_score += self.syntax_penalty
+        # Add to existing reward
+        if observation.reward is None:
+            observation.reward = quality_score
+        else:
+            observation.reward += quality_score
+        return observation
+def create_safe_coding_transform() -> CompositeTransform:
+    """Create a transform focused on safe coding practices and quality."""
+    return CompositeTransform([CodeSafetyTransform(), CodeQualityTransform()])
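
`create_safe_coding_transform()` chains the two transforms, so a step's reward is the safety result plus the quality score. The safety result is either `penalty` or `0.0`. The sketch below re-implements both rules as plain functions with the same default parameters, so the arithmetic can be checked in isolation. It assumes that `CompositeTransform` applies its transforms in list order and that a fresh `CodeObservation` starts with `reward=None`. `safety_reward`, `quality_reward`, and `total_reward` are hypothetical helpers, not part of the module.

```python
import ast
import re

# Same patterns and defaults as CodeSafetyTransform / CodeQualityTransform.
DANGEROUS_PATTERNS = [
    r"import\s+os",
    r"import\s+subprocess",
    r"eval\(",
    r"exec\(",
    r"__import__",
    r"open\(",
]


def safety_reward(code, penalty=-1.0):
    # The first matching pattern wins; otherwise the step is neutral (0.0).
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, code):
            return penalty
    return 0.0


def quality_reward(code, concise_bonus=0.1, max_length_threshold=100, syntax_penalty=-0.2):
    score = 0.0
    if len(code.strip()) <= max_length_threshold:
        score += concise_bonus
    try:
        ast.parse(code)
    except SyntaxError:
        score += syntax_penalty
    return score


def total_reward(code):
    # Assumed order: safety sets the reward, then quality adds to it.
    return safety_reward(code) + quality_reward(code)


for snippet in ["print(1)", "import os", "def f(", "urlopen('x')"]:
    print(repr(snippet), round(total_reward(snippet), 2))
# 'print(1)' 0.1
# 'import os' -0.9
# 'def f(' -0.1
# "urlopen('x')" -0.9
```

The patterns are unanchored `re.search` calls. As a result, `urlopen('x')` is penalized through the `open\(` rule, while `from os import path` matches none of the patterns.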