Spaces:
Sleeping
title: Coding Environment Server
emoji: π»
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
Coding Environment
A real-world PR triage and code review environment with three graded tasks (easy/medium/hard). Each episode presents pull request metadata and a unified diff, then asks the agent to submit a structured review.
Quick Start
The simplest way to use the Coding environment is through the CodingEnv class. The client is async by default:
import asyncio
from coding_env import CodeAction, CodingEnv
async def main():
# Create environment from Docker image
client = await CodingEnv.from_docker_image("coding-env:latest")
async with client:
# Reset
result = await client.reset()
print(f"Reset complete: exit_code={result.observation.exit_code}")
# Execute Python code
code_samples = [
"print('Hello, World!')",
"x = 5 + 3\nprint(f'Result: {x}')",
"import math\nprint(math.pi)"
]
for code in code_samples:
result = await client.step(CodeAction(code=code))
print(f"Code: {code}")
print(f" β stdout: {result.observation.stdout.strip()}")
print(f" β exit_code: {result.observation.exit_code}")
asyncio.run(main())
For synchronous usage, use the .sync() wrapper:
from coding_env import CodeAction, CodingEnv
with CodingEnv(base_url="http://localhost:8000").sync() as client:
result = client.reset()
result = client.step(CodeAction(code="print('Hello!')"))
print(result.observation.stdout)
The CodingEnv.from_docker_image() method handles:
- Starting the Docker container
- Waiting for the server to be ready
- Connecting to the environment
- Container cleanup when the context manager exits
Building the Docker Image
Before using the environment, you need to build the Docker image:
# From project root
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
Environment Details
Action
CodeAction fields:
review(str) - Human-readable review summaryfile_path(str) - Changed file being flaggedissue_type(str) -logic|security|performance|maintainabilityseverity(str) -low|medium|high|criticalbug_type(str) - One ofsyntax | logic | security | noneline_number(int) - Suspected faulty lineconfidence(float) - Confidence score in[0.0, 1.0]
Observation
CodeObservation fields:
task_id(str) - Current task iddifficulty(str) - Task difficulty (easy|medium|hard)task_description(str) - Review instructionscode_snippet(str) - PR context + unified diffpr_title(str) - Pull request titlepr_description(str) - Pull request summarychanged_files(str) - Changed file listprevious_feedback(str) - Grader feedback from latest stepreward(float) - Normalized score contribution[0.0, 1.0]done(bool) - Episode termination flag
State
CodeState: Tracks execution state
episode_id(str) - Unique identifier for the episodestep_count(int) - Number of steps takentask_id(str) - Active task iddifficulty(str) - Active task difficultylast_score(float) - Last normalized score
Built-in Tasks and Graders
The server exposes:
GET /tasksto list all benchmark tasks.GET /grader?task_id=<id>&episode_id=<id>to read final normalized score.
Shipped tasks:
task_easy_1(logic)task_medium_1(security)task_hard_1(logic/performance-concurrency)
Rewards are strict (0, 1) with partial progress:
- file path localization
- issue type / bug type correctness
- severity calibration
- line-level precision
- evidence quality in review text
Advanced Usage
Connecting to an Existing Server
If you already have a Coding environment server running, you can connect directly:
from coding_env import CodeAction, CodingEnv
# Async usage
async with CodingEnv(base_url="http://localhost:8000") as client:
result = await client.reset()
result = await client.step(CodeAction(code="print('Hello!')"))
# Sync usage
with CodingEnv(base_url="http://localhost:8000").sync() as client:
result = client.reset()
result = client.step(CodeAction(code="print('Hello!')"))
Note: When connecting to an existing server, closing the client will NOT stop the server.
Development & Testing
Running Tests
Install the coding_env package with dev dependencies and run the tests from the repo root:
# Install coding_env with dev dependencies (includes smolagents and pytest)
uv pip install -e "envs/coding_env[dev]"
# Run unit tests (no Docker required)
uv run pytest tests/envs/test_python_codeact_reset.py tests/envs/test_python_codeact_rewards.py -v
# Run integration tests (requires Docker image to be built)
docker build -t coding-env:latest -f envs/coding_env/server/Dockerfile .
SKIP_DOCKER_TESTS=0 uv run pytest tests/envs/test_coding_env_integration.py -v
Running the Full Example
Run the complete example that demonstrates the full workflow:
python3 envs/coding_env/client/example_usage.py
This example shows:
- Creating an environment from a Docker image
- Resetting and executing code through the environment
- Automatic cleanup with
close()
Project Structure
coding_env/
βββ README.md # This file
βββ models.py # Action, Observation, and State models
βββ client/
β βββ coding_env_client.py # CodingEnv client implementation
β βββ example_usage.py # Usage examples
βββ server/
βββ python_codeact_env.py # Core environment logic
βββ app.py # FastAPI application
βββ transforms.py # Observation transforms
βββ Dockerfile # Container image definition
βββ README.md # Server-specific documentation