ncncomplete committed
Commit 19bef03 · verified · 1 Parent(s): 54bff2c

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +53 -224

README.md CHANGED
@@ -1,255 +1,84 @@
  ---
- title: Code Review Env Environment Server
  emoji: 🎯
  colorFrom: pink
  colorTo: pink
  sdk: docker
  pinned: false
  app_port: 8000
- base_path: /web
  tags:
  - openenv
  ---

- # Code Review Env Environment
-
- A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
-
- ## Quick Start
-
- The simplest way to use the Code Review Env environment is through the `CodeReviewEnv` class:
-
- ```python
- from code_review_env import CodeReviewAction, CodeReviewEnv
-
- try:
-     # Create environment from Docker image
-     code_review_envenv = CodeReviewEnv.from_docker_image("code_review_env-env:latest")
-
-     # Reset
-     result = code_review_envenv.reset()
-     print(f"Reset: {result.observation.echoed_message}")
-
-     # Send multiple messages
-     messages = ["Hello, World!", "Testing echo", "Final message"]
-
-     for msg in messages:
-         result = code_review_envenv.step(CodeReviewAction(message=msg))
-         print(f"Sent: '{msg}'")
-         print(f"  → Echoed: '{result.observation.echoed_message}'")
-         print(f"  → Length: {result.observation.message_length}")
-         print(f"  → Reward: {result.reward}")
-
- finally:
-     # Always clean up
-     code_review_envenv.close()
- ```
-
- That's it! The `CodeReviewEnv.from_docker_image()` method handles:
- - Starting the Docker container
- - Waiting for the server to be ready
- - Connecting to the environment
- - Container cleanup when you call `close()`
-
- ## Building the Docker Image
-
- Before using the environment, you need to build the Docker image:
-
- ```bash
- # From project root
- docker build -t code_review_env-env:latest -f server/Dockerfile .
- ```
-
- ## Deploying to Hugging Face Spaces
-
- You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
-
- ```bash
- # From the environment directory (where openenv.yaml is located)
- openenv push
-
- # Or specify options
- openenv push --namespace my-org --private
- ```
-
- The `openenv push` command will:
- 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
- 2. Prepare a custom build for Hugging Face Docker space (enables web interface)
- 3. Upload to Hugging Face (ensuring you're logged in)
-
- ### Prerequisites
-
- - Authenticate with Hugging Face: The command will prompt for login if not already authenticated
-
- ### Options
-
- - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
- - `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
- - `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
- - `--private`: Deploy the space as private (default: public)
-
- ### Examples
-
- ```bash
- # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
- openenv push
-
- # Push to a specific repository
- openenv push --repo-id my-org/my-env
-
- # Push with a custom base image
- openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
-
- # Push as a private space
- openenv push --private
-
- # Combine options
- openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
- ```
-
- After deployment, your space will be available at:
- `https://huggingface.co/spaces/<repo-id>`
-
- The deployed space includes:
- - **Web Interface** at `/web` - Interactive UI for exploring the environment
- - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
- - **Health Check** at `/health` - Container health monitoring
- - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions

- ## Environment Details

- ### Action
- **CodeReviewAction**: Contains a single field
- - `message` (str) - The message to echo back

- ### Observation
- **CodeReviewObservation**: Contains the echo response and metadata
- - `echoed_message` (str) - The message echoed back
- - `message_length` (int) - Length of the message
- - `reward` (float) - Reward based on message length (length × 0.1)
- - `done` (bool) - Always False for echo environment
- - `metadata` (dict) - Additional info like step count

- ### Reward
- The reward is calculated as: `message_length × 0.1`
- - "Hi" → reward: 0.2
- - "Hello, World!" → reward: 1.3
- - Empty message → reward: 0.0

- ## Advanced Usage

- ### Connecting to an Existing Server

- If you already have a Code Review Env environment server running, you can connect directly:

- ```python
- from code_review_env import CodeReviewEnv

- # Connect to existing server
- code_review_envenv = CodeReviewEnv(base_url="<ENV_HTTP_URL_HERE>")

- # Use as normal
- result = code_review_envenv.reset()
- result = code_review_envenv.step(CodeReviewAction(message="Hello!"))
- ```
-
- Note: When connecting to an existing server, `code_review_envenv.close()` will NOT stop the server.
-
- ### Using the Context Manager
-
- The client supports context manager usage for automatic connection management:

- ```python
- from code_review_env import CodeReviewAction, CodeReviewEnv

- # Connect with context manager (auto-connects and closes)
- with CodeReviewEnv(base_url="http://localhost:8000") as env:
-     result = env.reset()
-     print(f"Reset: {result.observation.echoed_message}")
-     # Multiple steps with low latency
-     for msg in ["Hello", "World", "!"]:
-         result = env.step(CodeReviewAction(message=msg))
-         print(f"Echoed: {result.observation.echoed_message}")
- ```
-
- The client uses WebSocket connections for:
- - **Lower latency**: No HTTP connection overhead per request
- - **Persistent session**: Server maintains your environment state
- - **Efficient for episodes**: Better for many sequential steps

- ### Concurrent WebSocket Sessions

- The server supports multiple concurrent WebSocket connections. To enable this,
- modify `server/app.py` to use factory mode:
-
- ```python
- # In server/app.py - use factory mode for concurrent sessions
- app = create_app(
-     CodeReviewEnvironment,  # Pass class, not instance
-     CodeReviewAction,
-     CodeReviewObservation,
-     max_concurrent_envs=4,  # Allow 4 concurrent sessions
- )
- ```

- Then multiple clients can connect simultaneously:
-
- ```python
- from code_review_env import CodeReviewAction, CodeReviewEnv
- from concurrent.futures import ThreadPoolExecutor
-
- def run_episode(client_id: int):
-     with CodeReviewEnv(base_url="http://localhost:8000") as env:
-         result = env.reset()
-         for i in range(10):
-             result = env.step(CodeReviewAction(message=f"Client {client_id}, step {i}"))
-         return client_id, result.observation.message_length
-
- # Run 4 episodes concurrently
- with ThreadPoolExecutor(max_workers=4) as executor:
-     results = list(executor.map(run_episode, range(4)))
- ```
-
- ## Development & Testing
-
- ### Direct Environment Testing
-
- Test the environment logic directly without starting the HTTP server:

  ```bash
- # From the server directory
- python3 server/code_review_env_environment.py
  ```

- This verifies that:
- - Environment resets correctly
- - Step executes actions properly
- - State tracking works
- - Rewards are calculated correctly
-
- ### Running Locally

- Run the server locally for development:
-
- ```bash
- uvicorn server.app:app --reload
- ```
-
- ## Project Structure
-
- ```
- code_review_env/
- ├── .dockerignore # Docker build exclusions
- ├── __init__.py # Module exports
- ├── README.md # This file
- ├── openenv.yaml # OpenEnv manifest
- ├── pyproject.toml # Project metadata and dependencies
- ├── uv.lock # Locked dependencies (generated)
- ├── client.py # CodeReviewEnv client
- ├── models.py # Action and Observation models
- └── server/
-     ├── __init__.py # Server module exports
-     ├── code_review_env_environment.py # Core environment logic
-     ├── app.py # FastAPI application (HTTP + WebSocket endpoints)
-     └── Dockerfile # Container image definition
- ```
 
  ---
+ title: Code Review Environment
  emoji: 🎯
  colorFrom: pink
  colorTo: pink
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
  - openenv
+ base_path: /web
  ---

+ # Code Review Environment

+ An OpenEnv environment where an AI agent reviews Python code snippets to identify bugs across three difficulty levels.

+ 🤗 **Space:** https://huggingface.co/spaces/ncncomplete/code-review-env

+ ## Environment Description

+ The agent receives a Python code snippet and must identify the bug type and line number, and provide an explanation. The environment simulates real-world code review tasks that developers perform daily.

+ ## Tasks

+ | Task | Difficulty | Description |
+ |------|-----------|-------------|
+ | easy | Easy | Identify syntax/runtime errors |
+ | medium | Medium | Identify logic bugs in code that runs but produces wrong output |
+ | hard | Hard | Identify security vulnerabilities |

+ ## Action Space

+ | Field | Type | Description |
+ |-------|------|-------------|
+ | review | str | Written analysis of the code |
+ | bug_type | str | One of: syntax, logic, security, none |
+ | line_number | int | Line number where bug occurs (-1 if unknown) |
+ | confidence | float | Agent confidence 0.0–1.0 |
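+ For illustration, a well-formed action might look like the following (the field values here are hypothetical, not taken from the task set):

+ ```python
+ # Illustrative action payload matching the fields in the table above.
+ action = {
+     "review": "The loop starts at index 1, so the first element is never checked.",
+     "bug_type": "logic",
+     "line_number": 3,
+     "confidence": 0.8,
+ }
+ ```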
 
+ ## Observation Space

+ | Field | Type | Description |
+ |-------|------|-------------|
+ | code_snippet | str | Python code to review |
+ | task_description | str | What the agent is asked to do |
+ | task_id | str | easy, medium, or hard |
+ | attempt_number | int | Steps taken so far |
+ | previous_feedback | str | Feedback from last step |
+ | done | bool | Whether episode is complete |
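+ The shape of an observation, sketched as a Python dict (the values are placeholders, not real task data):

+ ```python
+ # Illustrative observation shape; actual snippets and feedback come from the environment.
+ observation = {
+     "code_snippet": "<python code to review>",
+     "task_description": "Review the snippet and report any bug you find.",
+     "task_id": "easy",
+     "attempt_number": 0,
+     "previous_feedback": "",
+     "done": False,
+ }
+ ```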
 
 
+ ## Reward Function

+ - **+1.0** for identifying the correct bug type
+ - **+0.5** for identifying the correct line number
+ - **+0.5** for explanation quality (key concepts present)
+ - **-0.3** for confidently stating the wrong bug category
+ - **-0.1** per retry after the first attempt
+ - The total is normalized to the 0.0–1.0 range (see the sketch below)
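+ A minimal sketch of how such a rubric could be combined, assuming the bonuses and penalties above are simply summed and then scaled by the 2.0 maximum; the environment's actual grader may differ:

+ ```python
+ # Illustrative only -- not the environment's grader.
+ def sketch_reward(correct_type: bool, correct_line: bool, good_explanation: bool,
+                   confident_wrong_type: bool, retries: int) -> float:
+     score = 1.0 * correct_type + 0.5 * correct_line + 0.5 * good_explanation
+     score -= 0.3 * confident_wrong_type + 0.1 * retries
+     # Assumed normalization: divide by the 2.0 maximum and clamp to [0.0, 1.0].
+     return max(0.0, min(1.0, score / 2.0))
+ ```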
 
+ ## Baseline Scores

+ | Task | Score |
+ |------|-------|
+ | easy | 1.0 |
+ | medium | 1.0 |
+ | hard | 1.0 |
+ | **average** | **1.0** |

+ ## Setup

  ```bash
+ pip install openenv-core fastapi uvicorn pydantic openai
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
  ```
 
+ ## API Endpoints

+ - `POST /reset` — Start new episode with `{"task_id": "easy|medium|hard"}`
+ - `POST /step` — Submit action with `{"action": {...}}`
+ - `GET /state` — Get current environment state
+ - `GET /tasks` — List all tasks and action schema
+ - `GET /grader` — Get grader score for a task
+ - `GET /baseline` — Run baseline inference on all tasks
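+ A quick way to exercise these endpoints is plain HTTP from Python. The sketch below uses `requests` against a locally running server; the request and response payload shapes follow the tables above and are assumptions, not verified API output:

+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # server started as shown in Setup
+
+ # Start an episode on the easy task.
+ reset_response = requests.post(f"{BASE}/reset", json={"task_id": "easy"})
+ print(reset_response.json())
+
+ # Submit one review action (fields follow the Action Space table; values are hypothetical).
+ action = {
+     "review": "The function header on line 1 is missing a colon.",
+     "bug_type": "syntax",
+     "line_number": 1,
+     "confidence": 0.7,
+ }
+ step_response = requests.post(f"{BASE}/step", json={"action": action})
+ print(step_response.json())
+ ```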