ncncomplete committed
Commit 19bef03 · verified · 1 Parent(s): 54bff2c

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +53 -224

README.md CHANGED
@@ -1,255 +1,84 @@
  ---
- title: Code Review Env Environment Server
  emoji: 🎯
  colorFrom: pink
  colorTo: pink
  sdk: docker
  pinned: false
  app_port: 8000
- base_path: /web
  tags:
  - openenv
  ---

- # Code Review Env Environment
-
- A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
-
- ## Quick Start
-
- The simplest way to use the Code Review Env environment is through the `CodeReviewEnv` class:
-
- ```python
- from code_review_env import CodeReviewAction, CodeReviewEnv
-
- try:
-     # Create environment from Docker image
-     code_review_envenv = CodeReviewEnv.from_docker_image("code_review_env-env:latest")
-
-     # Reset
-     result = code_review_envenv.reset()
-     print(f"Reset: {result.observation.echoed_message}")
-
-     # Send multiple messages
-     messages = ["Hello, World!", "Testing echo", "Final message"]
-
-     for msg in messages:
-         result = code_review_envenv.step(CodeReviewAction(message=msg))
-         print(f"Sent: '{msg}'")
-         print(f"  → Echoed: '{result.observation.echoed_message}'")
-         print(f"  → Length: {result.observation.message_length}")
-         print(f"  → Reward: {result.reward}")
-
- finally:
-     # Always clean up
-     code_review_envenv.close()
- ```
-
- That's it! The `CodeReviewEnv.from_docker_image()` method handles:
- - Starting the Docker container
- - Waiting for the server to be ready
- - Connecting to the environment
- - Container cleanup when you call `close()`
-
- ## Building the Docker Image
-
- Before using the environment, you need to build the Docker image:
-
- ```bash
- # From project root
- docker build -t code_review_env-env:latest -f server/Dockerfile .
- ```
-
- ## Deploying to Hugging Face Spaces
-
- You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
-
- ```bash
- # From the environment directory (where openenv.yaml is located)
- openenv push
-
- # Or specify options
- openenv push --namespace my-org --private
- ```
-
- The `openenv push` command will:
- 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
- 2. Prepare a custom build for Hugging Face Docker space (enables web interface)
- 3. Upload to Hugging Face (ensuring you're logged in)
-
- ### Prerequisites
-
- - Authenticate with Hugging Face: The command will prompt for login if not already authenticated
-
- ### Options
-
- - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
- - `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
- - `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
- - `--private`: Deploy the space as private (default: public)
-
- ### Examples
-
- ```bash
- # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
- openenv push
-
- # Push to a specific repository
- openenv push --repo-id my-org/my-env
-
- # Push with a custom base image
- openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
-
- # Push as a private space
- openenv push --private
-
- # Combine options
- openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
- ```
-
- After deployment, your space will be available at:
- `https://huggingface.co/spaces/<repo-id>`
-
- The deployed space includes:
- - **Web Interface** at `/web` - Interactive UI for exploring the environment
- - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
- - **Health Check** at `/health` - Container health monitoring
- - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions

- ## Environment Details

- ### Action
- **CodeReviewAction**: Contains a single field
- - `message` (str) - The message to echo back

- ### Observation
- **CodeReviewObservation**: Contains the echo response and metadata
- - `echoed_message` (str) - The message echoed back
- - `message_length` (int) - Length of the message
- - `reward` (float) - Reward based on message length (length × 0.1)
- - `done` (bool) - Always False for echo environment
- - `metadata` (dict) - Additional info like step count

- ### Reward
- The reward is calculated as: `message_length × 0.1`
- - "Hi" → reward: 0.2
- - "Hello, World!" → reward: 1.3
- - Empty message → reward: 0.0

- ## Advanced Usage

- ### Connecting to an Existing Server

- If you already have a Code Review Env environment server running, you can connect directly:

- ```python
- from code_review_env import CodeReviewEnv

- # Connect to existing server
- code_review_envenv = CodeReviewEnv(base_url="<ENV_HTTP_URL_HERE>")

- # Use as normal
- result = code_review_envenv.reset()
- result = code_review_envenv.step(CodeReviewAction(message="Hello!"))
- ```
-
- Note: When connecting to an existing server, `code_review_envenv.close()` will NOT stop the server.
-
- ### Using the Context Manager
-
- The client supports context manager usage for automatic connection management:

- ```python
- from code_review_env import CodeReviewAction, CodeReviewEnv

- # Connect with context manager (auto-connects and closes)
- with CodeReviewEnv(base_url="http://localhost:8000") as env:
-     result = env.reset()
-     print(f"Reset: {result.observation.echoed_message}")
-     # Multiple steps with low latency
-     for msg in ["Hello", "World", "!"]:
-         result = env.step(CodeReviewAction(message=msg))
-         print(f"Echoed: {result.observation.echoed_message}")
- ```
-
- The client uses WebSocket connections for:
- - **Lower latency**: No HTTP connection overhead per request
- - **Persistent session**: Server maintains your environment state
- - **Efficient for episodes**: Better for many sequential steps

- ### Concurrent WebSocket Sessions

- The server supports multiple concurrent WebSocket connections. To enable this,
- modify `server/app.py` to use factory mode:
-
- ```python
- # In server/app.py - use factory mode for concurrent sessions
- app = create_app(
-     CodeReviewEnvironment,  # Pass class, not instance
-     CodeReviewAction,
-     CodeReviewObservation,
-     max_concurrent_envs=4,  # Allow 4 concurrent sessions
- )
- ```

- Then multiple clients can connect simultaneously:
-
- ```python
- from code_review_env import CodeReviewAction, CodeReviewEnv
- from concurrent.futures import ThreadPoolExecutor
-
- def run_episode(client_id: int):
-     with CodeReviewEnv(base_url="http://localhost:8000") as env:
-         result = env.reset()
-         for i in range(10):
-             result = env.step(CodeReviewAction(message=f"Client {client_id}, step {i}"))
-         return client_id, result.observation.message_length
-
- # Run 4 episodes concurrently
- with ThreadPoolExecutor(max_workers=4) as executor:
-     results = list(executor.map(run_episode, range(4)))
- ```
-
- ## Development & Testing
-
- ### Direct Environment Testing
-
- Test the environment logic directly without starting the HTTP server:

  ```bash
- # From the server directory
- python3 server/code_review_env_environment.py
  ```

- This verifies that:
- - Environment resets correctly
- - Step executes actions properly
- - State tracking works
- - Rewards are calculated correctly
-
- ### Running Locally

- Run the server locally for development:
-
- ```bash
- uvicorn server.app:app --reload
- ```
-
- ## Project Structure
-
- ```
- code_review_env/
- ├── .dockerignore # Docker build exclusions
- ├── __init__.py # Module exports
- ├── README.md # This file
- ├── openenv.yaml # OpenEnv manifest
- ├── pyproject.toml # Project metadata and dependencies
- ├── uv.lock # Locked dependencies (generated)
- ├── client.py # CodeReviewEnv client
- ├── models.py # Action and Observation models
- └── server/
-     ├── __init__.py # Server module exports
-     ├── code_review_env_environment.py # Core environment logic
-     ├── app.py # FastAPI application (HTTP + WebSocket endpoints)
-     └── Dockerfile # Container image definition
- ```
 
  ---
+ title: Code Review Environment
  emoji: 🎯
  colorFrom: pink
  colorTo: pink
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
  - openenv
+ base_path: /web
  ---

+ # Code Review Environment

+ An OpenEnv environment where an AI agent reviews Python code snippets to identify bugs across three difficulty levels.

+ 🤗 **Space:** https://huggingface.co/spaces/ncncomplete/code-review-env

+ ## Environment Description

+ The agent receives a Python code snippet and must identify the bug type and line number, and provide an explanation. The environment simulates real-world code review tasks that developers perform daily.

+ ## Tasks

+ | Task | Difficulty | Description |
+ |------|-----------|-------------|
+ | easy | Easy | Identify syntax/runtime errors |
+ | medium | Medium | Identify logic bugs in code that runs but produces wrong output |
+ | hard | Hard | Identify security vulnerabilities |

+ ## Action Space

+ | Field | Type | Description |
+ |-------|------|-------------|
+ | review | str | Written analysis of the code |
+ | bug_type | str | One of: syntax, logic, security, none |
+ | line_number | int | Line number where bug occurs (-1 if unknown) |
+ | confidence | float | Agent confidence 0.0–1.0 |
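+ For illustration, a well-formed action might look like the following (the field values here are hypothetical, not taken from the task set):

+ ```python
+ # Illustrative action payload matching the fields in the table above.
+ action = {
+     "review": "The loop starts at index 1, so the first element is never checked.",
+     "bug_type": "logic",
+     "line_number": 3,
+     "confidence": 0.8,
+ }
+ ```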
 
+ ## Observation Space

+ | Field | Type | Description |
+ |-------|------|-------------|
+ | code_snippet | str | Python code to review |
+ | task_description | str | What the agent is asked to do |
+ | task_id | str | easy, medium, or hard |
+ | attempt_number | int | Steps taken so far |
+ | previous_feedback | str | Feedback from last step |
+ | done | bool | Whether episode is complete |
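+ The shape of an observation, sketched as a Python dict (the values are placeholders, not real task data):

+ ```python
+ # Illustrative observation shape; actual snippets and feedback come from the environment.
+ observation = {
+     "code_snippet": "<python code to review>",
+     "task_description": "Review the snippet and report any bug you find.",
+     "task_id": "easy",
+     "attempt_number": 0,
+     "previous_feedback": "",
+     "done": False,
+ }
+ ```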
 
 
+ ## Reward Function

+ - **+1.0** for identifying the correct bug type
+ - **+0.5** for identifying the correct line number
+ - **+0.5** for explanation quality (key concepts present)
+ - **-0.3** for confidently stating the wrong bug category
+ - **-0.1** per retry after the first attempt
+ - The total is normalized to the 0.0–1.0 range (see the sketch below)
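+ A minimal sketch of how such a rubric could be combined, assuming the bonuses and penalties above are simply summed and then scaled by the 2.0 maximum; the environment's actual grader may differ:

+ ```python
+ # Illustrative only -- not the environment's grader.
+ def sketch_reward(correct_type: bool, correct_line: bool, good_explanation: bool,
+                   confident_wrong_type: bool, retries: int) -> float:
+     score = 1.0 * correct_type + 0.5 * correct_line + 0.5 * good_explanation
+     score -= 0.3 * confident_wrong_type + 0.1 * retries
+     # Assumed normalization: divide by the 2.0 maximum and clamp to [0.0, 1.0].
+     return max(0.0, min(1.0, score / 2.0))
+ ```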
 
+ ## Baseline Scores

+ | Task | Score |
+ |------|-------|
+ | easy | 1.0 |
+ | medium | 1.0 |
+ | hard | 1.0 |
+ | **average** | **1.0** |

+ ## Setup

  ```bash
+ pip install openenv-core fastapi uvicorn pydantic openai
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
  ```
 
+ ## API Endpoints

+ - `POST /reset` — Start new episode with `{"task_id": "easy|medium|hard"}`
+ - `POST /step` — Submit action with `{"action": {...}}`
+ - `GET /state` — Get current environment state
+ - `GET /tasks` — List all tasks and action schema
+ - `GET /grader` — Get grader score for a task
+ - `GET /baseline` — Run baseline inference on all tasks
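+ A quick way to exercise these endpoints is plain HTTP from Python. The sketch below uses `requests` against a locally running server; the request and response payload shapes follow the tables above and are assumptions, not verified API output:

+ ```python
+ import requests
+
+ BASE = "http://localhost:8000"  # server started as shown in Setup
+
+ # Start an episode on the easy task.
+ reset_response = requests.post(f"{BASE}/reset", json={"task_id": "easy"})
+ print(reset_response.json())
+
+ # Submit one review action (fields follow the Action Space table; values are hypothetical).
+ action = {
+     "review": "The function header on line 1 is missing a colon.",
+     "bug_type": "syntax",
+     "line_number": 1,
+     "confidence": 0.7,
+ }
+ step_response = requests.post(f"{BASE}/step", json={"action": action})
+ print(step_response.json())
+ ```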