SimranShaikh committed on
Commit 85750fc · verified · 1 Parent(s): 069f38a
Files changed (1)
  1. README.md +10 -150
README.md CHANGED
@@ -1,151 +1,11 @@
- # 🔍 CodeReview-Env
-
- An [OpenEnv](https://github.com/huggingface/openenv) reinforcement-learning environment where AI agents learn to **review source code** (detecting syntax errors, logic bugs, and security vulnerabilities) and suggest working fixes.
-
- > **Why this matters:** Code review is one of the most time-intensive tasks in software engineering. Training agents to do it reliably could save thousands of engineer-hours per year. This environment provides a rigorous, programmatically graded benchmark for that exact capability.
-
  ---
-
- ## 🎯 Tasks
-
- | # | ID | Name | Difficulty | Max Steps | Description |
- |---|----|----|---|---|---|
- | 1 | `easy_syntax` | Python Syntax Error Detection | Easy | 5 | Find and fix a missing colon in an if-statement |
- | 2 | `medium_logic` | Off-by-One in Palindrome Check | Medium | 8 | Debug a subtle index error; fix is validated by 5 test cases |
- | 3 | `hard_security` | SQL Injection, Path Traversal & Weak Hashing | Hard | 10 | Full security audit: find and fix 3 distinct CVE-class vulnerabilities |
-
- ---
-
- ## 🕹 Action & Observation Space
-
- ### Observation
- ```json
- {
-   "task_id": "easy_syntax",
-   "task_name": "Python Syntax Error Detection",
-   "difficulty": "easy",
-   "language": "python",
-   "code_snippet": "...",
-   "context": "What the code is supposed to do",
-   "step_number": 1,
-   "max_steps": 5,
-   "previous_feedback": null
- }
- ```
-
- ### Action
- ```json
- {
-   "identified_issues": [
-     {
-       "line_number": 2,
-       "issue_type": "syntax_error",
-       "description": "Missing colon at end of if-statement",
-       "severity": "high"
-     }
-   ],
-   "suggested_fix": "def calculate_discount(price, discount_percent):\n if discount_percent > 100:\n ...",
-   "explanation": "Line 2 is missing a colon after the if condition.",
-   "done": true
- }
- ```
-
- ### issue_type values
- `syntax_error` | `logic_bug` | `security_vulnerability` | `performance` | `style`
-
- ---
-
58
- ## πŸ† Scoring
59
-
60
- All graders are **deterministic** and return a score in `[0.0, 1.0]`.
61
-
62
- | Task | Rubric |
63
- |---|---|
64
- | Easy | 0.35 correct type + 0.35 description keywords + 0.30 fix correctness |
65
- | Medium | 0.25 correct type + 0.25 description + 0.50 fix passes 5 test cases |
66
- | Hard | Per-vulnerability: 0.45 flagged + 0.30 description + 0.25 fix; +0.05 bonus for all 3 |
67
-
68
- ### Baseline Scores (GPT-4o-mini)
69
- | Task | Score |
70
- |---|---|
71
- | easy_syntax | ~0.75 |
72
- | medium_logic | ~0.60 |
73
- | hard_security | ~0.55 |
74
-
75
- ---
76
-
- ## 🚀 Setup & Usage
-
- ### Local (Docker)
- ```bash
- # Build
- docker build -t code-review-env .
-
- # Run server
- docker run -p 7860:7860 code-review-env
-
- # In another terminal: quick smoke test
- curl -X POST "http://localhost:7860/reset?task_id=easy_syntax"
- curl -X POST http://localhost:7860/step \
-   -H "Content-Type: application/json" \
-   -d '{"identified_issues":[{"issue_type":"syntax_error","description":"missing colon"}],"done":true}'
- ```
-
- ### Run baseline agent
- ```bash
- pip install -r requirements.txt
-
- export API_BASE_URL=https://api.openai.com/v1
- export MODEL_NAME=gpt-4o-mini
- export HF_TOKEN=<your-key>
- export SPACE_URL=http://localhost:7860
-
- python inference.py
- ```
-
- ### API endpoints
- | Method | Path | Description |
- |---|---|---|
- | GET | `/health` | Liveness probe |
- | GET | `/tasks` | List all tasks |
- | POST | `/reset?task_id=...` | Start new episode |
- | POST | `/step` | Submit action, get reward |
- | GET | `/state` | Current episode state |
-
- ---
-
- ## 🏗 Project Structure
-
- ```
- code-review-env/
- ├── app.py              # FastAPI server (HF Space entrypoint)
- ├── inference.py        # Baseline inference script
- ├── Dockerfile
- ├── openenv.yaml
- ├── requirements.txt
- ├── README.md
- └── environment/
-     ├── __init__.py
-     ├── env.py          # Episode logic & state management
-     ├── models.py       # Pydantic typed models
-     ├── tasks.py        # Task definitions & ground truth
-     └── graders.py      # Deterministic graders
- ```
-
- ---
-
- ## 📊 Reward Design
-
- - **Partial credit at every step**: agents get signal even with incomplete answers
- - **Progressive difficulty**: curriculum learning from easy → hard is natural
- - **Fix execution**: medium grader actually *runs* the suggested fix against test cases
- - **Multi-vulnerability**: hard grader rewards each vulnerability independently, so partial detection still scores
-
- ---
-
- ## ⚙️ Infrastructure
-
- - Runs on 2 vCPU / 8 GB RAM (well within limits)
- - Inference runtime < 5 min for all 3 tasks
- - Dockerfile uses multi-stage build for minimal image size
- - Environment variables: `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`, `SPACE_URL`
  ---
+ title: Code Review Env
+ emoji: 🔍
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ app_port: 7860
+ tags:
+ - openenv
+ pinned: false
+ ---
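
For readers of this diff, the Easy rubric in the removed Scoring table (0.35 correct type + 0.35 description keywords + 0.30 fix correctness) reads like a weighted sum. Below is a minimal Python sketch of that reading. It is not the repository's `graders.py`: the names `EASY_WEIGHTS` and `weighted_score`, the component keys, and the clamping of each component to `[0, 1]` are all assumptions made for illustration.

```python
# Hypothetical illustration of the Easy rubric as a weighted sum.
# The weights come from the removed README; everything else is assumed.
EASY_WEIGHTS = {"issue_type": 0.35, "description": 0.35, "fix": 0.30}


def weighted_score(components: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion scores in [0, 1] into one score in [0, 1]."""
    total = 0.0
    for name, weight in weights.items():
        # A missing criterion counts as 0; values are clamped to [0, 1].
        value = min(max(components.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(total, 4)


# All three criteria satisfied.
full = weighted_score({"issue_type": 1.0, "description": 1.0, "fix": 1.0}, EASY_WEIGHTS)
# Correct type, half of the description keywords, no fix submitted.
partial = weighted_score({"issue_type": 1.0, "description": 0.5}, EASY_WEIGHTS)
print(full, partial)
```

Under these assumptions an incomplete answer still earns a nonzero score, which matches the "partial credit at every step" point in the removed Reward Design section.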