Hugging Face Space: SimranShaikh/code-review-env



SimranShaikh committed 10 days ago
Commit 85750fc (verified) · 1 parent: 069f38a

Files changed (1): README.md (+10 -150)

@@ -1,151 +1,11 @@
-# 🔍 CodeReview-Env
-An [OpenEnv](https://github.com/huggingface/openenv) reinforcement-learning environment where AI agents learn to **review source code** — detecting syntax errors, logic bugs, and security vulnerabilities — and suggest working fixes.
-> **Why this matters:** Code review is one of the most time-intensive tasks in software engineering. Training agents to do it reliably could save thousands of engineer-hours per year. This environment provides a rigorous, programmatically-graded benchmark for that exact capability.
 ---
-## 🎯 Tasks
-| # | ID | Name | Difficulty | Max Steps | Description |
-|---|----|----|---|---|---|
-| 1 | `easy_syntax` | Python Syntax Error Detection | Easy | 5 | Find and fix a missing colon in an if-statement |
-| 2 | `medium_logic` | Off-by-One in Palindrome Check | Medium | 8 | Debug a subtle index error; fix is validated by 5 test cases |
-| 3 | `hard_security` | SQL Injection, Path Traversal & Weak Hashing | Hard | 10 | Full security audit — find and fix 3 distinct CVE-class vulnerabilities |
----
-## 🕹 Action & Observation Space
-### Observation
-```json
-{
-  "task_id": "easy_syntax",
-  "task_name": "Python Syntax Error Detection",
-  "difficulty": "easy",
-  "language": "python",
-  "code_snippet": "...",
-  "context": "What the code is supposed to do",
-  "step_number": 1,
-  "max_steps": 5,
-  "previous_feedback": null
-}
-```
-### Action
-```json
-{
-  "identified_issues": [
-    {
-      "line_number": 2,
-      "issue_type": "syntax_error",
-      "description": "Missing colon at end of if-statement",
-      "severity": "high"
-    }
-  ],
-  "suggested_fix": "def calculate_discount(price, discount_percent):\n    if discount_percent > 100:\n        ...",
-  "explanation": "Line 2 is missing a colon after the if condition.",
-  "done": true
-}
-```
-### issue_type values
-`syntax_error` | `logic_bug` | `security_vulnerability` | `performance` | `style`
----
-## 🏆 Scoring
-All graders are **deterministic** and return a score in `[0.0, 1.0]`.
-| Task | Rubric |
-|---|---|
-| Easy | 0.35 correct type + 0.35 description keywords + 0.30 fix correctness |
-| Medium | 0.25 correct type + 0.25 description + 0.50 fix passes 5 test cases |
-| Hard | Per-vulnerability: 0.45 flagged + 0.30 description + 0.25 fix; +0.05 bonus for all 3 |
-### Baseline Scores (GPT-4o-mini)
-| Task | Score |
-|---|---|
-| easy_syntax | ~0.75 |
-| medium_logic | ~0.60 |
-| hard_security | ~0.55 |
----
-## 🚀 Setup & Usage
-### Local (Docker)
-```bash
-# Build
-docker build -t code-review-env .
-# Run server
-docker run -p 7860:7860 code-review-env
-# In another terminal — quick smoke test
-curl -X POST http://localhost:7860/reset?task_id=easy_syntax
-curl -X POST http://localhost:7860/step \
-  -H "Content-Type: application/json" \
-  -d '{"identified_issues":[{"issue_type":"syntax_error","description":"missing colon"}],"done":true}'
-```
-### Run baseline agent
-```bash
-pip install -r requirements.txt
-export API_BASE_URL=https://api.openai.com/v1
-export MODEL_NAME=gpt-4o-mini
-export HF_TOKEN=<your-key>
-export SPACE_URL=http://localhost:7860
-python inference.py
-```
-### API endpoints
-| Method | Path | Description |
-|---|---|---|
-| GET | `/health` | Liveness probe |
-| GET | `/tasks` | List all tasks |
-| POST | `/reset?task_id=...` | Start new episode |
-| POST | `/step` | Submit action, get reward |
-| GET | `/state` | Current episode state |
----
-## 🏗 Project Structure
-```
-code-review-env/
-├── app.py                  # FastAPI server (HF Space entrypoint)
-├── inference.py            # Baseline inference script
-├── Dockerfile
-├── openenv.yaml
-├── requirements.txt
-├── README.md
-└── environment/
-    ├── __init__.py
-    ├── env.py              # Episode logic & state management
-    ├── models.py           # Pydantic typed models
-    ├── tasks.py            # Task definitions & ground truth
-    └── graders.py          # Deterministic graders
-```
----
-## 📊 Reward Design
-- **Partial credit at every step** — agents get signal even with incomplete answers
-- **Progressive difficulty** — curriculum learning from easy → hard is natural
-- **Fix execution** — medium grader actually *runs* the suggested fix against test cases
-- **Multi-vulnerability** — hard grader rewards each vulnerability independently, so partial detection still scores
----
-## ⚙️ Infrastructure
-- Runs on 2 vCPU / 8 GB RAM (well within limits)
-- Inference runtime < 5 min for all 3 tasks
-- Dockerfile uses multi-stage build for minimal image size
-- Environment variables: `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`, `SPACE_URL`

 ---
+title: Code Review Env
+emoji: 🔍
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 7860
+tags:
+  - openenv
+pinned: false
+---
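The README removed in this commit describes the Action & Observation space only through JSON examples. The stdlib dataclasses below are a rough sketch of those shapes. They copy the field names from the examples and reject any `issue_type` outside the five documented values.

This is an assumption-laden stand-in, not the project's `environment/models.py`, which the README says uses Pydantic. Several details are guesses:

- which fields are optional, based on the README's curl example omitting them;
- the type of `previous_feedback`;
- whether the server accepts `null` for omitted fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# The five issue_type values listed in the README.
ISSUE_TYPES = {"syntax_error", "logic_bug", "security_vulnerability", "performance", "style"}


@dataclass
class Observation:
    """What the environment sends each step (field names from the README example)."""
    task_id: str
    task_name: str
    difficulty: str
    language: str
    code_snippet: str
    context: str
    step_number: int
    max_steps: int
    previous_feedback: Optional[str] = None  # assumption: text feedback, null on step 1


@dataclass
class Issue:
    """One finding inside an Action."""
    issue_type: str
    description: str
    line_number: Optional[int] = None  # assumption: optional, the curl example omits it
    severity: Optional[str] = None     # the README only shows "high"

    def __post_init__(self) -> None:
        # Reject issue types outside the documented set.
        if self.issue_type not in ISSUE_TYPES:
            raise ValueError(f"unknown issue_type: {self.issue_type!r}")


@dataclass
class Action:
    """What the agent POSTs to /step."""
    identified_issues: List[Issue] = field(default_factory=list)
    suggested_fix: Optional[str] = None
    explanation: Optional[str] = None
    done: bool = False

    def to_json(self) -> str:
        # asdict() recurses into the nested Issue dataclasses.
        return json.dumps(asdict(self))
```

An observation decoded from the server could be loaded with `Observation(**json.loads(raw))`. This fails loudly if the server adds or renames fields, which is useful while the schema is unconfirmed.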
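The Scoring table in the removed README gives the rubric weights but not how the parts combine. Taken literally, the Hard rubric can reach 1.05 (a perfect 1.0 plus the 0.05 bonus), which is outside the documented `[0.0, 1.0]` range. The functions below are a minimal sketch of one reading of the table. None of these rules are stated in the README:

- each criterion is a fraction in `[0, 1]`;
- the Medium fix term scales with the share of the 5 tests passed;
- the Hard score averages over all three planted vulnerabilities;
- the bonus requires all three to be flagged;
- totals are clipped to `[0.0, 1.0]`.

```python
from typing import Sequence, Tuple

NUM_VULNS = 3  # the hard task plants three vulnerabilities


def _clip(score: float) -> float:
    # Keep every total inside the documented [0.0, 1.0] range.
    return max(0.0, min(1.0, score))


def easy_score(type_ok: float, keywords: float, fix_ok: float) -> float:
    return _clip(0.35 * type_ok + 0.35 * keywords + 0.30 * fix_ok)


def medium_score(type_ok: float, description: float, tests_passed: int, total_tests: int = 5) -> float:
    # Assumption: the fix component scales with the fraction of tests passed.
    return _clip(0.25 * type_ok + 0.25 * description + 0.50 * tests_passed / total_tests)


def hard_score(vulns: Sequence[Tuple[float, float, float]]) -> float:
    # vulns: one (flagged, description, fix) triple per vulnerability found.
    per_vuln = [0.45 * flagged + 0.30 * desc + 0.25 * fix for flagged, desc, fix in vulns]
    # Assumption: divide by all three planted vulnerabilities, so missed ones count as 0.
    base = sum(per_vuln) / NUM_VULNS
    # Assumption: the +0.05 bonus requires all three to be fully flagged.
    all_flagged = len(vulns) == NUM_VULNS and all(flagged >= 1.0 for flagged, _, _ in vulns)
    return _clip(base + (0.05 if all_flagged else 0.0))
```

Under these assumptions, an agent that flags all three vulnerabilities but gives no descriptions or fixes scores 0.5 on Hard: `hard_score([(1, 0, 0)] * 3)` averages to 0.45, then the bonus adds 0.05. That matches the README's claim that partial detection still scores.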
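The removed README drives the server with `curl` and lists five endpoints. Below is a minimal stdlib sketch of a Python client for the same endpoints. It reuses the README's `SPACE_URL` variable and its default port 7860.

Some things are assumed, not documented:

- Every response is JSON. The sketch returns the decoded body unchanged because the response fields are not described.
- `/reset` takes `task_id` only as a query parameter, as in the curl example.

The helper names (`build_request`, `send`, `reset`, `step`, and so on) are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# SPACE_URL is the variable the README's baseline agent reads.
BASE_URL = os.environ.get("SPACE_URL", "http://localhost:7860")


def build_request(method: str, path: str, params: Optional[Dict[str, str]] = None,
                  body: Optional[Dict[str, Any]] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    url = base_url.rstrip("/") + path
    if params:
        # Percent-encode query parameters such as task_id.
        url += "?" + urllib.parse.urlencode(params)
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    # Perform the HTTP call and decode the (assumed) JSON response body.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def health() -> urllib.request.Request:
    return build_request("GET", "/health")


def list_tasks() -> urllib.request.Request:
    return build_request("GET", "/tasks")


def reset(task_id: str) -> urllib.request.Request:
    return build_request("POST", "/reset", params={"task_id": task_id})


def step(action: Dict[str, Any]) -> urllib.request.Request:
    return build_request("POST", "/step", body=action)


def state() -> urllib.request.Request:
    return build_request("GET", "/state")
```

With the container from `docker run -p 7860:7860 code-review-env` running, `send(reset("easy_syntax"))` starts an episode and `send(step(action))` submits an action dict. Requests are built separately from `send` so they can be inspected without a server.

Building the query string in Python also avoids a shell pitfall in the README's curl command. There, `?` is unquoted, and shells such as zsh may treat it as a glob character.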
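Under Reward Design, the removed README says the medium grader actually runs the suggested fix against 5 test cases. Below is a minimal sketch of that idea. It executes the fix's source, looks up a function, and returns the fraction of cases passed.

This is not the project's grader. The README does not name the fix's function, so `is_palindrome` is a hypothetical name, and the five cases are illustrative. Note that `exec()` runs untrusted code in the grader's own process. A real grader should isolate it, for example in a subprocess with a timeout.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative cases only; not the environment's real test set.
TEST_CASES: List[Tuple[str, bool]] = [
    ("racecar", True),
    ("abba", True),
    ("a", True),
    ("", True),
    ("abca", False),
]


def run_fix(fix_source: str, func_name: str = "is_palindrome",
            cases: List[Tuple[str, bool]] = TEST_CASES) -> float:
    namespace: Dict[str, object] = {}
    try:
        # Compile and run the suggested fix, then fetch the function it defines.
        exec(fix_source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # the fix does not run or does not define the function
    if not isinstance(func, Callable):
        return 0.0
    passed = 0
    for arg, expected in cases:
        try:
            if func(arg) == expected:
                passed += 1
        except Exception:
            pass  # a crashing case counts as a failure
    return passed / len(cases)
```

For example, a buggy fix that compares `s[i]` with `s[len(s) - i]` raises `IndexError` on most non-empty inputs. It passes only the `"a"` and `""` cases, for a fraction of 0.4, while the corrected index `s[len(s) - 1 - i]` passes all five.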