| name: code-security-review |
| version: "1.0.0" |
| description: > |
| An RL environment for training AI agents to perform code security review. |
| Agents analyze code snippets from production pull requests and identify bugs, |
| vulnerabilities, and security issues. |
| author: Inmodel Labs |
| |
| tasks: |
| - id: python-off-by-one |
| name: "Python Off-by-One Error" |
| description: "Identify an off-by-one index error in a Python finance batch processor" |
| difficulty: easy |
| max_steps: 2 |
| reward_range: [0.0, 1.0] |
|
|
| - id: js-idor-auth |
| name: "JavaScript IDOR Authorization Bypass" |
| description: "Identify a horizontal privilege escalation (IDOR) in a Node.js REST profile endpoint" |
| difficulty: medium |
| max_steps: 2 |
| reward_range: [0.0, 1.0] |
|
|
| - id: python-pickle-deserialization |
| name: "Python Pickle Deserialization" |
| description: "Identify an insecure deserialization vulnerability using pickle in a background worker" |
| difficulty: hard |
| max_steps: 2 |
| reward_range: [0.0, 1.0] |
| |
| action_space: |
| type: object |
| description: > |
| Two-phase action space. Phase 1: submit {"request_file": true} to unlock |
| the code snippet (+0.20 reward). Phase 2: submit a review as a JSON object |
| containing the bug_* fields, severity, and suggested_fix listed below. |
| properties: |
| request_file: { type: boolean, description: "Phase 1: set to true to request the hidden file contents" } |
| bug_identified: { type: boolean, description: "True if the snippet contains a bug" } |
| bug_location: { type: string, description: "Where the bug occurs in the code" } |
| bug_type: { type: string, description: "One of: off-by-one, logic-error, insecure-deserialization, none" } |
| bug_description: { type: string, description: "Detailed analysis of the vulnerability" } |
| severity: { type: string, enum: [none, low, medium, high, critical], description: "Assessed severity of the bug" } |
| suggested_fix: { type: string, description: "How to fix the identified bug" } |
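The two-phase action space above can be sketched as a pair of payload builders. This is a minimal illustration, not the environment's own client code: the helper names `file_request_action` and `review_action` are hypothetical, and leaving `request_file` out of the Phase 2 payload is an assumption, since the manifest does not say whether it may be repeated.

```python
import json
from typing import Any, Dict

# Allowed values copied from the `severity` enum in the action space.
SEVERITIES = ("none", "low", "medium", "high", "critical")


def file_request_action() -> Dict[str, Any]:
    """Phase 1 payload: ask the environment to reveal the code snippet."""
    return {"request_file": True}


def review_action(
    bug_identified: bool,
    bug_location: str,
    bug_type: str,
    bug_description: str,
    severity: str,
    suggested_fix: str,
) -> Dict[str, Any]:
    """Phase 2 payload: a review object with the fields from the action space.

    Only `severity` is validated here, because it is the only field the
    manifest constrains with an enum.
    """
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}, got {severity!r}")
    return {
        "bug_identified": bug_identified,
        "bug_location": bug_location,
        "bug_type": bug_type,
        "bug_description": bug_description,
        "severity": severity,
        "suggested_fix": suggested_fix,
    }


if __name__ == "__main__":
    # Illustrative values only; they are not taken from a real task.
    print(json.dumps(file_request_action()))
    print(json.dumps(review_action(
        True,
        "call to pickle.loads in the worker",
        "insecure-deserialization",
        "Untrusted bytes are passed to pickle.loads, allowing code execution.",
        "critical",
        "Replace pickle with a safe format such as JSON.",
    ), indent=2))
```

Building the payloads through functions keeps the enum check in one place, so an invalid severity fails before a step is spent on it.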
| |
| observation_space: |
| type: object |
| properties: |
| task_id: { type: string, description: "Unique task identifier" } |
| language: { type: string, description: "Source code language" } |
| difficulty: { type: string, enum: [easy, medium, hard], description: "Task complexity (easy/medium/hard)" } |
| code_snippet: { type: string, description: "The source code to be reviewed" } |
| context: { type: string, description: "Real-world context (e.g., API description)" } |
| pr_title: { type: string, description: "Pull Request title for additional intent context" } |
| file_path: { type: string, description: "Relative path to the file in the repository" } |
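An agent might parse the observation fields above into a typed record, for example as follows. This is a sketch under assumptions: the `Observation` class and `from_dict` helper are hypothetical, and treating a missing field as an empty string (for instance `code_snippet` before the file is requested) is a guess, since the manifest does not specify what a Phase 1 observation contains.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

# Allowed values copied from the `difficulty` enum in the observation space.
DIFFICULTIES = ("easy", "medium", "hard")


@dataclass
class Observation:
    task_id: str
    language: str
    difficulty: str
    code_snippet: str
    context: str
    pr_title: str
    file_path: str

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Observation":
        """Build an Observation from a decoded JSON object.

        Unknown keys are ignored; missing keys become "" (an assumption).
        """
        values = {f.name: str(raw.get(f.name, "")) for f in fields(cls)}
        if values["difficulty"] not in DIFFICULTIES:
            raise ValueError(f"unexpected difficulty: {values['difficulty']!r}")
        return cls(**values)


if __name__ == "__main__":
    obs = Observation.from_dict({
        "task_id": "python-off-by-one",
        "language": "python",
        "difficulty": "easy",
    })
    print(obs)
```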
| |
| reward: |
| min: 0.0 |
| max: 1.0 |
| description: > |
| Step 1 (file request): +0.20, flat, always granted. |
| Step 2 (bug review): partial credit for bug identification (0.20), |
| correct bug type (0.20), precise location (0.10), description quality |
| (0.25, scored by keyword density), fix quality (0.15), and correct |
| severity (0.10). The Step 2 components sum to 1.00, so the raw episode |
| total can reach 1.20 before it is clamped to [0.0, 1.0]. The grader |
| penalizes keyword stuffing. |
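The weighting described above can be written out as a small calculation. The weights and the clamp come from the reward description; everything else is an assumption for illustration: the function name `episode_reward`, the idea that each component receives a score in [0, 1], and the omission of the keyword-stuffing penalty, whose form the manifest does not specify.

```python
from typing import Dict

FILE_REQUEST_BONUS = 0.20  # Step 1, flat

# Step 2 weights, copied from the reward description.
REVIEW_WEIGHTS: Dict[str, float] = {
    "bug_identified": 0.20,
    "bug_type": 0.20,
    "bug_location": 0.10,
    "bug_description": 0.25,
    "suggested_fix": 0.15,
    "severity": 0.10,
}


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    return max(low, min(high, value))


def episode_reward(component_scores: Dict[str, float], requested_file: bool = True) -> float:
    """Combine per-component scores (assumed to lie in [0, 1]) into an episode total.

    Missing components count as 0. The keyword-stuffing penalty is not modeled.
    """
    raw = FILE_REQUEST_BONUS if requested_file else 0.0
    for name, weight in REVIEW_WEIGHTS.items():
        raw += weight * clamp(component_scores.get(name, 0.0))
    return clamp(raw)


if __name__ == "__main__":
    perfect = {name: 1.0 for name in REVIEW_WEIGHTS}
    print(episode_reward(perfect))  # raw total is about 1.20, clamped to 1.0
    print(episode_reward({"bug_identified": 1.0, "bug_type": 1.0}))
```

Because the Step 2 weights already sum to 1.00, the file-request bonus only matters for imperfect reviews: a perfect review is clamped to 1.0 either way.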
| |
| endpoints: |
| health: GET / |
| reset: POST /reset |
| step: POST /step |
| state: GET /state |
| tasks: GET /tasks |
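A client for the endpoints listed above could be sketched with the standard library alone. The method and path pairs come from the manifest; the rest is assumed: the base URL `http://localhost:8000` is a placeholder, the helper names are hypothetical, and the manifest does not state whether request and response bodies are plain JSON objects or what `POST /reset` expects.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Method and path for each endpoint, copied from the manifest.
ENDPOINTS = {
    "health": ("GET", "/"),
    "reset": ("POST", "/reset"),
    "step": ("POST", "/step"),
    "state": ("GET", "/state"),
    "tasks": ("GET", "/tasks"),
}


def build_request(base_url: str, name: str,
                  payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Create (but do not send) the HTTP request for a named endpoint."""
    method, path = ENDPOINTS[name]
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call(base_url: str, name: str,
         payload: Optional[Dict[str, Any]] = None) -> Any:
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(build_request(base_url, name, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    base = "http://localhost:8000"  # placeholder; use the real server address
    req = build_request(base, "step", {"request_file": True})
    print(req.get_method(), req.full_url)
```

Separating `build_request` from `call` lets the request shape be inspected or tested without a running server.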