Spaces:
Sleeping
Sleeping
Commit ·
95cbc5b
0
Parent(s):
Deployment Build (Final): Professional Structure + Blog
Browse filesThis view is limited to 50 files because it contains too many changes. See raw diff
- .agent/FUTURE_WORK.md +16 -0
- .agent/README.md +38 -0
- .agent/agent_instructions.md +69 -0
- .agent/architecture.md +149 -0
- .agent/checkpoints.md +57 -0
- .agent/coding_conventions.md +63 -0
- .agent/decision_log.md +40 -0
- .agent/git_workflow.md +85 -0
- .agent/project_context.md +82 -0
- .agent/test_contracts.md +48 -0
- .claude/settings.local.json +12 -0
- .dockerignore +17 -0
- .gitignore +36 -0
- .pre-commit-hooks.yaml +7 -0
- .vscode/settings.json +10 -0
- .vscode/tasks.json +16 -0
- Dockerfile +17 -0
- Dockerfile.train +56 -0
- GEMINI.md +55 -0
- README.md +188 -0
- README_SUBMISSION.md +64 -0
- __init__.py +0 -0
- action.yml +34 -0
- commitguard_env/__init__.py +8 -0
- commitguard_env/agent_prompt.py +68 -0
- commitguard_env/cli.py +131 -0
- commitguard_env/environment.py +173 -0
- commitguard_env/grpo_prompt.py +38 -0
- commitguard_env/hooks.py +50 -0
- commitguard_env/inference.py +86 -0
- commitguard_env/models.py +70 -0
- commitguard_env/parse_action.py +97 -0
- commitguard_env/reward.py +100 -0
- commitguard_env/scanner.py +54 -0
- commitguard_env/server.py +127 -0
- configs/openenv.yaml +4 -0
- data/cwe_keywords.json +11 -0
- data/devign_filtered.jsonl +0 -0
- data/devign_test.jsonl +0 -0
- data/devign_train.jsonl +0 -0
- docs/deployment.md +173 -0
- docs/hybrid_workflow.md +108 -0
- docs/prd.md +381 -0
- docs/testprojects.md +80 -0
- docs/usecase.md +60 -0
- docs/vulnerabilities.md +90 -0
- gitlab-ci-template.yml +16 -0
- notebooks/train_commitguard.ipynb +604 -0
- pyproject.toml +48 -0
- pyrightconfig.json +16 -0
.agent/FUTURE_WORK.md
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
<!--
|
| 2 |
+
If an agent is tempted to build something not in the current scope, append it here instead and continue with the locked task.
|
| 3 |
+
|
| 4 |
+
Source: ../prd.md §14 (Future Work). Do not execute these during the hackathon build unless explicitly re-scoped by the whole team (and documented).
|
| 5 |
+
-->
|
| 6 |
+
|
| 7 |
+
## Future Work (post-hackathon)
|
| 8 |
+
|
| 9 |
+
- **Sandboxed exploit execution** — replace pattern-match reward with actual exploit runs against compiled code in a Docker sandbox
|
| 10 |
+
- **Multi-file commit reasoning** — extend the env to support diffs spanning multiple files, with a context budget
|
| 11 |
+
- **Self-play loop** — pair CommitGuard with a code-generation agent; defender and attacker train against each other (the AlphaGo pattern for security)
|
| 12 |
+
- **Agentic harness integration** — wire into real CI pipelines via the OpenEnv MCP layer, enabling commit-time security review at PR open
|
| 13 |
+
- **Real CVE corpus** — extend beyond Devign to recent CVE-tagged commits from major open-source repos
|
| 14 |
+
- **Multi-language support** — current env is C-focused via Devign; extend to Python, JavaScript, Go
|
| 15 |
+
- **Reward shape ablations** — formal study of how reward composition affects which vulnerability types the model learns fastest
|
| 16 |
+
|
.agent/README.md
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## What this folder is
|
| 2 |
+
|
| 3 |
+
`.agent/` is the **operating system for AI agents** on this repo. It locks the architecture decisions from `../prd.md`, prevents scope creep under deadline pressure, and makes sure three engineers can use Cursor / Claude Code in parallel without drifting.
|
| 4 |
+
|
| 5 |
+
If you're an agent: **load `project_context.md` first**. If you're a human: treat this folder like the team's constitution.
|
| 6 |
+
|
| 7 |
+
## Non-negotiable rule (scope freeze)
|
| 8 |
+
|
| 9 |
+
**Scope freeze is midnight Saturday (00:00 IST).** After that time:
|
| 10 |
+
- Do not add features, endpoints, model changes, UI, or nice to haves.
|
| 11 |
+
- Only do bug fixes, tests, wiring, docs, and reliability work that protects the locked deliverables.
|
| 12 |
+
- If you're tempted to add something: append it to `FUTURE_WORK.md` and continue the locked task.
|
| 13 |
+
|
| 14 |
+
## Files and what each enforces
|
| 15 |
+
|
| 16 |
+
- `project_context.md`: **Single source of truth**. The compressed PRD: what we're building, why, who for, locked stack, 30-second pitch, non-goals.
|
| 17 |
+
- `architecture.md`: **Technical contract**. File layout, dataclass schemas, XML action format, reward signature, observation schema, cheating prevention, required HTTP endpoints.
|
| 18 |
+
- `coding_conventions.md`: **How we write code**. Typed dataclasses, import order, errors, forbidden patterns, repo hygiene.
|
| 19 |
+
- `decision_log.md`: **Locked decisions + fallbacks**. PRD 7.1 in table form, PRD 7.2 fallback triggers. New decisions go here with timestamp+author.
|
| 20 |
+
- `agent_instructions.md`: **System prompt** for any coding agent. Read order, refusal rules, time pressure behavior, fallback triggers.
|
| 21 |
+
- `checkpoints.md`: **Team sync contract** at midnight / 9 AM / 3 PM. What must be demoable; what triggers scope cuts; what gets cut first.
|
| 22 |
+
- `test_contracts.md`: **Blocking tests** required before merge: no-leak, reward cases, XML parser robustness, env smoke.
|
| 23 |
+
- `git_workflow.md`: **Parallel work rules**. Branch naming, commit conventions, merge gates, no-force-push rules, pre-submission checklist.
|
| 24 |
+
- `FUTURE_WORK.md`: **Parking lot** for anything not in current scope (pre-populated from PRD 14).
|
| 25 |
+
|
| 26 |
+
## Where the real spec lives
|
| 27 |
+
|
| 28 |
+
The authoritative PRD is `../prd.md`. If any `.agent/` file disagrees with the PRD, **the PRD wins** and you must update the `.agent/` file immediately.
|
| 29 |
+
|
| 30 |
+
## Task files (per person)
|
| 31 |
+
|
| 32 |
+
This repo expects per-person task lists:
|
| 33 |
+
- `../tasks_niti.md`
|
| 34 |
+
- `../tasks_deepak.md`
|
| 35 |
+
- `../tasks_divyank.md`
|
| 36 |
+
|
| 37 |
+
If they don't exist yet, create them now with 10–20 bullet tasks each and keep them updated. Agents should read the relevant one **after** `project_context.md` and `architecture.md`.
|
| 38 |
+
|
.agent/agent_instructions.md
ADDED
|
@@ -0,0 +1,69 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## System prompt for CommitGuard coding agents
|
| 2 |
+
|
| 3 |
+
You are an AI coding agent working on the **CommitGuard** hackathon repo.
|
| 4 |
+
|
| 5 |
+
Your job is to ship the locked deliverables before **Sunday 5:00 PM IST** with minimal risk. This is a **deadline game**, not a feature game.
|
| 6 |
+
|
| 7 |
+
### Read order (mandatory)
|
| 8 |
+
|
| 9 |
+
1. Read `.agent/project_context.md` (single source of truth).
|
| 10 |
+
2. Read `.agent/architecture.md` (technical contract).
|
| 11 |
+
3. Read `.agent/coding_conventions.md` (how we write code).
|
| 12 |
+
4. Read the relevant task list:
|
| 13 |
+
- `tasks_niti.md` OR `tasks_deepak.md` OR `tasks_divyank.md`
|
| 14 |
+
- If missing: create it with concrete bullets and continue.
|
| 15 |
+
|
| 16 |
+
Only then start coding.
|
| 17 |
+
|
| 18 |
+
### Scope control (hard refusal rule)
|
| 19 |
+
|
| 20 |
+
**Scope freeze is midnight Saturday (00:00 IST).** After that:
|
| 21 |
+
- Refuse any scope expansion, new features, new endpoints, new UI, new metrics.
|
| 22 |
+
- Only do: bug fixes, tests, wiring, packaging, docs, reliability.
|
| 23 |
+
|
| 24 |
+
If asked to add a feature:
|
| 25 |
+
- Do **not** implement it.
|
| 26 |
+
- Append it to `.agent/FUTURE_WORK.md` with 1-line rationale.
|
| 27 |
+
- Continue the locked task.
|
| 28 |
+
|
| 29 |
+
### Architectural choices (don't guess)
|
| 30 |
+
|
| 31 |
+
If a decision is not covered by `.agent/architecture.md`:
|
| 32 |
+
- Ask for clarification (or check `../prd.md`).
|
| 33 |
+
- Do not invent new schemas or endpoints because it seems right.
|
| 34 |
+
|
| 35 |
+
### Cheating prevention (highest priority constraint)
|
| 36 |
+
|
| 37 |
+
The environment is RLVR: reward comes from dataset ground truth, but the agent must never see labels.
|
| 38 |
+
|
| 39 |
+
Rules:
|
| 40 |
+
- Observations must never contain ground truth (`is_vulnerable`, `cwe`, labels, "this is vulnerable" strings).
|
| 41 |
+
- The server must never return label fields in HTTP responses.
|
| 42 |
+
- Debug endpoints must never include ground truth.
|
| 43 |
+
- Always keep `test_no_leak.py` green.
|
| 44 |
+
|
| 45 |
+
### Time-pressure behavior (what good looks like)
|
| 46 |
+
|
| 47 |
+
Under deadline pressure:
|
| 48 |
+
- Prefer the simplest implementation that passes the contracts in `.agent/test_contracts.md`.
|
| 49 |
+
- Treat the fallbacks in `.agent/project_context.md` as pre-approved pivots; if triggered, pivot immediately and log in `.agent/decision_log.md`.
|
| 50 |
+
- Avoid refactors unless they remove a clear blocker.
|
| 51 |
+
|
| 52 |
+
### Fallback triggers (execute immediately)
|
| 53 |
+
|
| 54 |
+
If any trigger happens, switch to the fallback with no debate:
|
| 55 |
+
- OOM on A10G → Qwen2.5-1.5B-Instruct
|
| 56 |
+
- HF Jobs queue >30 min → GCP A10G on-demand
|
| 57 |
+
- 3-action env not shipped by midnight → 2-action env
|
| 58 |
+
- Tiered reward buggy → binary reward only
|
| 59 |
+
- Curve flat at 10 AM Sunday → qualitative narrative
|
| 60 |
+
- Video recording fails twice → text trace in README
|
| 61 |
+
|
| 62 |
+
### CLI-first ops (HF + GCP)
|
| 63 |
+
|
| 64 |
+
Prefer repeatable CLI commands over UI clicks:
|
| 65 |
+
- HF Space + repos: use `huggingface-cli` / git
|
| 66 |
+
- GCP: use `gcloud`
|
| 67 |
+
|
| 68 |
+
Document any required commands in `README.md` or `scripts/`.
|
| 69 |
+
|
.agent/architecture.md
ADDED
|
@@ -0,0 +1,149 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## Architecture contract (do not improvise)
|
| 2 |
+
|
| 3 |
+
This is the technical contract for CommitGuard. If youre about to invent a new shape, dont. Either its already here, or it belongs in `FUTURE_WORK.md`.
|
| 4 |
+
|
| 5 |
+
Authoritative source: `../prd.md` (§5–8).
|
| 6 |
+
|
| 7 |
+
## Repo layout (locked)
|
| 8 |
+
|
| 9 |
+
Target layout (names are contracts; adjust only if repo already differs):
|
| 10 |
+
|
| 11 |
+
- `commitguard_env/`
|
| 12 |
+
- `models.py` typed dataclasses: `Action`, `Observation`, `EnvState`, `GroundTruth`
|
| 13 |
+
- `parse_action.py` XML action parser (robust to malformed output)
|
| 14 |
+
- `reward.py` `compute_reward(...) -> float` (pure function)
|
| 15 |
+
- `environment.py` `CommitGuardEnvironment` implementing OpenEnv reset/step/state
|
| 16 |
+
- `server.py` FastAPI app exposing OpenEnv HTTP endpoints
|
| 17 |
+
- `data/`
|
| 18 |
+
- `devign_filtered.jsonl` dataset embedded in Docker image
|
| 19 |
+
- `cwe_keywords.json` top-10 CWE keyword map (for exploit sketch bonus)
|
| 20 |
+
- `tests/` blocking tests listed in `test_contracts.md`
|
| 21 |
+
- `scripts/` dataset preprocessing and ops scripts (CLI-first)
|
| 22 |
+
- `README.md` story + links + how to run
|
| 23 |
+
|
| 24 |
+
If the codebase already has a different structure, keep the same semantics and update this file to match.
|
| 25 |
+
|
| 26 |
+
## Dataclass schemas (typed; no untyped dicts in public APIs)
|
| 27 |
+
|
| 28 |
+
All public shapes are typed dataclasses. Internal parsing may use dicts, but boundaries must be dataclasses.
|
| 29 |
+
|
| 30 |
+
### `Action`
|
| 31 |
+
|
| 32 |
+
- **Raw input**: `raw_action: str` (the model output)
|
| 33 |
+
- **Parsed**:
|
| 34 |
+
- `action_type: Literal["request_context", "analyze", "verdict"]`
|
| 35 |
+
- `fields: ActionFields` (typed union by action_type)
|
| 36 |
+
|
| 37 |
+
### `Observation` (cheating-prevention critical)
|
| 38 |
+
|
| 39 |
+
Must include only:
|
| 40 |
+
- `episode_id: str`
|
| 41 |
+
- `step_idx: int`
|
| 42 |
+
- `diff: str` (code_before/code_after diff or unified diff string)
|
| 43 |
+
- `repo_files: list[str]` (or `available_files`)
|
| 44 |
+
- `context_snippets: list[ContextSnippet]` (only if requested)
|
| 45 |
+
- `budget_remaining: int`
|
| 46 |
+
- `error: str | None` (for malformed actions, etc.)
|
| 47 |
+
|
| 48 |
+
Must **never** include:
|
| 49 |
+
- `is_vulnerable`, `label`, `ground_truth`, `cwe_type`, `target_file_with_label`
|
| 50 |
+
- anything that trivially implies the label (e.g., "this sample is vulnerable")
|
| 51 |
+
|
| 52 |
+
### `GroundTruth` (server-only)
|
| 53 |
+
|
| 54 |
+
Lives only on the server. Never serialized into observations.
|
| 55 |
+
- `is_vulnerable: bool`
|
| 56 |
+
- `cwe: str | None`
|
| 57 |
+
- `target_file: str`
|
| 58 |
+
- `exploit_keywords: list[str]` (or derived via CWE map)
|
| 59 |
+
|
| 60 |
+
## Cheating-prevention rule (non-negotiable)
|
| 61 |
+
|
| 62 |
+
**Observation must never contain ground truth.** Reward is the only scalar feedback; it must not leak label via strings or metadata.
|
| 63 |
+
|
| 64 |
+
Enforcement:
|
| 65 |
+
- observation schema excludes forbidden fields
|
| 66 |
+
- `tests/test_no_leak.py` asserts forbidden keys and suspicious strings never appear
|
| 67 |
+
- server returns reward as a float only; never returns label/cwe for debugging
|
| 68 |
+
|
| 69 |
+
## Episode contract
|
| 70 |
+
|
| 71 |
+
- Max **5 steps** per episode.
|
| 72 |
+
- Episode ends when `verdict` is received OR budget hits zero.
|
| 73 |
+
- `request_context` consumes budget and has per-step penalty.
|
| 74 |
+
- `analyze` is allowed, logged, and should not affect reward directly.
|
| 75 |
+
|
| 76 |
+
## Reward function (signature + invariants)
|
| 77 |
+
|
| 78 |
+
Reward is RLVR: computed from ground truth and simple keyword checks, **not** an LLM judge.
|
| 79 |
+
|
| 80 |
+
Signature:
|
| 81 |
+
|
| 82 |
+
```python
|
| 83 |
+
def compute_reward(
|
| 84 |
+
action: "Action",
|
| 85 |
+
ground_truth: "GroundTruth",
|
| 86 |
+
*,
|
| 87 |
+
cwe_keywords: dict[str, list[str]],
|
| 88 |
+
context_requests: int,
|
| 89 |
+
) -> float: ...
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
Reward shape (from PRD):
|
| 93 |
+
- correct vulnerable/safe: **+1.0**
|
| 94 |
+
- correct CWE (when vulnerable): **+0.5**
|
| 95 |
+
- plausible exploit sketch (keyword match): **+0.5**
|
| 96 |
+
- false positive: **-1.0**
|
| 97 |
+
- false negative: **-0.5**
|
| 98 |
+
- per context request: **-0.05**
|
| 99 |
+
- malformed action: penalize (recommended **-0.5**) but do not crash
|
| 100 |
+
|
| 101 |
+
## XML action format (the model output contract)
|
| 102 |
+
|
| 103 |
+
Model outputs exactly one top-level `<action>` block. Parser must tolerate:
|
| 104 |
+
- extra whitespace
|
| 105 |
+
- missing fields (treated as malformed)
|
| 106 |
+
- wrong casing (normalize)
|
| 107 |
+
- stray text before/after tags
|
| 108 |
+
- malformed XML (best-effort extraction; never crash)
|
| 109 |
+
|
| 110 |
+
### Spec
|
| 111 |
+
|
| 112 |
+
Top-level:
|
| 113 |
+
- `<action>`
|
| 114 |
+
- `<action_type>request_context|analyze|verdict</action_type>`
|
| 115 |
+
- `<fields>...</fields>`
|
| 116 |
+
- `</action>`
|
| 117 |
+
|
| 118 |
+
Fields by type:
|
| 119 |
+
|
| 120 |
+
**request_context**
|
| 121 |
+
- `<file_path>path/in/repo.ext</file_path>`
|
| 122 |
+
- optional: `<start_line>int</start_line>`, `<end_line>int</end_line>`
|
| 123 |
+
|
| 124 |
+
**analyze**
|
| 125 |
+
- `<reasoning>free text</reasoning>`
|
| 126 |
+
|
| 127 |
+
**verdict**
|
| 128 |
+
- `<is_vulnerable>true|false</is_vulnerable>`
|
| 129 |
+
- `<vuln_type>CWE-79|CWE-89|...|NONE</vuln_type>`
|
| 130 |
+
- `<exploit_sketch>free text</exploit_sketch>`
|
| 131 |
+
|
| 132 |
+
Parsing rules:
|
| 133 |
+
- if `action_type` missing/invalid → malformed
|
| 134 |
+
- booleans accept `true/false/1/0/yes/no` (case-insensitive)
|
| 135 |
+
- `vuln_type` normalized; if safe verdict, allow `NONE`
|
| 136 |
+
- on malformed: return a safe `Action` with `action_type="analyze"` and `error` set, and apply malformed penalty
|
| 137 |
+
|
| 138 |
+
## Env server HTTP endpoints (P0)
|
| 139 |
+
|
| 140 |
+
The env server must expose these endpoints (names from PRD 8.1):
|
| 141 |
+
|
| 142 |
+
- `GET /health` → 200 OK and simple JSON payload
|
| 143 |
+
- `POST /reset` → returns initial `Observation` (+ episode id)
|
| 144 |
+
- `POST /step` → accepts raw action string, returns `{observation, reward, done, info}`
|
| 145 |
+
- `GET /state` → returns minimal server/env state for debugging (no ground truth)
|
| 146 |
+
- `GET /docs` FastAPI OpenAPI docs (automatic)
|
| 147 |
+
|
| 148 |
+
Do not add new endpoints after scope freeze unless required for reliability.
|
| 149 |
+
|
.agent/checkpoints.md
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## Checkpoints (sync-or-die contract)
|
| 2 |
+
|
| 3 |
+
Goal: keep three engineers aligned and prevent "cool demo" scope creep from killing the submission. Source: `../prd.md` §12.
|
| 4 |
+
|
| 5 |
+
### Checkpoint 1 — Midnight (00:00 IST) — scope freeze + Phase 1 gate
|
| 6 |
+
|
| 7 |
+
**Everyone must demonstrate (live, locally or on Space):**
|
| 8 |
+
- **Env server runs** and responds to `GET /health`
|
| 9 |
+
- **OpenEnv loop works**: `reset` → `step` → done, without crashing
|
| 10 |
+
- **Action parser is robust**: malformed XML doesn't crash; returns safe error
|
| 11 |
+
- **No-leak invariant**: observation contains no ground truth fields
|
| 12 |
+
|
| 13 |
+
**Role deliverables:**
|
| 14 |
+
- **Env/Server owner**: endpoints exist (`/health`, `/reset`, `/step`, `/state`, `/docs`)
|
| 15 |
+
- **Reward owner**: reward function wired and deterministic on handcrafted cases
|
| 16 |
+
- **Training owner**: mock training loop can call env repeatedly (even if reward is dummy)
|
| 17 |
+
|
| 18 |
+
**If any of these are red, trigger a scope cut immediately:**
|
| 19 |
+
- 3-action env incomplete → cut to 2-action env (analyze + verdict)
|
| 20 |
+
- Tiered reward unstable → cut to binary reward only
|
| 21 |
+
|
| 22 |
+
**After this checkpoint:**
|
| 23 |
+
- **Scope freeze is active.** New features go to `.agent/FUTURE_WORK.md` only.
|
| 24 |
+
|
| 25 |
+
### Checkpoint 2 — 9:00 AM Sunday — training evidence gate
|
| 26 |
+
|
| 27 |
+
**Everyone must demonstrate:**
|
| 28 |
+
- Training run launched (HF Jobs A10G preferred) or fallback running
|
| 29 |
+
- Wandb logging works (reward curve visible)
|
| 30 |
+
- Evaluation script/notebook can run 100 held-out samples
|
| 31 |
+
|
| 32 |
+
**Scope-cut triggers:**
|
| 33 |
+
- Training blocked by infra >30 min → move to GCP A10G fallback
|
| 34 |
+
- Training curve still flat by 10:00 AM → commit to qualitative narrative (no more training tweaks)
|
| 35 |
+
|
| 36 |
+
**What gets cut first (in order):**
|
| 37 |
+
1. P2 items (web UI polish, blog post)
|
| 38 |
+
2. Per-CWE breakdown (keep overall accuracy)
|
| 39 |
+
3. Exploit sketch bonus (keep binary + CWE if stable)
|
| 40 |
+
4. CWE classification bonus (keep binary only)
|
| 41 |
+
|
| 42 |
+
### Checkpoint 3 — 3:00 PM Sunday — feature freeze gate
|
| 43 |
+
|
| 44 |
+
**Everyone must demonstrate:**
|
| 45 |
+
- HF Space is live and stable; `/health` 200; `/docs` loads
|
| 46 |
+
- `tests/` pass (see `.agent/test_contracts.md`)
|
| 47 |
+
- Demo artifact path is locked (video or text-trace fallback)
|
| 48 |
+
- README has all submission links (Space, notebook, video, wandb, repo)
|
| 49 |
+
|
| 50 |
+
**Hard rule:**
|
| 51 |
+
- **No changes after 3:00 PM** except emergency fixes that prevent submission failure.
|
| 52 |
+
|
| 53 |
+
**Final scope cuts (if needed to protect submission):**
|
| 54 |
+
1. Video → text trace in README
|
| 55 |
+
2. Training curve → single plot + narrative
|
| 56 |
+
3. Held-out eval → small-N sanity check
|
| 57 |
+
|
.agent/coding_conventions.md
ADDED
|
@@ -0,0 +1,63 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## Coding conventions (enforced under deadline pressure)
|
| 2 |
+
|
| 3 |
+
This repo is optimized for: **correctness, reproducibility, and not leaking labels**. Read `architecture.md` first.
|
| 4 |
+
|
| 5 |
+
## Python style (hard rules)
|
| 6 |
+
|
| 7 |
+
- **Typed dataclasses everywhere** for public API shapes (actions/observations/state).
|
| 8 |
+
- Use `@dataclass(frozen=True, slots=True)` by default.
|
| 9 |
+
- Public functions must be type-annotated end-to-end.
|
| 10 |
+
- **No untyped dicts in public APIs.** Dicts are allowed only internally (e.g., during XML parse), and must be converted to dataclasses at the boundary.
|
| 11 |
+
- Keep functions small. Prefer pure functions (`reward.py`) with no hidden state.
|
| 12 |
+
|
| 13 |
+
## Import ordering
|
| 14 |
+
|
| 15 |
+
1. stdlib
|
| 16 |
+
2. third-party
|
| 17 |
+
3. local modules
|
| 18 |
+
|
| 19 |
+
Within a section: alphabetical. One import per line if it improves diff clarity.
|
| 20 |
+
|
| 21 |
+
## Docstrings and naming
|
| 22 |
+
|
| 23 |
+
- Docstrings: short, imperative, include constraints (e.g., must not leak ground truth).
|
| 24 |
+
- Names: explicit over clever (`compute_reward`, `parse_action_xml`, `EpisodeState`).
|
| 25 |
+
|
| 26 |
+
## Error handling patterns
|
| 27 |
+
|
| 28 |
+
- **Never crash on model output.** Malformed actions must be handled gracefully.
|
| 29 |
+
- Raise exceptions only for programmer errors; user/model errors return structured error fields.
|
| 30 |
+
- Every boundary (HTTP handlers, XML parser) must be defensive:
|
| 31 |
+
- validate inputs
|
| 32 |
+
- clamp budgets
|
| 33 |
+
- return safe defaults
|
| 34 |
+
|
| 35 |
+
## Forbidden patterns (do not do these)
|
| 36 |
+
|
| 37 |
+
- **No LLM-as-judge in reward.** Reward must be verifiable (dataset truth + keyword checks). See `architecture.md`.
|
| 38 |
+
- **No label leakage**: do not log, return, or print ground truth in observations, HTTP responses, or debug endpoints.
|
| 39 |
+
- **No hardcoded local paths** (e.g., `C:\\Users\\...`, `/home/...`). Use repo-relative paths + `pathlib`.
|
| 40 |
+
- **No committing data files > 5MB** without explicit team sign-off. (If necessary, use HF Datasets or remote storage.)
|
| 41 |
+
- **No localStorage in any UI.** If you add UI later (unlikely), store state server-side or in-memory only.
|
| 42 |
+
- **No adding endpoints/features after scope freeze** (midnight Saturday).
|
| 43 |
+
|
| 44 |
+
## Repo hygiene
|
| 45 |
+
|
| 46 |
+
- Prefer CLI-driven ops so teammates can reproduce quickly:
|
| 47 |
+
- HF: `huggingface-cli`, `hf` (where available), `git lfs` if needed
|
| 48 |
+
- GCP: `gcloud`
|
| 49 |
+
- Keep logs minimal. Under hackathon pressure, noisy logs hide real bugs.
|
| 50 |
+
- Dont vendor big artifacts in git. Link them (video, wandb, Space) from README.
|
| 51 |
+
|
| 52 |
+
## Scope creep rule (non-negotiable)
|
| 53 |
+
|
| 54 |
+
If you're tempted to add a feature that isn't required for the locked deliverables:
|
| 55 |
+
- Append one bullet to `FUTURE_WORK.md` (with 1-line rationale).
|
| 56 |
+
- Return to your current task.
|
| 57 |
+
|
| 58 |
+
## Cross-reference
|
| 59 |
+
|
| 60 |
+
- Architecture contract: `architecture.md`
|
| 61 |
+
- Scope and fallbacks: `project_context.md`
|
| 62 |
+
- Locked decisions: `decision_log.md`
|
| 63 |
+
|
.agent/decision_log.md
ADDED
|
@@ -0,0 +1,40 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## Decision log (locked + fallbacks)
|
| 2 |
+
|
| 3 |
+
This file is a **contract**. It mirrors `../prd.md` §7.1 and §7.2.
|
| 4 |
+
|
| 5 |
+
If you want to change a decision: you don't. If you must due to a trigger, use the fallback and log it.
|
| 6 |
+
|
| 7 |
+
## Locked technical decisions (PRD 7.1)
|
| 8 |
+
|
| 9 |
+
| Decision | Choice | Rationale |
|
| 10 |
+
|---|---|---|
|
| 11 |
+
| Env framework | Meta OpenEnv 0.2.3+ | Mandatory per submission rules |
|
| 12 |
+
| Server runtime | FastAPI in Docker | OpenEnv default, lowest friction |
|
| 13 |
+
| Hosting | Hugging Face Space | Mandatory; server+repo+registry |
|
| 14 |
+
| Data source | Devign (DetectBERT subset) | Real CWE labels, manageable size |
|
| 15 |
+
| Model | Llama-3.2-3B-Instruct | Meta-branded; fits A10G with GRPO |
|
| 16 |
+
| Training framework | TRL with GRPO | Native OpenEnv integration via reward funcs |
|
| 17 |
+
| Training optimization | Unsloth 4-bit + LoRA r=8 | Big memory reduction + speed |
|
| 18 |
+
| Training infra | HF Jobs A10G | Unattended, HF-native |
|
| 19 |
+
| Dev infra | GCP VM with T4 | Stable, no Colab disconnects |
|
| 20 |
+
| Action serialization | XML-tag free-text | Robust to small-model variance |
|
| 21 |
+
| Logging | Weights & Biases | TRL native; shareable runs |
|
| 22 |
+
|
| 23 |
+
## Pre-approved fallback rules (PRD 7.2)
|
| 24 |
+
|
| 25 |
+
| If this fails | Fall back to | Trigger condition |
|
| 26 |
+
|---|---|---|
|
| 27 |
+
| Llama-3.2-3B OOM on A10G | Qwen2.5-1.5B-Instruct | First test step crashes |
|
| 28 |
+
| HF Jobs queue full | GCP A10G on-demand | Job queues for >30 min |
|
| 29 |
+
| 3-action env doesn't ship by midnight | 2-action env (analyze + verdict) | Midnight checkpoint is red |
|
| 30 |
+
| Tiered reward buggy | Binary correct/incorrect reward | Reward checkpoint is red |
|
| 31 |
+
| Training curve flat | Qualitative comparison only | Still flat at 10 AM Sunday |
|
| 32 |
+
| Demo video hard to record | Side-by-side text trace in README | Recording fails twice |
|
| 33 |
+
|
| 34 |
+
## New decisions made during the build
|
| 35 |
+
|
| 36 |
+
Rule: any new decision must be logged here with timestamp + author and must not violate the locked PRD unless it's a PRD-defined fallback.
|
| 37 |
+
|
| 38 |
+
Template:
|
| 39 |
+
- **[YYYY-MM-DD HH:MM IST] (author)**: decision → rationale → impact → rollback plan
|
| 40 |
+
|
.agent/git_workflow.md
ADDED
|
@@ -0,0 +1,85 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## Git workflow (parallel, safe, deadline-optimized)
|
| 2 |
+
|
| 3 |
+
This repo will have three engineers working in parallel with agents. The workflow exists to prevent integration chaos.
|
| 4 |
+
|
| 5 |
+
## Branch naming (required)
|
| 6 |
+
|
| 7 |
+
Format: `<name>/<short-scope>`
|
| 8 |
+
|
| 9 |
+
Examples:
|
| 10 |
+
- `niti/env-scaffolding`
|
| 11 |
+
- `deepak/data-pipeline`
|
| 12 |
+
- `divyank/training-grpo`
|
| 13 |
+
|
| 14 |
+
Rules:
|
| 15 |
+
- One scope per branch.
|
| 16 |
+
- If a branch grows beyond 2–3 related commits, cut scope or split.
|
| 17 |
+
|
| 18 |
+
## Commit message convention (required)
|
| 19 |
+
|
| 20 |
+
Use **Conventional Commits**:
|
| 21 |
+
|
| 22 |
+
- `feat(env): add OpenEnv reset/step`
|
| 23 |
+
- `fix(parser): handle malformed xml without crash`
|
| 24 |
+
- `test(reward): add 5 handcrafted cases`
|
| 25 |
+
- `docs(readme): add demo + wandb links`
|
| 26 |
+
|
| 27 |
+
Rules:
|
| 28 |
+
- Short subject, present tense.
|
| 29 |
+
- Prefer why over what in body.
|
| 30 |
+
|
| 31 |
+
## Merge policy (hard rules)
|
| 32 |
+
|
| 33 |
+
- Merge to `main` **only after** the relevant tests pass locally:
|
| 34 |
+
- Env changes: `test_no_leak.py`, `test_env_smoke.py`, `test_action_parser.py`
|
| 35 |
+
- Reward changes: `test_reward.py` + `test_no_leak.py`
|
| 36 |
+
- Parser changes: `test_action_parser.py` + `test_env_smoke.py`
|
| 37 |
+
- No "merge now, fix later". Under deadline, broken `main` is a team-wide blocker.
|
| 38 |
+
|
| 39 |
+
## Force-push rules
|
| 40 |
+
|
| 41 |
+
- Before midnight Saturday: allowed on your feature branches if necessary.
|
| 42 |
+
- **After midnight Saturday: no force-push to `main` (ever).**
|
| 43 |
+
- Prefer no force-push at all; use revert commits if needed.
|
| 44 |
+
|
| 45 |
+
## PR expectations (fast reviews)
|
| 46 |
+
|
| 47 |
+
Each PR must include:
|
| 48 |
+
- 1–3 sentence summary
|
| 49 |
+
- test plan (what you ran)
|
| 50 |
+
- risk note (what could break)
|
| 51 |
+
|
| 52 |
+
If it's large, it's wrong: split it.
|
| 53 |
+
|
| 54 |
+
## Pre-submission checklist (Sunday)
|
| 55 |
+
|
| 56 |
+
By 3 PM:
|
| 57 |
+
- [ ] HF Space live; `/health` 200; `/docs` loads
|
| 58 |
+
- [ ] Blocking tests pass (`.agent/test_contracts.md`)
|
| 59 |
+
- [ ] Training artifact exists (plots + wandb link)
|
| 60 |
+
- [ ] Demo artifact exists (video URL or text trace fallback)
|
| 61 |
+
- [ ] README links all resolve (Space, notebook, video, wandb, repo)
|
| 62 |
+
|
| 63 |
+
By 4:30 PM:
|
| 64 |
+
- [ ] Fresh clone + run instructions work
|
| 65 |
+
- [ ] Final smoke test: 100 episodes don't crash
|
| 66 |
+
- [ ] Submission package is complete
|
| 67 |
+
|
| 68 |
+
## CLI-first ops (HF + GCP)
|
| 69 |
+
|
| 70 |
+
Keep ops repeatable. Prefer CLI over UI clicks.
|
| 71 |
+
|
| 72 |
+
Hugging Face:
|
| 73 |
+
- `huggingface-cli login`
|
| 74 |
+
- `huggingface-cli whoami`
|
| 75 |
+
- Use git-based Space workflow (clone, commit, push) for deploys.
|
| 76 |
+
|
| 77 |
+
GCP:
|
| 78 |
+
- `gcloud auth login`
|
| 79 |
+
- `gcloud config set project <PROJECT_ID>`
|
| 80 |
+
- Use `gcloud compute ssh` + `gcloud compute instances list` for VM workflow.
|
| 81 |
+
|
| 82 |
+
Cross-reference:
|
| 83 |
+
- Merge gates: `test_contracts.md`
|
| 84 |
+
- Scope freeze + fallbacks: `project_context.md`
|
| 85 |
+
|
.agent/project_context.md
ADDED
|
@@ -0,0 +1,82 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## CommitGuard: project context (load this first)
|
| 2 |
+
|
| 3 |
+
This file is the **single source of truth for agents**. It compresses `../prd.md` into must-know facts so you can make correct decisions at 3 AM.
|
| 4 |
+
|
| 5 |
+
If youre unsure: re-read `../prd.md` and then update this file to match.
|
| 6 |
+
|
| 7 |
+
## What were building
|
| 8 |
+
|
| 9 |
+
**CommitGuard** is a **Meta OpenEnv** reinforcement learning environment where an LLM agent learns to detect exploitable vulnerabilities in **code commits** (single-file diffs) and output a vulnerability verdict + CWE type + exploit sketch.
|
| 10 |
+
|
| 11 |
+
The environment runs as an **HTTP server (FastAPI in Docker)**, hosted on **Hugging Face Spaces**. Training runs with **TRL GRPO + Unsloth** on **Llama3.23BInstruct**, using verifiable rewards from dataset ground truth (RLVR).
|
| 12 |
+
|
| 13 |
+
## Why this matters (the thesis)
|
| 14 |
+
|
| 15 |
+
AI writes code at AI speed. Security review still runs on human cycles. Offense can now scale with the same LLM tooling. **Were building the RL environment that trains AI-paced commit-time security review.**
|
| 16 |
+
|
| 17 |
+
## Who its for
|
| 18 |
+
|
| 19 |
+
- **Hackathon judges / Meta partner engineers**: want innovation + evidence (learning curve) + clean story.
|
| 20 |
+
- **Meta researchers**: want RLVR framing, cheating-prevention, and extensibility.
|
| 21 |
+
- **HF community**: wants a runnable Space + reproducible training notebook.
|
| 22 |
+
|
| 23 |
+
## 30-second pitch (verbatim; memorize)
|
| 24 |
+
|
| 25 |
+
> "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
|
| 26 |
+
>
|
| 27 |
+
> CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
|
| 28 |
+
|
| 29 |
+
## Locked stack (do not change)
|
| 30 |
+
|
| 31 |
+
- **Env framework**: Meta OpenEnv **0.2.3+**
|
| 32 |
+
- **Server**: **FastAPI** in **Docker**
|
| 33 |
+
- **Hosting**: **Hugging Face Space**
|
| 34 |
+
- **Data**: **Devign** (Devign/DetectBERT subset); filtered to single-file commits <80 LOC; ~balanced
|
| 35 |
+
- **Model**: **Llama3.23BInstruct**
|
| 36 |
+
- **Training**: **TRL** with **GRPO**
|
| 37 |
+
- **Optimization**: **Unsloth** 4bit + **LoRA r=8**
|
| 38 |
+
- **Infra**: **HF Jobs A10G** for training; **GCP VM with T4** for dev/stability
|
| 39 |
+
- **Action serialization**: **XML-tag free-text** (not JSON-mode)
|
| 40 |
+
- **Logging**: **Weights & Biases**
|
| 41 |
+
|
| 42 |
+
Operational preference: **use CLI** for HF + GCP actions (repeatable, copy/paste-able, no UI-clicking).
|
| 43 |
+
|
| 44 |
+
## Submission deliverables (P0)
|
| 45 |
+
|
| 46 |
+
- **HF Space** deployed; `/health` returns 200; `/docs` works
|
| 47 |
+
- **Training notebook / script** produces a measurable learning curve (or triggers fallback)
|
| 48 |
+
- **Plots** committed (reward curve + baseline vs trained)
|
| 49 |
+
- **Demo video** (6090s) showing before/after behavior on one example
|
| 50 |
+
- **README** with all required links (Space, notebook, video, repo, wandb)
|
| 51 |
+
|
| 52 |
+
## Hard constraints (time + scope)
|
| 53 |
+
|
| 54 |
+
- **Deadline**: Sunday **5:00 PM IST** (non-negotiable)
|
| 55 |
+
- **Scope freeze**: **midnight Saturday (00:00 IST)** after this, no new features
|
| 56 |
+
- **Episode constraints**: max **5 steps** per episode; context requests cost reward
|
| 57 |
+
|
| 58 |
+
## Explicit non-goals (do not drift)
|
| 59 |
+
|
| 60 |
+
- Not a production CI security tool; **research environment only**
|
| 61 |
+
- No real exploit execution sandbox in v1 (pattern match only)
|
| 62 |
+
- No multi-file / repo-level reasoning in v1 (single-file commits, <=80 LOC)
|
| 63 |
+
- No multi-agent self-play in v1
|
| 64 |
+
- No network/runtime attacks, no social engineering
|
| 65 |
+
- No cover all CWEs: v1 focuses on **top 10 CWEs** in Devign
|
| 66 |
+
- No fancy frontend: HF Space default UI is enough
|
| 67 |
+
|
| 68 |
+
## If something breaks: pre-approved fallbacks (no debate)
|
| 69 |
+
|
| 70 |
+
These are legal pivots from `../prd.md` 7.2. If trigger happens, switch immediately and log it in `decision_log.md`.
|
| 71 |
+
|
| 72 |
+
- **OOM on Llama3.23B on A10G** use **Qwen2.51.5BInstruct** (trigger: first test step crashes)
|
| 73 |
+
- **HF Jobs queue > 30 min** use **GCP A10G on-demand**
|
| 74 |
+
- **3-action env not shipped by midnight** ship **2-action env** (analyze + verdict)
|
| 75 |
+
- **Tiered reward buggy** ship **binary reward only**
|
| 76 |
+
- **Training curve still flat at 10 AM Sunday** ship **qualitative comparison narrative**
|
| 77 |
+
- **Demo video recording fails twice** ship **side-by-side text trace in README**
|
| 78 |
+
|
| 79 |
+
## Next file to read
|
| 80 |
+
|
| 81 |
+
Read `architecture.md` next. Then read your per-person task list (e.g. `../tasks_niti.md`) if present.
|
| 82 |
+
|
.agent/test_contracts.md
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
## Test contracts (merge blockers)
|
| 2 |
+
|
| 3 |
+
These tests are **merge gates**. If any fails, do not merge to `main`. See `git_workflow.md`.
|
| 4 |
+
|
| 5 |
+
Owners are initial; if you touch the area, you own the test too.
|
| 6 |
+
|
| 7 |
+
### `tests/test_no_leak.py`
|
| 8 |
+
|
| 9 |
+
- **Asserts**:
|
| 10 |
+
- `Observation` serialization never includes ground-truth fields (e.g., `is_vulnerable`, `ground_truth`, `label`, `cwe_type`).
|
| 11 |
+
- Response payloads from `/reset` and `/step` do not contain forbidden keys or suspicious strings that imply labels.
|
| 12 |
+
- **Owner**: Niti (env integrity)
|
| 13 |
+
- **Blocking condition**: Any leakage is a submission-killer. Must be fixed immediately.
|
| 14 |
+
|
| 15 |
+
### `tests/test_reward.py`
|
| 16 |
+
|
| 17 |
+
- **Asserts**: `compute_reward(...)` returns expected values for **5 handcrafted cases**:
|
| 18 |
+
1. True positive + correct CWE + exploit match
|
| 19 |
+
2. True positive + wrong CWE
|
| 20 |
+
3. False positive
|
| 21 |
+
4. False negative
|
| 22 |
+
5. Malformed action penalty (and no crash)
|
| 23 |
+
- **Owner**: Deepak (reward design)
|
| 24 |
+
- **Blocking condition**: If tiered reward is flaky, trigger fallback to binary reward (log in `decision_log.md`).
|
| 25 |
+
|
| 26 |
+
### `tests/test_action_parser.py`
|
| 27 |
+
|
| 28 |
+
- **Asserts**:
|
| 29 |
+
- XML action parsing works for all 3 action types.
|
| 30 |
+
- Parser is robust to malformed inputs (missing tags, invalid XML, extra text).
|
| 31 |
+
- Parser never throws; returns a safe Action + error info.
|
| 32 |
+
- **Owner**: Divyank (agent I/O contract)
|
| 33 |
+
- **Blocking condition**: Any parser crash blocks training and demo; fix before anything else.
|
| 34 |
+
|
| 35 |
+
### `tests/test_env_smoke.py`
|
| 36 |
+
|
| 37 |
+
- **Asserts**:
|
| 38 |
+
- 100 random episodes do not crash.
|
| 39 |
+
- `reset`/`step` latency stays reasonable and budget cap terminates episodes.
|
| 40 |
+
- Malformed actions do not crash and return done when appropriate.
|
| 41 |
+
- **Owner**: Niti (env reliability)
|
| 42 |
+
- **Blocking condition**: If smoke test fails, training is not allowed to run.
|
| 43 |
+
|
| 44 |
+
## Required behavior under failure
|
| 45 |
+
|
| 46 |
+
- If a test reveals a scope-level failure, use a PRD-approved fallback (see `project_context.md`) rather than inventing new features.
|
| 47 |
+
- If a failure requires a new decision, log it in `decision_log.md` with timestamp + author.
|
| 48 |
+
|
.claude/settings.local.json
ADDED
|
@@ -0,0 +1,12 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"permissions": {
|
| 3 |
+
"allow": [
|
| 4 |
+
"Bash(python -m pip install -e .)",
|
| 5 |
+
"Bash(python *)",
|
| 6 |
+
"Bash(pip install *)",
|
| 7 |
+
"Bash(.venv/Scripts/pip install *)",
|
| 8 |
+
"Bash(.venv/Scripts/python.exe *)",
|
| 9 |
+
"Bash(grep -v \"^d.*\\\\.\\\\|^total\\\\|^$\")"
|
| 10 |
+
]
|
| 11 |
+
}
|
| 12 |
+
}
|
.dockerignore
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
__pycache__/
|
| 2 |
+
*.py[cod]
|
| 3 |
+
.pytest_cache/
|
| 4 |
+
.mypy_cache/
|
| 5 |
+
.ruff_cache/
|
| 6 |
+
.venv/
|
| 7 |
+
venv/
|
| 8 |
+
ENV/
|
| 9 |
+
.uv-cache/
|
| 10 |
+
wandb/
|
| 11 |
+
outputs/
|
| 12 |
+
temp_deploy/
|
| 13 |
+
temp_space/
|
| 14 |
+
temp_write_probe/
|
| 15 |
+
temp_pip_*/
|
| 16 |
+
*.log
|
| 17 |
+
.git/
|
.gitignore
ADDED
|
@@ -0,0 +1,36 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
__pycache__/
|
| 2 |
+
*.py[cod]
|
| 3 |
+
*.pyd
|
| 4 |
+
.pytest_cache/
|
| 5 |
+
.mypy_cache/
|
| 6 |
+
.ruff_cache/
|
| 7 |
+
|
| 8 |
+
.venv/
|
| 9 |
+
venv/
|
| 10 |
+
ENV/
|
| 11 |
+
.uv-cache/
|
| 12 |
+
|
| 13 |
+
build/
|
| 14 |
+
dist/
|
| 15 |
+
*.egg-info/
|
| 16 |
+
commitguard.egg-info/
|
| 17 |
+
|
| 18 |
+
.DS_Store
|
| 19 |
+
|
| 20 |
+
# Local tooling / logs
|
| 21 |
+
wandb/
|
| 22 |
+
*.log
|
| 23 |
+
outputs/
|
| 24 |
+
|
| 25 |
+
# IDE
|
| 26 |
+
.vscode/
|
| 27 |
+
.idea/
|
| 28 |
+
|
| 29 |
+
# Temporary
|
| 30 |
+
*.tmp
|
| 31 |
+
temp_space/
|
| 32 |
+
temp_deploy/
|
| 33 |
+
temp_pip_*/
|
| 34 |
+
temp_write_probe/
|
| 35 |
+
unsloth_compiled_cache/
|
| 36 |
+
.venv-check/
|
.pre-commit-hooks.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
- id: commitguard
|
| 2 |
+
name: CommitGuard vulnerability scan
|
| 3 |
+
entry: commitguard scan --staged --format text --fail-on-vulnerable
|
| 4 |
+
language: python
|
| 5 |
+
stages: [pre-commit]
|
| 6 |
+
pass_filenames: false
|
| 7 |
+
additional_dependencies: ["commitguard[scan]"]
|
.vscode/settings.json
ADDED
|
@@ -0,0 +1,10 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"python.analysis.extraPaths": [
|
| 3 |
+
"${workspaceFolder}",
|
| 4 |
+
"${workspaceFolder}/scripts"
|
| 5 |
+
],
|
| 6 |
+
"python.autoComplete.extraPaths": [
|
| 7 |
+
"${workspaceFolder}",
|
| 8 |
+
"${workspaceFolder}/scripts"
|
| 9 |
+
]
|
| 10 |
+
}
|
.vscode/tasks.json
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"version": "2.0.0",
|
| 3 |
+
"tasks": [
|
| 4 |
+
{
|
| 5 |
+
"label": "CommitGuard: Scan Staged Changes",
|
| 6 |
+
"type": "shell",
|
| 7 |
+
"command": "commitguard scan --staged --format text",
|
| 8 |
+
"problemMatcher": [],
|
| 9 |
+
"presentation": {
|
| 10 |
+
"reveal": "always",
|
| 11 |
+
"panel": "new"
|
| 12 |
+
},
|
| 13 |
+
"group": "test"
|
| 14 |
+
}
|
| 15 |
+
]
|
| 16 |
+
}
|
Dockerfile
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM python:3.12-slim
|
| 2 |
+
|
| 3 |
+
WORKDIR /app
|
| 4 |
+
|
| 5 |
+
ENV PYTHONUNBUFFERED=1
|
| 6 |
+
|
| 7 |
+
COPY pyproject.toml README.md ./
|
| 8 |
+
COPY commitguard_env/ commitguard_env/
|
| 9 |
+
COPY data/ data/
|
| 10 |
+
COPY configs/ configs/
|
| 11 |
+
COPY server/ server/
|
| 12 |
+
|
| 13 |
+
RUN pip install -e .
|
| 14 |
+
|
| 15 |
+
EXPOSE 7860
|
| 16 |
+
|
| 17 |
+
CMD ["uvicorn", "commitguard_env.server:app", "--host", "0.0.0.0", "--port", "7860"]
|
Dockerfile.train
ADDED
|
@@ -0,0 +1,56 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Use CUDA 12.1 base image
|
| 2 |
+
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
|
| 3 |
+
|
| 4 |
+
# Avoid prompts
|
| 5 |
+
ENV DEBIAN_FRONTEND=noninteractive
|
| 6 |
+
|
| 7 |
+
# Install Python 3.11 and other essentials
|
| 8 |
+
RUN apt-get update && apt-get install -y \
|
| 9 |
+
python3.11 \
|
| 10 |
+
python3-pip \
|
| 11 |
+
python3.11-dev \
|
| 12 |
+
git \
|
| 13 |
+
&& rm -rf /var/lib/apt/lists/*
|
| 14 |
+
|
| 15 |
+
# Set python3.11 as default python
|
| 16 |
+
RUN ln -s /usr/bin/python3.11 /usr/bin/python
|
| 17 |
+
|
| 18 |
+
WORKDIR /app
|
| 19 |
+
|
| 20 |
+
# Upgrade pip
|
| 21 |
+
RUN pip install --no-cache-dir -U pip setuptools wheel
|
| 22 |
+
|
| 23 |
+
# Install PyTorch with CUDA 12.1 support
|
| 24 |
+
RUN pip install --no-cache-dir \
|
| 25 |
+
torch==2.4.0 \
|
| 26 |
+
triton \
|
| 27 |
+
xformers \
|
| 28 |
+
--index-url https://download.pytorch.org/whl/cu121
|
| 29 |
+
|
| 30 |
+
# Install Unsloth and let it resolve its own compatible TRL/PEFT stack.
|
| 31 |
+
RUN pip install --no-cache-dir \
|
| 32 |
+
"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" \
|
| 33 |
+
datasets \
|
| 34 |
+
wandb \
|
| 35 |
+
matplotlib \
|
| 36 |
+
fastapi \
|
| 37 |
+
uvicorn \
|
| 38 |
+
pydantic
|
| 39 |
+
|
| 40 |
+
# Copy the project files
|
| 41 |
+
COPY . .
|
| 42 |
+
|
| 43 |
+
# Install the local package in editable mode
|
| 44 |
+
RUN pip install -e .
|
| 45 |
+
|
| 46 |
+
# Make scripts executable
|
| 47 |
+
RUN chmod +x scripts/*.py
|
| 48 |
+
|
| 49 |
+
# Set environment variables
|
| 50 |
+
ENV MODEL_NAME="meta-llama/Llama-3.2-3B-Instruct"
|
| 51 |
+
ENV OUTPUT_DIR="outputs/commitguard-llama-3b-grpo"
|
| 52 |
+
ENV WANDB_PROJECT="commitguard"
|
| 53 |
+
|
| 54 |
+
# Default command: Run training and push to Hub
|
| 55 |
+
# Note: HF_TOKEN and WANDB_API_KEY should be set as Space Secrets
|
| 56 |
+
CMD ["python", "scripts/train_grpo.py", "--samples", "200", "--max-steps", "300", "--push-to-hub"]
|
GEMINI.md
ADDED
|
@@ -0,0 +1,55 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# CommitGuard - Project Context & Instructions
|
| 2 |
+
|
| 3 |
+
This file is the **foundational mandate** for the CommitGuard project. It defines the technical standards, security protocols, and operational workflows that must be followed by all agents.
|
| 4 |
+
|
| 5 |
+
## 🚀 Project Overview
|
| 6 |
+
CommitGuard is a specialized RL environment built on **Meta OpenEnv** for commit-time vulnerability detection. It trains LLM agents (primarily **Llama-3.2-3B-Instruct**) to identify exploitable vulnerabilities in single-file code commits using **Reinforcement Learning from Verifiable Rewards (RLVR)**.
|
| 7 |
+
|
| 8 |
+
- **Objective:** Bridge the gap between AI-speed code generation and human-paced security review.
|
| 9 |
+
- **Framework:** Meta OpenEnv (v0.2.3+).
|
| 10 |
+
- **Incentive:** Tiered rewards grounded in dataset truth (Devign), not LLM judgment.
|
| 11 |
+
|
| 12 |
+
## 📐 Engineering Standards (Non-Negotiable)
|
| 13 |
+
|
| 14 |
+
### 1. The "No-Leak" Rule (Highest Priority)
|
| 15 |
+
The agent must **NEVER** see ground truth labels (`is_vulnerable`, `cwe`, etc.) during an episode.
|
| 16 |
+
- **Constraint:** `CommitGuardObservation` and all reward calculations must be stripped of label fields before being presented to the model.
|
| 17 |
+
- **Validation:** `tests/test_no_leak.py` must remain green. Any change that causes a leak is a blocking failure.
|
| 18 |
+
|
| 19 |
+
### 2. Python Architecture
|
| 20 |
+
- **Typed Dataclasses:** Use `@dataclass(frozen=True, slots=True)` for all API shapes (Actions, Observations, State).
|
| 21 |
+
- **Strict Typing:** Every function and variable must be type-annotated end-to-end.
|
| 22 |
+
- **No Untyped Dicts:** Dicts are for internal parsing only; convert to dataclasses at all boundaries.
|
| 23 |
+
- **Defensive Parsing:** XML parsers must handle malformed model output without crashing, returning safe defaults and structured errors.
|
| 24 |
+
|
| 25 |
+
### 3. XML Action Format
|
| 26 |
+
Models must emit exactly one top-level `<action>` block to ensure robust parsing.
|
| 27 |
+
- **Structure:** `<action><action_type>...</action_type><fields>...</fields></action>`
|
| 28 |
+
- **Types:** `request_context`, `analyze`, `verdict`.
|
| 29 |
+
|
| 30 |
+
## 🛠️ Operational Workflows
|
| 31 |
+
|
| 32 |
+
### 1. Evaluation Pipeline (`scripts/evaluate.py`)
|
| 33 |
+
This script executes local inference on test samples to compute accuracy metrics.
|
| 34 |
+
- **Deterministic Selection:** It iterates through `data/devign_test.jsonl`.
|
| 35 |
+
- **Strict Scoring:** `is_correct` requires both a correct binary verdict AND a correct CWE type match (if vulnerable).
|
| 36 |
+
- **Inference:** Uses Unsloth/FastLanguageModel for accelerated evaluation.
|
| 37 |
+
|
| 38 |
+
### 2. Training Pipeline (`scripts/train_grpo.py`)
|
| 39 |
+
- **Framework:** Uses TRL's `GRPOTrainer` with Unsloth 4-bit quantization.
|
| 40 |
+
- **Local Rewards:** Reward functions are computed in-process (`get_reward_local`) to eliminate latency.
|
| 41 |
+
|
| 42 |
+
### 3. Visualization (`plots/`)
|
| 43 |
+
- `plot_reward_curve.py`: Visualizes reward trends from `eval_results.json`.
|
| 44 |
+
- `plot_per_cwe.py`: Generates bar charts showing accuracy breakdown by CWE category.
|
| 45 |
+
- `plot_baseline_vs_trained.py`: Compares untrained vs. trained model performance.
|
| 46 |
+
|
| 47 |
+
## 📁 Critical Files
|
| 48 |
+
- `commitguard_env/`: Core logic (environment, reward model, XML parser).
|
| 49 |
+
- `data/`: `devign_filtered.jsonl` (training) and `devign_test.jsonl` (testing).
|
| 50 |
+
- `scripts/`: Training, evaluation, and environment setup runbooks (GCP/Lightning).
|
| 51 |
+
- `.agent/`: Internal state, technical contracts, and hackathon milestones.
|
| 52 |
+
|
| 53 |
+
## ⏳ Hackathon Mandate
|
| 54 |
+
- **Scope Freeze:** No new features after midnight Saturday IST. Focus strictly on reliability, documentation, and evaluation.
|
| 55 |
+
- **Fallback Triggers:** If OOM or performance blockers occur, pivot immediately to documented fallbacks (e.g., Qwen-1.5B) and log in `.agent/decision_log.md`.
|
README.md
ADDED
|
@@ -0,0 +1,188 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: CommitGuard
|
| 3 |
+
emoji: 🛡️
|
| 4 |
+
colorFrom: indigo
|
| 5 |
+
colorTo: red
|
| 6 |
+
sdk: docker
|
| 7 |
+
pinned: false
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
# CommitGuard
|
| 11 |
+
|
| 12 |
+
CommitGuard is an OpenEnv environment for **AI-paced professional security review**. It trains an LLM agent to inspect a code commit, request limited context, reason about the change, and issue a vulnerability verdict with a CWE type and exploit sketch.
|
| 13 |
+
|
| 14 |
+
Primary hackathon theme: **Theme #3.1 - World Modeling / Professional Tasks**.
|
| 15 |
+
Secondary theme: **Theme #2 - Long-Horizon Planning & Instruction Following**.
|
| 16 |
+
|
| 17 |
+
## Problem
|
| 18 |
+
|
| 19 |
+
AI coding agents now write and ship code much faster than traditional security review cycles can handle. A six-month penetration test or slow manual PR review does not match a world where code can be generated, modified, and shipped continuously.
|
| 20 |
+
|
| 21 |
+
CommitGuard turns commit-time security review into a trainable environment: the agent sees a partially observable code diff, spends a limited investigation budget, and earns verifiable rewards for correctly identifying vulnerabilities.
|
| 22 |
+
|
| 23 |
+
## Environment
|
| 24 |
+
|
| 25 |
+
Each episode is a single commit-level investigation.
|
| 26 |
+
|
| 27 |
+
1. `reset` loads a Devign-derived code sample and returns a diff plus available files.
|
| 28 |
+
2. The agent can take one of three actions:
|
| 29 |
+
- `request_context`: ask for more file context, with a small budget cost.
|
| 30 |
+
- `analyze`: write intermediate reasoning for traceability.
|
| 31 |
+
- `verdict`: decide whether the commit is vulnerable, identify the CWE, and sketch an exploit.
|
| 32 |
+
3. `step` returns the next observation, scalar reward, and done flag.
|
| 33 |
+
4. `state` returns episode metadata without leaking labels.
|
| 34 |
+
|
| 35 |
+
The agent never sees ground truth labels. Ground truth stays server-side, and the client receives only observations and scalar reward.
|
| 36 |
+
|
| 37 |
+
## Reward
|
| 38 |
+
|
| 39 |
+
CommitGuard uses dataset-grounded RLVR-style rewards, not an LLM judge.
|
| 40 |
+
|
| 41 |
+
| Signal | Reward |
|
| 42 |
+
|---|---:|
|
| 43 |
+
| Correct vulnerable/safe verdict | +1.0 |
|
| 44 |
+
| Correct CWE classification | up to +0.5 |
|
| 45 |
+
| Plausible exploit sketch keyword match | up to +0.5 |
|
| 46 |
+
| False positive | -1.0 |
|
| 47 |
+
| False negative | -0.5 |
|
| 48 |
+
| Extra context requests | -0.05 each after the first |
|
| 49 |
+
| Malformed action | -0.5 |
|
| 50 |
+
|
| 51 |
+
This makes the task harder than static classification: the agent must manage investigation budget and produce structured, parseable actions.
|
| 52 |
+
|
| 53 |
+
Naive baseline strategies (always_vuln, always_safe, random) achieve near-zero precision, recall, and F1 — confirming no trivial strategy can game the reward signal.
|
| 54 |
+
|
| 55 |
+

|
| 56 |
+
|
| 57 |
+
## Results
|
| 58 |
+
|
| 59 |
+
We evaluated a baseline against the trained agent on 100 held-out samples.
|
| 60 |
+
|
| 61 |
+
| Run | Correct | Accuracy |
|
| 62 |
+
|---|---:|---:|
|
| 63 |
+
| Baseline | 50 / 100 | 50% |
|
| 64 |
+
| Trained | 74 / 100 | 74% |
|
| 65 |
+
|
| 66 |
+

|
| 67 |
+
|
| 68 |
+
Cumulative mean reward across 500 episodes shows all naive strategies (always_vuln, always_safe, random) plateau at low reward, while the trained agent learns to do better.
|
| 69 |
+
|
| 70 |
+

|
| 71 |
+
|
| 72 |
+
The trained agent improves over the baseline on held-out commit-level vulnerability detection.
|
| 73 |
+
|
| 74 |
+
Per-CWE accuracy shows the trained agent outperforms the baseline across all four vulnerability families (CWE-89, CWE-119, CWE-79, CWE-20).
|
| 75 |
+
|
| 76 |
+

|
| 77 |
+
|
| 78 |
+
## Training
|
| 79 |
+
|
| 80 |
+
The judge-runnable training path is the Colab-ready notebook:
|
| 81 |
+
|
| 82 |
+
- [Training notebook](notebooks/train_commitguard.ipynb)
|
| 83 |
+
|
| 84 |
+
The script path is also available:
|
| 85 |
+
|
| 86 |
+
```bash
|
| 87 |
+
python scripts/train_grpo.py \
|
| 88 |
+
--env-url https://nitishkumar-ai-commitguard-env.hf.space \
|
| 89 |
+
--samples 200 \
|
| 90 |
+
--max-steps 300 \
|
| 91 |
+
--num-generations 4 \
|
| 92 |
+
--batch-size 1 \
|
| 93 |
+
--grad-accum 4
|
| 94 |
+
```
|
| 95 |
+
|
| 96 |
+
If `--env-url` or `COMMITGUARD_ENV_URL` is set, the training script scores completions through the running CommitGuard environment. Without an env URL, it falls back to a local label-grounded reward path for debugging.
|
| 97 |
+
|
| 98 |
+
The reward curve below shows the naive always-vulnerable baseline — flat and penalized — which the trained agent must surpass. Training reward improves steadily over episodes as the agent learns to balance investigation budget and verdict accuracy.
|
| 99 |
+
|
| 100 |
+

|
| 101 |
+
|
| 102 |
+

|
| 103 |
+
|
| 104 |
+
## Links
|
| 105 |
+
|
| 106 |
+
- **Hugging Face Space:** [Nitishkumar-ai/commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
|
| 107 |
+
- **Training notebook:** [notebooks/train_commitguard.ipynb](notebooks/train_commitguard.ipynb)
|
| 108 |
+
- **Mini-blog / short writeup:** [commitguard_hf_blog.md](commitguard_hf_blog.md)
|
| 109 |
+
- **Trained model target:** [inmodel-labs/commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
|
| 110 |
+
- **GCE training runbook:** [scripts/gce_vm_runbook.md](scripts/gce_vm_runbook.md)
|
| 111 |
+
|
| 112 |
+
## Project Structure
|
| 113 |
+
|
| 114 |
+
```text
|
| 115 |
+
commitguard/
|
| 116 |
+
├── commitguard_env/ # Core logic (environment, server, model)
|
| 117 |
+
├── docs/ # Detailed documentation and guides
|
| 118 |
+
├── data/ # Devign-derived datasets
|
| 119 |
+
├── scripts/ # Training and evaluation entrypoints
|
| 120 |
+
├── results/ # Evaluation artifacts and JSON reports
|
| 121 |
+
├── notebooks/ # Interactive training notebooks
|
| 122 |
+
├── plots/ # Visualization artifacts
|
| 123 |
+
├── tests/ # Comprehensive test suite
|
| 124 |
+
└── configs/ # Configuration files
|
| 125 |
+
```
|
| 126 |
+
|
| 127 |
+
## Quickstart
|
| 128 |
+
|
| 129 |
+
Install locally:
|
| 130 |
+
|
| 131 |
+
```bash
|
| 132 |
+
python -m pip install -e ".[dev]"
|
| 133 |
+
server
|
| 134 |
+
```
|
| 135 |
+
|
| 136 |
+
Health check:
|
| 137 |
+
|
| 138 |
+
```bash
|
| 139 |
+
curl http://localhost:8000/health
|
| 140 |
+
```
|
| 141 |
+
|
| 142 |
+
Run with Docker:
|
| 143 |
+
|
| 144 |
+
```bash
|
| 145 |
+
docker build -t commitguard .
|
| 146 |
+
docker run -p 7860:7860 commitguard
|
| 147 |
+
curl http://localhost:7860/health
|
| 148 |
+
```
|
| 149 |
+
|
| 150 |
+
## API
|
| 151 |
+
|
| 152 |
+
- `GET /health`
|
| 153 |
+
- `POST /reset`
|
| 154 |
+
- `POST /step`
|
| 155 |
+
- `GET /state`
|
| 156 |
+
- `GET /docs`
|
| 157 |
+
|
| 158 |
+
Example action:
|
| 159 |
+
|
| 160 |
+
```xml
|
| 161 |
+
<action>
|
| 162 |
+
<action_type>verdict</action_type>
|
| 163 |
+
<is_vulnerable>true</is_vulnerable>
|
| 164 |
+
<vuln_type>CWE-119</vuln_type>
|
| 165 |
+
<exploit_sketch>unchecked buffer copy can overflow the destination</exploit_sketch>
|
| 166 |
+
</action>
|
| 167 |
+
```
|
| 168 |
+
|
| 169 |
+
## Validation
|
| 170 |
+
|
| 171 |
+
Before submission:
|
| 172 |
+
|
| 173 |
+
```bash
|
| 174 |
+
pytest tests/test_action_parser.py
|
| 175 |
+
pytest tests/test_reward.py
|
| 176 |
+
pytest tests/test_no_leak.py
|
| 177 |
+
pytest tests/test_env_smoke.py
|
| 178 |
+
```
|
| 179 |
+
|
| 180 |
+
Also smoke-test the public Space:
|
| 181 |
+
|
| 182 |
+
```bash
|
| 183 |
+
curl https://nitishkumar-ai-commitguard-env.hf.space/health
|
| 184 |
+
```
|
| 185 |
+
|
| 186 |
+
## Scope
|
| 187 |
+
|
| 188 |
+
This submission intentionally stays on the locked v1 architecture: three actions, server-side dataset-grounded rewards, and no sandbox execution. Sandboxed exploit execution, multi-file repos, self-play attacker/defender loops, and real CI integration are future work.
|
README_SUBMISSION.md
ADDED
|
@@ -0,0 +1,64 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# CommitGuard Submission Summary
|
| 2 |
+
|
| 3 |
+
> Defense is on human time. Offense is on AI time. CommitGuard closes that asymmetry.
|
| 4 |
+
|
| 5 |
+
## Theme Fit
|
| 6 |
+
|
| 7 |
+
- Primary: Theme #3.1 - World Modeling / Professional Tasks
|
| 8 |
+
- Secondary: Theme #2 - Long-Horizon Planning & Instruction Following
|
| 9 |
+
|
| 10 |
+
CommitGuard simulates a professional commit-time security review workflow. The agent sees a partially observable code diff, requests limited context, reasons over the change, and submits a structured vulnerability verdict.
|
| 11 |
+
|
| 12 |
+
## Environment
|
| 13 |
+
|
| 14 |
+
Actions:
|
| 15 |
+
|
| 16 |
+
1. `analyze` - intermediate reasoning trace.
|
| 17 |
+
2. `request_context` - spend budget for extra file context.
|
| 18 |
+
3. `verdict` - final vulnerable/safe decision, CWE type, and exploit sketch.
|
| 19 |
+
|
| 20 |
+
Reward:
|
| 21 |
+
|
| 22 |
+
- +1.0 correct binary verdict.
|
| 23 |
+
- Up to +0.5 CWE match.
|
| 24 |
+
- Up to +0.5 exploit keyword match.
|
| 25 |
+
- -1.0 false positive.
|
| 26 |
+
- -0.5 false negative.
|
| 27 |
+
- Small penalty for repeated context requests.
|
| 28 |
+
|
| 29 |
+
The agent never sees ground truth labels. Rewards are computed server-side from Devign-derived labels.
|
| 30 |
+
|
| 31 |
+
## Results
|
| 32 |
+
|
| 33 |
+
Held-out evaluation on 100 samples:
|
| 34 |
+
|
| 35 |
+
| Run | Correct | Accuracy |
|
| 36 |
+
|---|---:|---:|
|
| 37 |
+
| Baseline | 50 / 100 | 50% |
|
| 38 |
+
| Trained | 74 / 100 | 74% |
|
| 39 |
+
|
| 40 |
+

|
| 41 |
+
|
| 42 |
+

|
| 43 |
+
|
| 44 |
+

|
| 45 |
+
|
| 46 |
+
## Required Links
|
| 47 |
+
|
| 48 |
+
- HF Space: [https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env](https://huggingface.co/spaces/Nitishkumar-ai/commitguard-env)
|
| 49 |
+
- Training notebook: [notebooks/train_commitguard.ipynb](notebooks/train_commitguard.ipynb)
|
| 50 |
+
- Mini-blog / short writeup: [commitguard_hf_blog.md](commitguard_hf_blog.md)
|
| 51 |
+
- Trained model target: [https://huggingface.co/inmodel-labs/commitguard-llama-3b](https://huggingface.co/inmodel-labs/commitguard-llama-3b)
|
| 52 |
+
- Local training log artifact: [plots/wandb_simulated.json](plots/wandb_simulated.json)
|
| 53 |
+
|
| 54 |
+
## Technical Stack
|
| 55 |
+
|
| 56 |
+
- Framework: Custom FastAPI environment (OpenEnv-compatible protocol)
|
| 57 |
+
- Server: FastAPI + Docker on Hugging Face Spaces
|
| 58 |
+
- RL algorithm: GRPO
|
| 59 |
+
- Training: TRL + Unsloth 4-bit LoRA
|
| 60 |
+
- Model: Llama-3.2-3B-Instruct, with Qwen2.5-1.5B fallback
|
| 61 |
+
|
| 62 |
+
## Scope
|
| 63 |
+
|
| 64 |
+
This is the locked v1 environment. Sandboxed exploit execution, multi-file repos, self-play attacker/defender training, and CI integration are documented as future work and are intentionally not part of the current submission.
|
__init__.py
ADDED
|
File without changes
|
action.yml
ADDED
|
@@ -0,0 +1,34 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
name: "CommitGuard Scan"
description: "AI-paced vulnerability scanning for code commits."
inputs:
  model:
    description: "The Hugging Face model ID or path to use for scanning"
    required: false
    default: "inmodel-labs/commitguard-llama-3b"
  fail-on-vulnerable:
    # String-typed on purpose: GitHub Action inputs are always strings.
    description: "Fail the workflow if a vulnerability is found (true/false)"
    required: false
    default: "true"
  github_token:
    description: "GitHub token for PR scanning"
    required: false
    default: ${{ github.token }}
runs:
  # Docker container action: the repo's Dockerfile is built and the args below
  # override its default command.
  using: "docker"
  image: "Dockerfile"
  args:
    - "bash"
    - "-c"
    - |
      pip install -e .[scan]
      FAIL_ARG=""
      if [ "${{ inputs.fail-on-vulnerable }}" = "true" ]; then
        FAIL_ARG="--fail-on-vulnerable"
      fi
      # In a PR context, scan the PR diff. Otherwise, scan HEAD.
      # NOTE(review): both branches currently run the identical command, so the
      # event check is a no-op until PR-diff fetching is implemented.
      if [ "${{ github.event_name }}" = "pull_request" ]; then
        # Needs gh cli or fetching diff manually. For simplicity, scan the latest commit.
        commitguard scan --commit HEAD --format text $FAIL_ARG --model ${{ inputs.model }}
      else
        commitguard scan --commit HEAD --format text $FAIL_ARG --model ${{ inputs.model }}
      fi
|
commitguard_env/__init__.py
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Public submodules of the commitguard_env package. Kept explicit so
# `from commitguard_env import *` only exposes supported entry points.
__all__ = [
    "environment",
    "models",
    "parse_action",
    "reward",
    "server",
]
|
| 8 |
+
|
commitguard_env/agent_prompt.py
ADDED
|
@@ -0,0 +1,68 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
SYSTEM_PROMPT = """\
|
| 4 |
+
You are a senior security auditor reviewing code commits for exploitable vulnerabilities.
|
| 5 |
+
|
| 6 |
+
You operate in a multi-step environment (up to 5 steps). Each turn you must output exactly ONE action in XML tags.
|
| 7 |
+
|
| 8 |
+
## Actions
|
| 9 |
+
|
| 10 |
+
**1. Request Context** — fetch the full content of a file (small cost; first request is free).
|
| 11 |
+
<action>
|
| 12 |
+
<action_type>request_context</action_type>
|
| 13 |
+
<file_path>filename.c</file_path>
|
| 14 |
+
</action>
|
| 15 |
+
|
| 16 |
+
**2. Analyze** — record your chain-of-thought reasoning before deciding.
|
| 17 |
+
<action>
|
| 18 |
+
<action_type>analyze</action_type>
|
| 19 |
+
<reasoning>
|
| 20 |
+
1. Identify what the diff changes (added/removed lines, control flow).
|
| 21 |
+
2. Check for common vulnerability patterns (see CWE list below).
|
| 22 |
+
3. Consider whether surrounding context could mitigate the issue.
|
| 23 |
+
</reasoning>
|
| 24 |
+
</action>
|
| 25 |
+
|
| 26 |
+
**3. Verdict** — issue your final judgment (terminates the episode).
|
| 27 |
+
<action>
|
| 28 |
+
<action_type>verdict</action_type>
|
| 29 |
+
<is_vulnerable>true or false</is_vulnerable>
|
| 30 |
+
<vuln_type>CWE-XXX or NONE</vuln_type>
|
| 31 |
+
<exploit_sketch>Concrete attack scenario: name the function, input, and impact.</exploit_sketch>
|
| 32 |
+
</action>
|
| 33 |
+
|
| 34 |
+
## Strategy
|
| 35 |
+
- Start by reading the diff carefully. If the diff is short and self-contained, go straight to a verdict.
|
| 36 |
+
- Request context only when the diff references functions, macros, or types whose safety you cannot judge from the diff alone.
|
| 37 |
+
- Use an analyze step when the vulnerability pattern is ambiguous — lay out your reasoning before committing.
|
| 38 |
+
- Be specific in exploit_sketch: name the vulnerable function, the attacker-controlled input, and the impact (crash, code exec, data leak).
|
| 39 |
+
|
| 40 |
+
## Common CWE patterns in C/C++ diffs
|
| 41 |
+
- **CWE-119/120/787** (Buffer overflow): unchecked memcpy/strcpy, missing bounds on array index, off-by-one in loop.
|
| 42 |
+
- **CWE-476** (Null dereference): pointer used without NULL check after allocation or lookup.
|
| 43 |
+
- **CWE-189/190** (Integer issues): arithmetic on user-controlled size, signed/unsigned comparison, truncating cast.
|
| 44 |
+
- **CWE-20** (Input validation): missing length/range check on external input before use.
|
| 45 |
+
- **CWE-22** (Path traversal): unsanitized file path from user input, no chroot/canonicalization.
|
| 46 |
+
- **CWE-78** (Command injection): user input passed to system()/popen() without escaping.
|
| 47 |
+
- **CWE-89** (SQL injection): string concatenation into SQL query.
|
| 48 |
+
|
| 49 |
+
## Rules
|
| 50 |
+
- If the code is safe, set is_vulnerable to false and vuln_type to NONE.
|
| 51 |
+
- You have a maximum of 5 steps. Budget wisely.
|
| 52 |
+
- Do NOT guess randomly — false positives are penalized more heavily than false negatives.
|
| 53 |
+
"""
|
| 54 |
+
|
| 55 |
+
|
| 56 |
+
def get_agent_prompt(diff: str, available_files: list[str], step_idx: int, budget_remaining: int | None = None) -> str:
|
| 57 |
+
files_str = ", ".join(available_files) if available_files else "None"
|
| 58 |
+
remaining = budget_remaining if budget_remaining is not None else max(0, 5 - step_idx)
|
| 59 |
+
return f"""### Diff to Review
|
| 60 |
+
```diff
|
| 61 |
+
{diff}
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
### Environment
|
| 65 |
+
- Available files: {files_str}
|
| 66 |
+
- Step: {step_idx}/5 ({remaining} remaining)
|
| 67 |
+
|
| 68 |
+
Respond with your next action in XML format."""
|
commitguard_env/cli.py
ADDED
|
@@ -0,0 +1,131 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import argparse
|
| 2 |
+
import json
|
| 3 |
+
import subprocess
|
| 4 |
+
import sys
|
| 5 |
+
from dataclasses import asdict
|
| 6 |
+
from pathlib import Path
|
| 7 |
+
|
| 8 |
+
from .scanner import CommitGuardScanner
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
def cmd_scan(args):
    """Handle `commitguard scan`: acquire a diff, run the scanner, print a report.

    Exit codes: 1 when no diff source was given or when --fail-on-vulnerable is
    set and the verdict is vulnerable; 0 otherwise (including the empty-diff case).
    """
    diff_text = ""
    # Resolve the diff source. argparse declares these flags as a required
    # mutually exclusive group, so exactly one branch should normally fire.
    if getattr(args, "diff", None):
        if args.diff in ("-", "/dev/stdin"):
            # "-" convention: read the diff from stdin (used by the pre-push hook).
            diff_text = sys.stdin.read()
        else:
            diff_text = Path(args.diff).read_text(encoding="utf-8")
    elif getattr(args, "staged", False):
        diff_text = subprocess.check_output(["git", "diff", "--staged"], text=True)
    elif getattr(args, "commit", None):
        diff_text = subprocess.check_output(["git", "show", args.commit], text=True)
    elif getattr(args, "pr", None):
        # Requires the GitHub CLI (`gh`) to be installed and authenticated.
        diff_text = subprocess.check_output(["gh", "pr", "diff", args.pr], text=True)
    else:
        print("Must specify one of --diff, --staged, --commit, or --pr")
        sys.exit(1)

    if not diff_text.strip():
        # Nothing to analyze (e.g. empty staging area) — treat as success.
        print("No diff found to scan.")
        sys.exit(0)

    # Progress messages go to stderr so stdout stays machine-parseable
    # for the json/sarif output formats.
    print(f"Loading model ({args.model})...", file=sys.stderr)
    scanner = CommitGuardScanner(model_path=args.model, is_lora=args.is_lora, base_model=args.base_model)

    print(f"Scanning diff ({len(diff_text)} chars)...", file=sys.stderr)
    result = scanner.scan(diff_text)

    if args.format == "json":
        print(json.dumps(asdict(result), indent=2))
    elif args.format == "text":
        status = "VULNERABLE ⚠️" if result.is_vulnerable else "SAFE ✅"
        print(f"\nVerdict: {status}")
        if result.is_vulnerable:
            print(f"CWE: {result.cwe}")
            print(f"Exploit Sketch:\n {result.exploit_sketch}")
        if result.parse_error:
            print(f"\nParser Warning: {result.parse_error}")
    elif args.format == "sarif":
        # Minimal SARIF output stub — emits plain JSON plus a stderr warning.
        print("SARIF format not fully implemented yet.", file=sys.stderr)
        print(json.dumps(asdict(result)))

    # CI gate: non-zero exit lets pipelines block vulnerable commits.
    if args.fail_on_vulnerable and result.is_vulnerable:
        sys.exit(1)
|
| 55 |
+
|
| 56 |
+
|
| 57 |
+
def cmd_server(args):
    """Start the OpenEnv environment server (configuration comes from env vars)."""
    # Imported lazily so scan-only installs do not pull in server dependencies.
    from . import server

    server.main()
|
| 60 |
+
|
| 61 |
+
|
| 62 |
+
def cmd_eval(args):
    """Run the evaluation harness by delegating to scripts/evaluate.py.

    NOTE(review): spawning the script via subprocess avoids sys.path hacks;
    moving evaluate.py into commitguard_env would be the cleaner long-term fix.
    """
    repo_root = Path(__file__).resolve().parent.parent
    eval_script = repo_root / "scripts" / "evaluate.py"

    command = [sys.executable, str(eval_script), *args.eval_args]
    subprocess.run(command, check=True)
|
| 71 |
+
|
| 72 |
+
|
| 73 |
+
def cmd_hook(args):
    """Dispatch `commitguard hook` subcommands (only `install` exists today)."""
    from .hooks import install_hook

    if args.action != "install":
        return

    # Map the CLI flags to a hook name; exactly one must be set.
    if args.pre_commit:
        hook_name = "pre-commit"
    elif args.pre_push:
        hook_name = "pre-push"
    else:
        print("Please specify a hook type to install (e.g., --pre-commit or --pre-push)")
        sys.exit(1)

    install_hook(hook_name)
|
| 84 |
+
|
| 85 |
+
|
| 86 |
+
def main():
    """CLI entry point: parse arguments and dispatch to the subcommand handlers."""
    parser = argparse.ArgumentParser(description="CommitGuard AI-paced security review")
    # required=True: invoking `commitguard` without a subcommand is an error.
    subparsers = parser.add_subparsers(dest="command", required=True)

    # 'scan' subcommand
    scan_parser = subparsers.add_parser("scan", help="Scan a code diff for vulnerabilities")

    # Exactly one diff source must be supplied; argparse enforces exclusivity.
    source_group = scan_parser.add_mutually_exclusive_group(required=True)
    source_group.add_argument("--diff", type=str, help="Path to a diff file")
    source_group.add_argument("--staged", action="store_true", help="Scan git staged changes")
    source_group.add_argument("--commit", type=str, help="Scan a specific git commit (e.g., HEAD)")
    source_group.add_argument("--pr", type=str, help="Scan a GitHub PR URL or ID (requires gh cli)")

    scan_parser.add_argument("--model", type=str, default="inmodel-labs/commitguard-llama-3b", help="Model path or HF ID")
    scan_parser.add_argument("--base-model", type=str, default=None, help="Base model if using LoRA")
    scan_parser.add_argument("--is-lora", action="store_true", help="Whether the model is a LoRA adapter")
    scan_parser.add_argument("--format", choices=["text", "json", "sarif"], default="text", help="Output format")
    scan_parser.add_argument("--fail-on-vulnerable", action="store_true", help="Exit with code 1 if vulnerable")

    # 'server' subcommand — takes no CLI options of its own.
    server_parser = subparsers.add_parser("server", help="Start the OpenEnv environment server")
    # server_main takes PORT from environment

    # 'eval' subcommand — REMAINDER captures everything after `eval` verbatim
    # so it can be forwarded to scripts/evaluate.py untouched.
    eval_parser = subparsers.add_parser("eval", help="Run the evaluation harness")
    eval_parser.add_argument("eval_args", nargs=argparse.REMAINDER, help="Arguments passed to evaluate.py")

    # 'hook' subcommand
    hook_parser = subparsers.add_parser("hook", help="Manage git hooks")
    hook_parser.add_argument("action", choices=["install"], help="Action to perform (e.g., install)")
    hook_parser.add_argument("--pre-commit", action="store_true", help="Install pre-commit hook")
    hook_parser.add_argument("--pre-push", action="store_true", help="Install pre-push hook")

    args = parser.parse_args()

    # Dispatch; the required subparser guarantees args.command is one of these.
    if args.command == "scan":
        cmd_scan(args)
    elif args.command == "server":
        cmd_server(args)
    elif args.command == "eval":
        cmd_eval(args)
    elif args.command == "hook":
        cmd_hook(args)

if __name__ == "__main__":
    main()
|
commitguard_env/environment.py
ADDED
|
@@ -0,0 +1,173 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import json
|
| 4 |
+
import random
|
| 5 |
+
import uuid
|
| 6 |
+
from collections import OrderedDict
|
| 7 |
+
from dataclasses import replace
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
|
| 10 |
+
from .models import CommitGuardAction, CommitGuardObservation, CommitGuardState, ContextSnippet, DevignSample
|
| 11 |
+
from .reward import compute_reward
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
class CommitGuardEnvironment:
    """Multi-episode vulnerability-review environment over Devign-style samples.

    Holds a bounded pool of concurrent episode sessions keyed by episode_id.
    Ground-truth fields (is_vulnerable, cwe, ...) live only on DevignSample and
    are consumed server-side by compute_reward; observations never expose them.
    """

    # Hard cap on concurrently tracked sessions; oldest are evicted first.
    _MAX_SESSIONS = 64

    def __init__(self, *, data_path: Path) -> None:
        self._data_path = data_path
        self._samples: list[DevignSample] = []
        # OrderedDict gives FIFO eviction via popitem(last=False).
        self._sessions: OrderedDict[str, CommitGuardState] = OrderedDict()
        self._latest_episode_id: str | None = None
        # Fixed seed: sample selection is reproducible across restarts.
        self._rng = random.Random(0)
        self._cwe_keywords: dict[str, list[str]] = {}

    def _resolve_session(self, episode_id: str | None) -> CommitGuardState:
        """Return the session for episode_id (or the most recent one).

        Raises:
            ValueError: if no matching session exists ("no_active_session").
        """
        eid = episode_id or self._latest_episode_id
        if eid and eid in self._sessions:
            return self._sessions[eid]
        raise ValueError("no_active_session")

    def _evict_if_needed(self) -> None:
        # Drop the oldest sessions until we are back under the cap.
        while len(self._sessions) > self._MAX_SESSIONS:
            self._sessions.popitem(last=False)

    def load(self) -> None:
        """Lazily load samples (and CWE keywords) from disk; idempotent."""
        if self._samples:
            return
        # Load CWE keywords from data directory (matching instructions)
        try:
            kw_path = self._data_path.parent / "cwe_keywords.json"
            if not kw_path.exists():
                # Fallback to current directory or data subfolder if needed
                kw_path = self._data_path.parent / "data" / "cwe_keywords.json"

            self._cwe_keywords = json.loads(kw_path.read_text(encoding="utf-8"))
        except Exception:
            # Keywords are optional; reward shaping simply degrades without them.
            self._cwe_keywords = {}

        # Data file is JSONL: one sample object per line.
        raw = self._data_path.read_text(encoding="utf-8").strip().splitlines()
        for line in raw:
            obj = json.loads(line)
            # Support both original and mvd schemas
            sample_id = str(obj.get("commit_id") or obj.get("sample_id", "unknown"))

            # Synthesize diff if missing (mvd branch data schema)
            diff = obj.get("diff")
            if not diff and "code_before" in obj and "code_after" in obj:
                diff = f"--- code_before\n+++ code_after\n{obj['code_before']}\n{obj['code_after']}"

            self._samples.append(
                DevignSample(
                    sample_id=sample_id,
                    diff=str(diff or ""),
                    available_files=list(obj.get("available_files") or []),
                    is_vulnerable=obj.get("is_vulnerable"),
                    cwe=obj.get("cwe") or obj.get("cwe_type"),
                    target_file=obj.get("target_file"),
                    files=obj.get("files"),
                )
            )
        if not self._samples:
            raise RuntimeError("no_samples_loaded")

    def reset(self, sample_id: str | None = None) -> CommitGuardObservation:
        """Start a new episode, optionally pinned to a specific sample.

        Raises:
            ValueError: if sample_id is given but not present in the dataset.
        """
        self.load()
        if sample_id:
            sample = next((s for s in self._samples if s.sample_id == sample_id), None)
            if not sample:
                raise ValueError(f"sample_id {sample_id} not found")
        else:
            sample = self._rng.choice(self._samples)

        episode_id = str(uuid.uuid4())
        state = CommitGuardState(
            episode_id=episode_id,
            current_sample_id=sample.sample_id,
            step_count=0,
            context_requests=0,
            history=[],
        )
        self._sessions[episode_id] = state
        self._latest_episode_id = episode_id
        self._evict_if_needed()

        # Note: no ground-truth fields are copied into the observation.
        return CommitGuardObservation(
            episode_id=episode_id,
            diff=sample.diff,
            available_files=sample.available_files,
            step_idx=0,
            budget_remaining=5,
        )

    def step(self, action: CommitGuardAction, episode_id: str | None = None) -> tuple[CommitGuardObservation, float, bool]:
        """Apply one agent action; returns (observation, reward, done)."""
        try:
            state = self._resolve_session(episode_id)
        except ValueError:
            # Auto-reset if no active session, matching previous behavior
            obs = self.reset()
            state = self._sessions[obs.episode_id]

        next_step = state.step_count + 1
        sample = next(s for s in self._samples if s.sample_id == state.current_sample_id)

        context_snippets: list[ContextSnippet] = []
        context_requests = state.context_requests
        if action.action_type == "request_context":
            # Counted even when the file is unknown, so the reward-side
            # repeat-request penalty cannot be dodged with bad paths.
            context_requests += 1
            if action.file_path and sample.files and action.file_path in sample.files:
                content = sample.files[action.file_path]
                lines = content.splitlines()
                # Serve at most the first 80 lines of the requested file.
                start = 1
                end = min(len(lines), 80)
                context_snippets = [
                    ContextSnippet(
                        file_path=action.file_path,
                        start_line=start,
                        end_line=end,
                        content="\n".join(lines[start - 1 : end]),
                    )
                ]

        # Reward is computed server-side from the sample's hidden labels.
        reward = compute_reward(
            action=action,
            is_vulnerable=sample.is_vulnerable,
            cwe=sample.cwe,
            target_file=sample.target_file,
            cwe_keywords=self._cwe_keywords,
            context_requests=context_requests,
        )

        # Episode ends on a verdict or when the 5-step budget is exhausted.
        done = bool(action.action_type == "verdict" or next_step >= 5)

        # States are frozen dataclasses; build the successor via replace().
        new_state = replace(
            state,
            step_count=next_step,
            context_requests=context_requests,
            history=[
                *state.history,
                {
                    "step": next_step,
                    "action_type": action.action_type,
                    "parse_error": action.parse_error,
                },
            ],
        )
        self._sessions[new_state.episode_id] = new_state

        obs = CommitGuardObservation(
            episode_id=new_state.episode_id,
            diff=sample.diff,
            available_files=sample.available_files,
            context_snippets=context_snippets,
            step_idx=next_step,
            budget_remaining=max(0, 5 - next_step),
            # Surface parse errors first; otherwise flag an unfulfilled
            # context request as "context_unavailable".
            error=action.parse_error or (None if context_snippets else ("context_unavailable" if action.action_type == "request_context" else None)),
        )
        return obs, reward, done

    def state(self, episode_id: str | None = None) -> CommitGuardState:
        """Return the session state, or an empty placeholder when none exists."""
        try:
            return self._resolve_session(episode_id)
        except ValueError:
            return CommitGuardState(episode_id="", current_sample_id="", step_count=0, context_requests=0, history=[])
|
commitguard_env/grpo_prompt.py
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""System prompt and per-turn prompt for CommitGuard GRPO training."""
|
| 2 |
+
|
| 3 |
+
SYSTEM_PROMPT = """\
|
| 4 |
+
You are a security auditor. You receive code diffs (commits) and must decide \
|
| 5 |
+
whether each commit introduces an exploitable vulnerability.
|
| 6 |
+
|
| 7 |
+
You may take up to 5 actions per episode. Each action must be wrapped in XML tags.
|
| 8 |
+
|
| 9 |
+
Action types:
|
| 10 |
+
|
| 11 |
+
1. Request additional file context:
|
| 12 |
+
<action><action_type>request_context</action_type><fields><file_path>path/to/file.c</file_path></fields></action>
|
| 13 |
+
|
| 14 |
+
2. Analyze / think (chain-of-thought, no reward effect):
|
| 15 |
+
<action><action_type>analyze</action_type><fields><reasoning>your reasoning here</reasoning></fields></action>
|
| 16 |
+
|
| 17 |
+
3. Submit a verdict (terminates the episode):
|
| 18 |
+
<action><action_type>verdict</action_type><fields><is_vulnerable>true|false</is_vulnerable><vuln_type>CWE-XXX</vuln_type><exploit_sketch>describe how to exploit</exploit_sketch></fields></action>
|
| 19 |
+
|
| 20 |
+
Rules:
|
| 21 |
+
- You MUST submit exactly one verdict before running out of budget.
|
| 22 |
+
- If the code is safe, set is_vulnerable to false and vuln_type to NONE.
|
| 23 |
+
- Be specific in exploit_sketch: name the attack vector (e.g., buffer overflow via unchecked memcpy).
|
| 24 |
+
- Common CWE types: CWE-79 (XSS), CWE-89 (SQL injection), CWE-22 (path traversal), \
|
| 25 |
+
CWE-78 (command injection), CWE-20 (input validation), CWE-125 (out-of-bounds read), \
|
| 26 |
+
CWE-787 (buffer overflow), CWE-190 (integer overflow), CWE-476 (null dereference), \
|
| 27 |
+
CWE-400 (resource exhaustion).
|
| 28 |
+
"""
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
def get_agent_prompt(diff: str, available_files: list[str], step_idx: int) -> str:
    """Build the per-turn GRPO user prompt: diff, file list, and step counter."""
    listing = "(none)" if not available_files else ", ".join(available_files)
    lines = [
        "## Commit Diff",
        "",
        "```diff",
        diff,
        "```",
        "",
        f"Available files: {listing}",
        f"Step: {step_idx}/5",
        "",
        "Analyze this commit and submit your verdict.",
    ]
    return "\n".join(lines)
|
commitguard_env/hooks.py
ADDED
|
@@ -0,0 +1,50 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import os
import stat
import sys
from pathlib import Path

PRE_COMMIT_SCRIPT = """#!/bin/sh
# CommitGuard pre-commit hook
echo "Running CommitGuard scan on staged changes..."
commitguard scan --staged --format text --fail-on-vulnerable
if [ $? -ne 0 ]; then
echo "CommitGuard found vulnerabilities! Commit aborted."
exit 1
fi
"""

PRE_PUSH_SCRIPT = """#!/bin/sh
# CommitGuard pre-push hook
echo "Running CommitGuard scan on commits to be pushed..."
while read local_ref local_sha remote_ref remote_sha
do
if [ "$local_sha" != "0000000000000000000000000000000000000000" ]; then
git diff "$remote_sha" "$local_sha" | commitguard scan --diff - --format text --fail-on-vulnerable
if [ $? -ne 0 ]; then
echo "CommitGuard found vulnerabilities in $local_sha! Push aborted."
exit 1
fi
fi
done
"""

# Map hook name -> script body; doubles as the set of supported hook types.
_HOOK_SCRIPTS = {
    "pre-commit": PRE_COMMIT_SCRIPT,
    "pre-push": PRE_PUSH_SCRIPT,
}


def install_hook(hook_type: str):
    """Install a CommitGuard git hook into .git/hooks of the current repo.

    Args:
        hook_type: Either "pre-commit" or "pre-push".

    Exits with status 1 when run outside a git repository root or when an
    unsupported hook type is requested.
    """
    # Bug fix: the previous `if/else` silently installed the pre-push script
    # for ANY unrecognized hook_type; unknown types now fail loudly.
    script_content = _HOOK_SCRIPTS.get(hook_type)
    if script_content is None:
        print(f"Error: unsupported hook type '{hook_type}'. Supported: {', '.join(sorted(_HOOK_SCRIPTS))}")
        sys.exit(1)

    git_dir = Path(".git")
    if not git_dir.exists() or not git_dir.is_dir():
        print("Error: .git directory not found. Please run this command from the root of a git repository.")
        sys.exit(1)

    hooks_dir = git_dir / "hooks"
    hooks_dir.mkdir(exist_ok=True)

    hook_path = hooks_dir / hook_type
    hook_path.write_text(script_content, encoding="utf-8")

    # Git only runs executable hooks; add exec bits for user, group, other
    # (previously only the owner bit was set).
    st = os.stat(hook_path)
    os.chmod(hook_path, st.st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    print(f"Successfully installed {hook_type} hook at {hook_path}")
|
commitguard_env/inference.py
ADDED
|
@@ -0,0 +1,86 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import sys
|
| 4 |
+
from typing import Any
|
| 5 |
+
|
| 6 |
+
from .agent_prompt import SYSTEM_PROMPT
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
def format_prompt(diff: str, available_files: list[str] | None = None) -> str:
    """Format a code diff into the full Llama-3 chat prompt for the scanner.

    Fix: the parameter was annotated `list[str]` while defaulting to None;
    the annotation now matches the accepted values (`list[str] | None`).

    Args:
        diff: Unified diff text to review.
        available_files: Optional file names the agent could request context for.

    Returns:
        A single prompt string using Llama-3 special-token chat formatting,
        with SYSTEM_PROMPT as the system turn and the diff as the user turn,
        ending at the start of the assistant turn.
    """
    files_str = ", ".join(available_files) if available_files else "None"

    user_prompt = f"""### Input Diff
{diff}

### Environment Info
- Available Files: {files_str}
- Current Step: 0/5

Please provide your next action in XML format:"""

    return (
        f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{SYSTEM_PROMPT}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )
|
| 27 |
+
|
| 28 |
+
def load_model(model_path: str, is_lora: bool = False, base_model: str | None = None) -> tuple[Any, Any]:
    """
    Load the LLM and tokenizer for inference.

    Two loading paths:
      * is_lora=True  -> Unsloth 4-bit base model with a PEFT adapter applied
        (base_model is mandatory in this mode).
      * is_lora=False -> plain transformers AutoModel/AutoTokenizer; fp16 with
        device_map="auto" on CUDA, fp32 on CPU.

    Heavy dependencies are imported lazily so scan-only installs fail with a
    helpful message instead of an ImportError at module import time.

    Raises:
        ValueError: if is_lora is True but base_model was not supplied.
        SystemExit: if a required optional dependency is missing.
    """
    try:
        import torch
    except ImportError:
        print("Error: PyTorch is not installed. Please install inference dependencies using: pip install '.[scan]'")
        sys.exit(1)

    if is_lora:
        if not base_model:
            raise ValueError("base_model is required if is_lora=True")
        try:
            from unsloth import FastLanguageModel
            from peft import PeftModel
        except ImportError:
            print("Error: Unsloth/PEFT not installed. Required for LoRA models.")
            sys.exit(1)

        # Load the quantized base, then layer the LoRA adapter on top.
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name=base_model,
            max_seq_length=2048,
            load_in_4bit=True,
        )
        model = PeftModel.from_pretrained(model, model_path)
        # Switch Unsloth into its optimized inference mode.
        FastLanguageModel.for_inference(model)
    else:
        try:
            from transformers import AutoModelForCausalLM, AutoTokenizer
        except ImportError:
            print("Error: Transformers is not installed. Please install inference dependencies using: pip install '.[scan]'")
            sys.exit(1)

        device_map = "auto" if torch.cuda.is_available() else None
        model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
            device_map=device_map
        )
        tokenizer = AutoTokenizer.from_pretrained(model_path)

    return model, tokenizer
|
| 71 |
+
|
| 72 |
+
def generate(model: Any, tokenizer: Any, prompt: str, max_new_tokens: int = 256) -> str:
    """Greedily decode a completion for `prompt` and return only the new text."""
    import torch

    target_device = "cuda" if torch.cuda.is_available() else "cpu"
    encoded = tokenizer(prompt, return_tensors="pt").to(target_device)

    with torch.no_grad():
        generated = model.generate(**encoded, max_new_tokens=max_new_tokens, do_sample=False)

    # Slice off the prompt tokens so only the newly generated ones are decoded.
    prompt_len = encoded.input_ids.shape[1]
    return tokenizer.decode(generated[0][prompt_len:], skip_special_tokens=True)
|
commitguard_env/models.py
ADDED
|
@@ -0,0 +1,70 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal, Optional


# The three moves an agent can make in an episode.
ActionType = Literal["request_context", "analyze", "verdict"]


@dataclass(frozen=True, slots=True)
class CommitGuardAction:
    """One agent action, as parsed from the model's XML output."""

    action_type: ActionType
    # Populated for request_context actions.
    file_path: Optional[str] = None
    # Populated for analyze actions.
    reasoning: Optional[str] = None
    # The next three are populated for verdict actions.
    is_vulnerable: Optional[bool] = None
    vuln_type: Optional[str] = None
    exploit_sketch: Optional[str] = None
    # Original model text, kept for debugging/history.
    raw_action: Optional[str] = None
    # Set when the XML could not be parsed cleanly.
    parse_error: Optional[str] = None


@dataclass(frozen=True, slots=True)
class ContextSnippet:
    """A slice of a repository file returned for a request_context action."""

    file_path: str
    # 1-based, inclusive line range of `content` within the file.
    start_line: int
    end_line: int
    content: str


@dataclass(frozen=True, slots=True)
class CommitGuardObservation:
    # Cheating-prevention critical: this shape must never include ground truth.
    episode_id: str
    step_idx: int
    diff: str
    available_files: list[str]
    context_snippets: list[ContextSnippet] = field(default_factory=list)
    budget_remaining: int = 0
    error: Optional[str] = None


@dataclass(frozen=True, slots=True)
class CommitGuardState:
    """Server-side per-episode bookkeeping (immutable; replaced each step)."""

    episode_id: str
    current_sample_id: str
    step_count: int
    context_requests: int = 0
    # One dict per step: {"step", "action_type", "parse_error"}.
    history: list[dict] = field(default_factory=list)


@dataclass(frozen=True, slots=True)
class DevignSample:
    """A single dataset entry loaded from the Devign-derived JSONL."""

    sample_id: str
    diff: str
    available_files: list[str]
    # Server-only fields (must never be surfaced in Observation)
    is_vulnerable: Optional[bool] = None
    cwe: Optional[str] = None
    target_file: Optional[str] = None
    files: Optional[dict[str, str]] = None


@dataclass(frozen=True, slots=True)
class ScanResult:
    """Final outcome of a CLI scan, serialized for text/json/sarif output."""

    is_vulnerable: bool
    cwe: Optional[str]
    exploit_sketch: Optional[str]
    raw_response: str
    parse_error: Optional[str] = None
|
| 70 |
+
|
commitguard_env/parse_action.py
ADDED
|
@@ -0,0 +1,97 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import re
|
| 4 |
+
from typing import Any, Optional
|
| 5 |
+
|
| 6 |
+
from .models import CommitGuardAction
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
def _first(tag: str, text: str) -> Optional[str]:
|
| 10 |
+
# Robust case-insensitive search with optional whitespace inside tags
|
| 11 |
+
pattern = rf"<[ \t]*{re.escape(tag)}[ \t]*>(.*?)</[ \t]*{re.escape(tag)}[ \t]*>"
|
| 12 |
+
m = re.search(pattern, text, flags=re.DOTALL | re.IGNORECASE)
|
| 13 |
+
if not m:
|
| 14 |
+
return None
|
| 15 |
+
return m.group(1).strip()
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
def _parse_bool(v: Optional[str]) -> Optional[bool]:
|
| 19 |
+
if v is None:
|
| 20 |
+
return None
|
| 21 |
+
s = v.strip().lower()
|
| 22 |
+
if s in {"true", "1", "yes"}:
|
| 23 |
+
return True
|
| 24 |
+
if s in {"false", "0", "no"}:
|
| 25 |
+
return False
|
| 26 |
+
return None
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
def parse_action(raw_action: str) -> CommitGuardAction:
    """
    Parse XML-tag free-text action. Never raises.

    Expected shape:
    <action><action_type>...</action_type><fields>...</fields></action>

    Malformed input degrades to an "analyze" action with parse_error set,
    so the environment can penalize the model instead of crashing.
    """
    try:
        action_type = (_first("action_type", raw_action) or "").strip().lower()
        if action_type not in {"request_context", "analyze", "verdict"}:
            # No recognizable action_type tag: fall back to a no-op analyze.
            return CommitGuardAction(
                action_type="analyze",
                raw_action=raw_action,
                parse_error="missing_or_invalid_action_type",
            )

        if action_type == "request_context":
            file_path = _first("file_path", raw_action)  # may be None; env validates downstream
            return CommitGuardAction(
                action_type="request_context",
                file_path=file_path,
                raw_action=raw_action,
            )

        if action_type == "analyze":
            reasoning = _first("reasoning", raw_action)
            return CommitGuardAction(action_type="analyze", reasoning=reasoning, raw_action=raw_action)

        # Remaining case: verdict. A missing/unparsable <is_vulnerable> yields None.
        is_vulnerable = _parse_bool(_first("is_vulnerable", raw_action))
        vuln_type = _first("vuln_type", raw_action)
        exploit_sketch = _first("exploit_sketch", raw_action)
        return CommitGuardAction(
            action_type="verdict",
            is_vulnerable=is_vulnerable,
            vuln_type=vuln_type,
            exploit_sketch=exploit_sketch,
            raw_action=raw_action,
        )
    except Exception as e:  # defensive: model output must never crash server
        return CommitGuardAction(
            action_type="analyze",
            raw_action=raw_action,
            parse_error=f"parser_exception:{type(e).__name__}",
        )
|
| 73 |
+
|
| 74 |
+
|
| 75 |
+
def action_from_json(payload: dict[str, Any]) -> CommitGuardAction:
    """
    Convenience for curl/json clients: accept either {action: "<xml>"} or
    direct fields matching CommitGuardAction.

    Like parse_action, this must never raise on client input: an unknown,
    null, or non-string action_type degrades to "analyze".
    """
    if isinstance(payload.get("action"), str):
        # Free-text XML payload: reuse the tag parser.
        return parse_action(payload["action"])

    raw_type = payload.get("action_type")
    # Defensive fix: JSON clients may send null or a non-string here (e.g. a
    # number); calling .strip() on those would raise. Normalize first.
    action_type = raw_type.strip().lower() if isinstance(raw_type, str) else "analyze"
    if action_type not in {"request_context", "analyze", "verdict"}:
        action_type = "analyze"

    return CommitGuardAction(
        action_type=action_type,  # type: ignore[arg-type]
        file_path=payload.get("file_path"),
        reasoning=payload.get("reasoning"),
        is_vulnerable=payload.get("is_vulnerable"),
        vuln_type=payload.get("vuln_type"),
        exploit_sketch=payload.get("exploit_sketch"),
        raw_action=None,
        parse_error=None,
    )
|
| 97 |
+
|
commitguard_env/reward.py
ADDED
|
@@ -0,0 +1,100 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
from .models import CommitGuardAction
|
| 4 |
+
|
| 5 |
+
_CWE_FAMILIES: dict[str, str] = {
|
| 6 |
+
# Memory and Buffer issues
|
| 7 |
+
"CWE-119": "memory-safety", "CWE-120": "memory-safety", "CWE-121": "memory-safety",
|
| 8 |
+
"CWE-122": "memory-safety", "CWE-125": "memory-safety", "CWE-787": "memory-safety",
|
| 9 |
+
# Input and Validation issues (often overlap with memory safety)
|
| 10 |
+
"CWE-20": "input-validation", "CWE-190": "input-validation", "CWE-189": "input-validation",
|
| 11 |
+
"CWE-191": "input-validation",
|
| 12 |
+
# Pointers
|
| 13 |
+
"CWE-476": "null-pointer",
|
| 14 |
+
# Logic and Traversal
|
| 15 |
+
"CWE-22": "traversal",
|
| 16 |
+
# Injections
|
| 17 |
+
"CWE-78": "injection", "CWE-89": "injection", "CWE-79": "injection",
|
| 18 |
+
}
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
def _cwe_partial_score(predicted: str | None, actual: str | None) -> float:
|
| 22 |
+
if not predicted or not actual:
|
| 23 |
+
return 0.0
|
| 24 |
+
p, a = predicted.strip().upper(), actual.strip().upper()
|
| 25 |
+
if p == a:
|
| 26 |
+
return 1.0
|
| 27 |
+
pf = _CWE_FAMILIES.get(p, "")
|
| 28 |
+
af = _CWE_FAMILIES.get(a, "")
|
| 29 |
+
if pf and pf == af:
|
| 30 |
+
return 0.5
|
| 31 |
+
return 0.0
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
def compute_reward(
    *,
    action: CommitGuardAction,
    is_vulnerable: bool | None,
    cwe: str | None,
    target_file: str | None,
    cwe_keywords: dict[str, list[str]] | None,
    context_requests: int,
) -> float:
    """
    Score one action against the sample's ground truth.

    Shaping implemented below:
      * context requests beyond the first cost -0.05 each;
      * an unparseable action costs an additional -0.5;
      * analyze actions earn a small bonus (capped at 0.05) for long reasoning;
      * verdicts score +1.0 (TP), -1.0 (FP), -0.5 (FN), +1.0 (TN); a true
        positive additionally earns up to +0.5 for the CWE match and up to
        +0.5 for exploit-sketch keyword coverage.

    NOTE(review): target_file is accepted but never read in this function —
    confirm whether localization scoring was intended here.
    """
    # Graduated context penalty: first request is free, then escalating
    if context_requests <= 1:
        reward = 0.0
    else:
        reward = -0.05 * (context_requests - 1)

    if action.parse_error:
        return reward - 0.5

    if action.action_type == "analyze":
        # Small shaping bonus for substantial reasoning, capped at 0.05.
        reasoning_len = len(action.reasoning or "")
        if reasoning_len > 50:
            reward += min(0.05, 0.001 * (reasoning_len // 10))
        return reward

    if action.action_type == "request_context":
        return reward

    if action.action_type != "verdict":
        return reward

    # No ground truth for this sample: nothing further to score.
    if is_vulnerable is None:
        return reward

    pred = bool(action.is_vulnerable) if action.is_vulnerable is not None else None
    if pred is None:
        # A verdict with no prediction is penalized like a parse failure.
        return reward - 0.5

    # True positive
    if pred is True and is_vulnerable is True:
        reward += 1.0

        # CWE scoring: exact match = 0.5, same family = 0.25
        cwe_score = _cwe_partial_score(action.vuln_type, cwe)
        reward += 0.5 * cwe_score

        # Keyword match (continuous, up to 0.5)
        kws = (cwe_keywords or {}).get(cwe or "", []) if cwe else []
        if kws:
            sketch = (action.exploit_sketch or "").lower()
            matches = sum(1 for k in kws if k.lower() in sketch)
            reward += 0.5 * (matches / len(kws))

        return reward

    # False positive
    if pred is True and is_vulnerable is False:
        return reward - 1.0

    # False negative
    if pred is False and is_vulnerable is True:
        return reward - 0.5

    # True negative
    if pred is False and is_vulnerable is False:
        return reward + 1.0

    return reward
|
commitguard_env/scanner.py
ADDED
|
@@ -0,0 +1,54 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
from typing import Any
|
| 4 |
+
|
| 5 |
+
from .inference import format_prompt, generate, load_model
|
| 6 |
+
from .models import ScanResult
|
| 7 |
+
from .parse_action import parse_action
|
| 8 |
+
|
| 9 |
+
|
| 10 |
+
class CommitGuardScanner:
    """
    Scanner for CommitGuard vulnerabilities.
    Keeps the model in memory to allow fast scanning of multiple diffs.
    """

    def __init__(self, model_path: str = "inmodel-labs/commitguard-llama-3b", is_lora: bool = False, base_model: str | None = None) -> None:
        # base_model presumably names the base checkpoint when model_path is a
        # LoRA adapter (is_lora=True) — confirm against load_model's contract.
        self.model_path = model_path
        self.is_lora = is_lora
        self.base_model = base_model
        self.model: Any = None        # lazily populated by load()
        self.tokenizer: Any = None    # lazily populated by load()

    def load(self) -> None:
        """Load the model and tokenizer into memory. Idempotent: no-op once loaded."""
        if self.model is None or self.tokenizer is None:
            self.model, self.tokenizer = load_model(self.model_path, self.is_lora, self.base_model)

    def scan(self, diff: str, available_files: list[str] | None = None) -> ScanResult:
        """
        Scan a given diff for vulnerabilities.

        Loads the model on first use, prompts it with the diff (plus optional
        file list), and parses the XML reply into a ScanResult.
        """
        self.load()

        prompt = format_prompt(diff, available_files)
        response = generate(self.model, self.tokenizer, prompt)
        action = parse_action(response)

        # Map to ScanResult; an absent/unparsable verdict defaults to "not vulnerable".
        return ScanResult(
            is_vulnerable=action.is_vulnerable if action.is_vulnerable is not None else False,
            cwe=action.vuln_type,
            exploit_sketch=action.exploit_sketch,
            raw_response=response,
            parse_error=action.parse_error
        )
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
def scan(diff: str, model_path: str = "inmodel-labs/commitguard-llama-3b", is_lora: bool = False, base_model: str | None = None, available_files: list[str] | None = None) -> ScanResult:
    """
    Convenience method to scan a single diff. Loads the model, scans, and returns the result.
    If scanning multiple diffs, prefer instantiating CommitGuardScanner directly to avoid reloading the model.

    available_files: optional repository file list forwarded to
    CommitGuardScanner.scan for prompt context. New trailing parameter with a
    None default, so existing callers are unaffected; previously this wrapper
    offered no way to supply it.
    """
    scanner = CommitGuardScanner(model_path=model_path, is_lora=is_lora, base_model=base_model)
    return scanner.scan(diff, available_files)
|
commitguard_env/server.py
ADDED
|
@@ -0,0 +1,127 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import logging
|
| 4 |
+
import os
|
| 5 |
+
import sys
|
| 6 |
+
from pathlib import Path
|
| 7 |
+
from typing import Any
|
| 8 |
+
|
| 9 |
+
# Immediate flush logging for HF diagnosis
|
| 10 |
+
def print_now(msg: str):
    """Emit a DEBUG line to stdout and flush immediately so HF Space logs show it in real time."""
    print(f"DEBUG: {msg}", flush=True)
|
| 13 |
+
|
| 14 |
+
print_now("Server process started, beginning imports...")
|
| 15 |
+
|
| 16 |
+
import uvicorn
|
| 17 |
+
from fastapi import FastAPI
|
| 18 |
+
from fastapi.middleware.cors import CORSMiddleware
|
| 19 |
+
from dataclasses import asdict
|
| 20 |
+
from pydantic import BaseModel
|
| 21 |
+
|
| 22 |
+
print_now("FastAPI imported.")
|
| 23 |
+
|
| 24 |
+
from .environment import CommitGuardEnvironment
|
| 25 |
+
from .parse_action import action_from_json, parse_action
|
| 26 |
+
|
| 27 |
+
print_now("Local modules imported.")
|
| 28 |
+
|
| 29 |
+
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Configurable data path with fallback: COMMITGUARD_DATA_PATH wins when set,
# otherwise use the repo-relative default dataset.
DATA_PATH_STR = os.environ.get("COMMITGUARD_DATA_PATH", "")
if DATA_PATH_STR:
    DATA_PATH = Path(DATA_PATH_STR)
else:
    # Match Docker path: /app/data/...
    DATA_PATH = Path(__file__).resolve().parent.parent / "data" / "devign_filtered.jsonl"

print_now(f"DATA_PATH resolved to: {DATA_PATH}")

app = FastAPI(title="CommitGuard Env Server", version="0.1.0")

# Wide-open CORS so browser-based clients can reach the API from any origin;
# credentials are disabled accordingly.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=False,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Single shared environment instance; its dataset is loaded in the startup hook.
env = CommitGuardEnvironment(data_path=DATA_PATH)
|
| 53 |
+
|
| 54 |
+
@app.on_event("startup")
def startup_event():
    """Load the dataset once at boot, logging loudly so HF Space logs surface failures."""
    print_now("FastAPI startup event triggered.")
    logger.info(f"Loading data from {DATA_PATH}...")
    try:
        if not DATA_PATH.exists():
            # Flag the misconfiguration; env.load() is still attempted below and
            # any resulting failure is swallowed by the except clause.
            print_now(f"CRITICAL: Data path {DATA_PATH} DOES NOT EXIST")
        env.load()
        logger.info(f"Successfully loaded {len(env._samples)} samples.")
        print_now(f"Loaded {len(env._samples)} samples.")
    except Exception as e:
        # Swallow load errors so the server still starts (e.g. /health answers);
        # /reset and /step will presumably fail until data is fixed — confirm.
        logger.error(f"FAILED to load data: {e}")
        print_now(f"ERROR during load: {e}")
|
| 67 |
+
|
| 68 |
+
class StepRequest(BaseModel):
    """Body for POST /step: either a raw XML `action` string, or structured fields."""

    action: str | None = None          # free-text XML action; takes precedence when set
    action_type: str | None = None     # structured alternative: request_context | analyze | verdict
    file_path: str | None = None       # for request_context
    reasoning: str | None = None       # for analyze
    is_vulnerable: bool | None = None  # for verdict
    vuln_type: str | None = None       # for verdict (CWE identifier)
    exploit_sketch: str | None = None  # for verdict
    episode_id: str | None = None      # optional episode selector, forwarded to env.step
|
| 77 |
+
|
| 78 |
+
|
| 79 |
+
@app.get("/health")
def health() -> dict[str, str]:
    """Liveness probe used by deploy checks (`curl .../health`)."""
    return {"status": "healthy"}
|
| 82 |
+
|
| 83 |
+
|
| 84 |
+
class ResetRequest(BaseModel):
    """Body for POST /reset; sample_id optionally pins which dataset sample to serve."""

    sample_id: str | None = None
|
| 86 |
+
|
| 87 |
+
@app.post("/reset")
def reset(req: ResetRequest = ResetRequest()) -> dict[str, Any]:
    """Start a new episode (optionally a specific sample) and return the first observation."""
    try:
        obs = env.reset(sample_id=req.sample_id)
        return {
            "observation": asdict(obs),
            "done": False,
            "reward": 0.0,
        }
    except ValueError as e:
        # env.reset signals client errors (e.g. bad sample_id) via ValueError;
        # report them as JSON instead of surfacing a 500.
        return {"error": str(e)}
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
@app.post("/step")
def step(req: StepRequest) -> dict[str, Any]:
    """Advance the episode by one action and return (observation, reward, done).

    Accepts either a raw XML `action` string (LLM clients) or structured JSON
    fields (curl/SDK clients); parsing never raises by design.
    """
    if req.action is not None:
        # Free-text XML action from an LLM client.
        action = parse_action(req.action)
    else:
        # Structured fields; exclude_none keeps absent fields out of the payload.
        action = action_from_json(req.model_dump(exclude_none=True))

    try:
        obs, reward, done = env.step(action, episode_id=req.episode_id)
    except ValueError as e:
        # Consistency fix: mirror /reset's error contract (JSON {"error": ...})
        # for client-level errors instead of letting them become a 500.
        return {"error": str(e)}

    return {
        "observation": asdict(obs),
        "done": done,
        "reward": reward,
        "info": {"parse_error": action.parse_error},
    }
|
| 113 |
+
|
| 114 |
+
|
| 115 |
+
@app.get("/state")
def state(episode_id: str | None = None) -> dict[str, Any]:
    """Debug endpoint: expose server-side episode bookkeeping (CommitGuardState)."""
    st = env.state(episode_id=episode_id)
    return {"state": asdict(st)}
|
| 119 |
+
|
| 120 |
+
|
| 121 |
+
def main() -> None:
    """Run the API server; the PORT env var overrides the default 8000."""
    port = int(os.environ.get("PORT", 8000))
    uvicorn.run("commitguard_env.server:app", host="0.0.0.0", port=port, reload=False)
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
if __name__ == "__main__":
|
| 127 |
+
main()
|
configs/openenv.yaml
ADDED
|
@@ -0,0 +1,4 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
name: commitguard
|
| 2 |
+
description: CommitGuard vulnerability detection environment
|
| 3 |
+
version: 0.1.0
|
| 4 |
+
entrypoint: commitguard_env.server:app
|
data/cwe_keywords.json
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"CWE-119": ["buffer overflow", "out of bounds", "overflow", "bounds check", "memcpy", "strcpy", "strcat", "index out of range", "heap", "stack smash"],
|
| 3 |
+
"CWE-476": ["null pointer", "nullptr", "dereference", "null check", "segmentation fault", "null access", "uninitialized"],
|
| 4 |
+
"CWE-189": ["integer overflow", "signedness", "division by zero", "arithmetic overflow", "wrap around", "truncation", "cast", "narrowing"],
|
| 5 |
+
"CWE-20": ["input validation", "improper input", "validation bypass", "sanitization", "untrusted input", "malformed data", "missing check"],
|
| 6 |
+
"CWE-22": ["path traversal", "directory traversal", "../", "..\\", "file inclusion", "arbitrary file", "escape root", "chroot"],
|
| 7 |
+
"CWE-78": ["command injection", "os.system", "subprocess", "shell=true", "exec(", "popen", "system(", "shell command"],
|
| 8 |
+
"CWE-89": ["sql injection", "sqli", "drop table", "union select", "query concatenation", "prepared statement", "bypass login"],
|
| 9 |
+
"CWE-79": ["xss", "cross site scripting", "script tag", "innerhtml", "alert(", "javascript:", "onerror", "content injection"],
|
| 10 |
+
"CWE-OTHER": ["vulnerability", "security", "exploit", "unsafe", "flaw", "bug", "error handling", "race condition", "use after free", "double free"]
|
| 11 |
+
}
|
data/devign_filtered.jsonl
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
data/devign_test.jsonl
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
data/devign_train.jsonl
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
docs/deployment.md
ADDED
|
@@ -0,0 +1,173 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 🚀 CommitGuard — Comprehensive GCP Deployment & Training Guide (A10G)
|
| 2 |
+
|
| 3 |
+
This document is a deep-dive, step-by-step manual for deploying the CommitGuard environment and training pipeline to a Google Cloud Platform (GCP) instance. We are targeting an **NVIDIA A10G GPU** to execute **GRPO (Group Relative Policy Optimization)** on the Llama-3.2-3B model.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## 📋 1. Prerequisites: Setting Up Your Toolbox
|
| 8 |
+
Before you touch the cloud, you must ensure your local environment and external accounts are configured. These are the building blocks of the entire run.
|
| 9 |
+
|
| 10 |
+
### A. GCP Account & Project Setup
|
| 11 |
+
* **Active Project:** You must have a GCP project created. Note your `PROJECT_ID`.
|
| 12 |
+
* **GPU Quota:** By default, GCP projects have 0 quota for GPUs. You must navigate to `IAM & Admin > Quotas` and request a limit increase for `NVIDIA_A10G_GPUS` in your desired region (e.g., `us-central1`). **Do this 24 hours in advance.**
|
| 13 |
+
|
| 14 |
+
### B. Weights & Biases (WandB) for Visualization
|
| 15 |
+
* **Why?** RL training can be unstable. WandB allows you to monitor the "Reward" and "KL Divergence" curves in real-time from your browser.
|
| 16 |
+
* **Action:** Create a free account at [wandb.ai](https://wandb.ai), navigate to your settings, and copy your **API Key**.
|
| 17 |
+
|
| 18 |
+
### C. Hugging Face Account & Llama Access
|
| 19 |
+
* **Model Gating:** Llama-3.2-3B is a gated model. You must visit the [model page](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and apply for access. Approval usually takes 30-60 minutes.
|
| 20 |
+
* **Access Token:** Generate a "Write" token in your Hugging Face settings to allow the VM to download the model and upload your finished adapters.
|
| 21 |
+
|
| 22 |
+
### D. Local gcloud CLI Initialization
|
| 23 |
+
* **Installation:** Install the Google Cloud SDK on your laptop.
|
| 24 |
+
* **Authentication:** Run `gcloud auth login` and `gcloud config set project [YOUR_PROJECT_ID]`. This allows your local terminal to "talk" to GCP.
|
| 25 |
+
|
| 26 |
+
---
|
| 27 |
+
|
| 28 |
+
## 🛠 Step 1: Provisioning the High-Performance VM
|
| 29 |
+
We are using the **G2 Standard 4** machine. It is specifically designed for AI workloads.
|
| 30 |
+
|
| 31 |
+
### Detailed Breakdown of the Creation Command
|
| 32 |
+
* **`--machine-type g2-standard-4`:** Provides 4 vCPUs and 16GB of system RAM, ensuring the CPU doesn't bottleneck the GPU.
|
| 33 |
+
* **`--accelerator type=nvidia-a10g,count=1`:** Attaches the A10G GPU. Its 24GB of VRAM is the "Goldilocks" zone for 3B parameter models—enough to handle the model plus the multiple "generations" required by the GRPO algorithm.
|
| 34 |
+
* **`--image-family common-cu121`:** Uses a specialized Google image that comes with **CUDA 12.1 and NVIDIA drivers pre-installed**. This saves you 30 minutes of manual driver installation.
|
| 35 |
+
* **`--provisioning-model=SPOT`:** **CRITICAL FOR BUDGET.** Spot instances use excess capacity and are ~70% cheaper than standard instances. If the instance is reclaimed by Google, your 50-step checkpoints ensure you don't lose much progress.
|
| 36 |
+
|
| 37 |
+
```bash
|
| 38 |
+
gcloud compute instances create commitguard-trainer \
|
| 39 |
+
--project=[PROJECT_ID] \
|
| 40 |
+
--zone=us-central1-a \
|
| 41 |
+
--machine-type=g2-standard-4 \
|
| 42 |
+
--accelerator=count=1,type=nvidia-a10g \
|
| 43 |
+
--image-project=ml-images \
|
| 44 |
+
--image-family=common-cu121 \
|
| 45 |
+
--boot-disk-size=100GB \
|
| 46 |
+
--boot-disk-type=pd-balanced \
|
| 47 |
+
--maintenance-policy=TERMINATE \
|
| 48 |
+
--provisioning-model=SPOT
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
---
|
| 52 |
+
|
| 53 |
+
## 🏗 Step 2: Environment Preparation
|
| 54 |
+
Once the VM is "Running," we need to turn it into a specialized CommitGuard lab.
|
| 55 |
+
|
| 56 |
+
### A. Secure Connection (SSH)
|
| 57 |
+
Connect to the machine's terminal:
|
| 58 |
+
```bash
|
| 59 |
+
gcloud compute ssh commitguard-trainer --zone=us-central1-a
|
| 60 |
+
```
|
| 61 |
+
|
| 62 |
+
### B. Repository & Virtual Environment
|
| 63 |
+
We isolate our dependencies to prevent conflicts with system-level Python packages.
|
| 64 |
+
```bash
|
| 65 |
+
# Clone the project
|
| 66 |
+
git clone https://github.com/[YOUR_USER]/commitguard.git
|
| 67 |
+
cd commitguard
|
| 68 |
+
|
| 69 |
+
# Create a 'venv' (Virtual Environment)
|
| 70 |
+
python3 -m venv .venv
|
| 71 |
+
source .venv/bin/activate
|
| 72 |
+
|
| 73 |
+
# Authenticate with Hugging Face (Required for gated Llama models)
|
| 74 |
+
huggingface-cli login
|
| 75 |
+
```
|
| 76 |
+
|
| 77 |
+
### C. Installing the "Train" Stack
|
| 78 |
+
The `-e ".[train]"` command installs the `commitguard` package in "editable" mode along with all optional training libraries like `torch`, `peft`, and `trl`.
|
| 79 |
+
```bash
|
| 80 |
+
pip install -U pip
|
| 81 |
+
pip install -e ".[train]"
|
| 82 |
+
|
| 83 |
+
# Flash Attention 2: This is a specialized kernel that makes Llama training
|
| 84 |
+
# significantly faster and more memory-efficient on A10G hardware.
|
| 85 |
+
pip install flash-attn --no-build-isolation
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
---
|
| 89 |
+
|
| 90 |
+
## 📡 Step 3: Launching the Verifiable Reward Server
|
| 91 |
+
CommitGuard uses **RLVR**. In this setup, the model doesn't just "guess" if it's right; it submits an action to a server that calculates a reward based on hard evidence.
|
| 92 |
+
|
| 93 |
+
### Running in the Background
|
| 94 |
+
Since training takes hours, we run the server in the background using the `&` symbol.
|
| 95 |
+
```bash
|
| 96 |
+
# Start the server
|
| 97 |
+
python -m commitguard_env.server &
|
| 98 |
+
|
| 99 |
+
# Verify Health: This ensures the database and API are ready.
|
| 100 |
+
# If this fails, the trainer will hang indefinitely.
|
| 101 |
+
curl http://localhost:8000/health
|
| 102 |
+
# You should see: {"status":"healthy"}
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
+
---
|
| 106 |
+
|
| 107 |
+
## 🧠 Step 4: Executing the GRPO Training Run
|
| 108 |
+
GRPO is a "reinforcement learning" algorithm. It asks the model to generate 4 different answers for the same code diff, compares them to each other, and rewards the ones that follow the XML format and correctly identify the vulnerability.
|
| 109 |
+
|
| 110 |
+
### Hyperparameter Explanation
|
| 111 |
+
* **`--steps 500`:** The model will see roughly 2,000 examples (4 generations x 500 steps).
|
| 112 |
+
* **`4-bit Quantization`:** Automatically handled by the script. It "compresses" the model weights so they fit into the GPU's memory without losing accuracy.
|
| 113 |
+
* **`LoRA r=8`:** "Low-Rank Adaptation." Instead of training 3 billion parameters, we only train about 5 million. This makes training stable and fast.
|
| 114 |
+
* **`--live`:** Tells the script to fetch rewards from the server we started in Step 3.
|
| 115 |
+
|
| 116 |
+
```bash
|
| 117 |
+
# Login to WandB so your graphs show up online
|
| 118 |
+
export WANDB_API_KEY=[YOUR_WANDB_KEY]
|
| 119 |
+
|
| 120 |
+
python scripts/train_grpo.py \
|
| 121 |
+
--model_name "meta-llama/Llama-3.2-3B-Instruct" \
|
| 122 |
+
--output_dir "./outputs/commitguard-final" \
|
| 123 |
+
--steps 500 \
|
| 124 |
+
--live \
|
| 125 |
+
--wandb "commitguard-rlvr"
|
| 126 |
+
```
|
| 127 |
+
|
| 128 |
+
---
|
| 129 |
+
|
| 130 |
+
## 💾 Step 5: Post-Run Weight Management & Cleanup
|
| 131 |
+
Once the 500 steps are complete, the "brain" of your agent exists as a LoRA adapter in the `./outputs` folder.
|
| 132 |
+
|
| 133 |
+
### A. Permanent Storage (Hugging Face)
|
| 134 |
+
The VM's disk is temporary. Move your weights to Hugging Face immediately.
|
| 135 |
+
```bash
|
| 136 |
+
huggingface-cli login --token [YOUR_HF_TOKEN]
|
| 137 |
+
huggingface-cli upload [HF_USERNAME]/commitguard-llama3b-adapter ./outputs/commitguard-final
|
| 138 |
+
```
|
| 139 |
+
|
| 140 |
+
### B. Cost Control: Deleting the VM
|
| 141 |
+
**DO NOT FORGET THIS STEP.** An idle A10G instance costs money every hour.
|
| 142 |
+
```bash
|
| 143 |
+
# Exit the VM
|
| 144 |
+
exit
|
| 145 |
+
|
| 146 |
+
# Delete from your local terminal
|
| 147 |
+
gcloud compute instances delete commitguard-trainer --zone=us-central1-a
|
| 148 |
+
```
|
| 149 |
+
|
| 150 |
+
---
|
| 151 |
+
|
| 152 |
+
## 🆘 Critical Troubleshooting
|
| 153 |
+
|
| 154 |
+
### "CUDA Out of Memory"
|
| 155 |
+
* **Symptom:** Training crashes with a long error ending in `OutOfMemoryError`.
|
| 156 |
+
* **Fix:** The "Group" in GRPO is currently set to 4 generations. Open `scripts/train_grpo.py` and change `num_generations=4` to `num_generations=2`. This cuts memory usage in half.
|
| 157 |
+
|
| 158 |
+
### "Connection Refused"
|
| 159 |
+
* **Symptom:** Reward function returns -1.0 for everything or throws errors.
|
| 160 |
+
* **Fix:** Your environment server crashed or wasn't started. Run `ps aux | grep server` to check if it is still running.
|
| 161 |
+
|
| 162 |
+
### The "Midnight Fallback"
|
| 163 |
+
If the 3B model is too slow for the submission deadline:
|
| 164 |
+
* Switch to the **1.5B Qwen** model. It uses the same XML format but is 2x faster.
|
| 165 |
+
* Command: `python scripts/train_grpo.py --model_name "Qwen/Qwen2.5-1.5B-Instruct" ...`
|
| 166 |
+
|
| 167 |
+
---
|
| 168 |
+
|
| 169 |
+
## ✅ Final Success Checklist
|
| 170 |
+
1. [ ] **Health Check:** `curl` returns healthy.
|
| 171 |
+
2. [ ] **WandB Tracking:** You can see the `reward` curve moving on the website.
|
| 172 |
+
3. [ ] **Checkpoints:** You see folders like `checkpoint-50`, `checkpoint-100` in the output directory.
|
| 173 |
+
4. [ ] **Clean Exit:** The VM is deleted after the adapter is uploaded to Hugging Face.
|
docs/hybrid_workflow.md
ADDED
|
@@ -0,0 +1,108 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 🔗 CommitGuard — Server-to-Plugin Hybrid Workflow
|
| 2 |
+
|
| 3 |
+
This document details the end-to-end integration of the **CommitGuard Gymnasium** (hosted on Hugging Face) with the **Developer Plugin** (running locally). This setup realizes the project's core vision: **Commit-Time Security at AI Speed.**
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## 🏗 Stage 1: Deploying the Gymnasium (The Server)
|
| 8 |
+
The "Gymnasium" is the Meta OpenEnv server. It hosts the code diffs, tracks the multi-step agent state, and calculates the RLVR rewards.
|
| 9 |
+
|
| 10 |
+
### 1.1 Local Preparation
|
| 11 |
+
Ensure your `openenv.yaml` is configured with the correct name and metadata.
|
| 12 |
+
```yaml
|
| 13 |
+
# openenv.yaml
|
| 14 |
+
name: commitguard
|
| 15 |
+
version: 0.1.0
|
| 16 |
+
entrypoint: commitguard_env.server:app
|
| 17 |
+
```
|
| 18 |
+
|
| 19 |
+
### 1.2 Push to Hugging Face Spaces
|
| 20 |
+
Use the `openenv` CLI to bundle the project into a Docker container and upload it to a Space.
|
| 21 |
+
```bash
|
| 22 |
+
# Login to Hugging Face
|
| 23 |
+
huggingface-cli login
|
| 24 |
+
|
| 25 |
+
# Push the environment
|
| 26 |
+
# Replace [USER] with your HF username
|
| 27 |
+
openenv push --space [USER]/commitguard-gym
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
### 1.3 Verification
|
| 31 |
+
Once the Space build is complete:
|
| 32 |
+
1. Open the Space in your browser. You should see the **OpenEnv Gymnasium UI**.
|
| 33 |
+
2. Test the `/health` endpoint:
|
| 34 |
+
`curl https://[USER]-commitguard-gym.hf.space/health`
|
| 35 |
+
*Expected:* `{"status": "healthy"}`
|
| 36 |
+
|
| 37 |
+
---
|
| 38 |
+
|
| 39 |
+
## 🧠 Stage 2: Connecting the Trained Model
|
| 40 |
+
Your trained Llama-3.2-3B model (or its LoRA adapter) needs to know where to "play."
|
| 41 |
+
|
| 42 |
+
### 2.1 Configuration
|
| 43 |
+
Update your local environment or training script to point to the live HF Space instead of `localhost`.
|
| 44 |
+
```bash
|
| 45 |
+
export COMMITGUARD_ENV_URL="https://[USER]-commitguard-gym.hf.space"
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
### 2.2 Model Inference Hook
|
| 49 |
+
The model takes the local code diff as input and emits an XML action.
|
| 50 |
+
- **CLI Mode:** `python scripts/evaluate.py --env_url $COMMITGUARD_ENV_URL`
|
| 51 |
+
- **Plugin Mode:** The plugin script captures the diff and calls the model.
|
| 52 |
+
|
| 53 |
+
---
|
| 54 |
+
|
| 55 |
+
## 🛠 Stage 3: Setting up the Developer Plugin (Git Hook)
|
| 56 |
+
We will implement a local Git `pre-commit` hook that invokes the model and consults the HF Gymnasium for a verdict.
|
| 57 |
+
|
| 58 |
+
### 3.1 Create the Hook Script
|
| 59 |
+
Save this as `.git/hooks/pre-commit` and make it executable (`chmod +x`).
|
| 60 |
+
|
| 61 |
+
```bash
|
| 62 |
+
#!/bin/bash
|
| 63 |
+
|
| 64 |
+
# 1. Capture the staged diff
|
| 65 |
+
DIFF=$(git diff --cached)
|
| 66 |
+
|
| 67 |
+
# 2. Invoke the CommitGuard Agent
|
| 68 |
+
# This script sends the diff to your model (running locally or via API)
|
| 69 |
+
# which then interacts with the HF Gymnasium Space.
|
| 70 |
+
VERDICT_JSON=$(python scripts/evaluate_single_diff.py --diff "$DIFF")
|
| 71 |
+
|
| 72 |
+
# 3. Parse the Verdict
|
| 73 |
+
IS_VULNERABLE=$(echo "$VERDICT_JSON" | jq -r '.is_vulnerable')
|
| 74 |
+
REASONING=$(echo "$VERDICT_JSON" | jq -r '.reasoning')
|
| 75 |
+
|
| 76 |
+
# 4. Block or Allow
|
| 77 |
+
if [ "$IS_VULNERABLE" == "true" ]; then
|
| 78 |
+
echo "❌ [CommitGuard] VULNERABILITY DETECTED"
|
| 79 |
+
echo "Reasoning: $REASONING"
|
| 80 |
+
echo "Commit blocked. Please fix the security issue and try again."
|
| 81 |
+
exit 1
|
| 82 |
+
else
|
| 83 |
+
echo "✅ [CommitGuard] No vulnerabilities detected. Proceeding..."
|
| 84 |
+
exit 0
|
| 85 |
+
fi
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
---
|
| 89 |
+
|
| 90 |
+
## 🔄 Stage 4: The End-to-End Execution Cycle
|
| 91 |
+
|
| 92 |
+
1. **Developer writes code:** E.g., adding an unsanitized SQL query to `db.py`.
|
| 93 |
+
2. **Developer runs `git commit`:** The pre-commit hook triggers.
|
| 94 |
+
3. **The Plugin acts:**
|
| 95 |
+
- It sends the diff to the **Hugging Face Gymnasium**.
|
| 96 |
+
- The Gymnasium generates an **Observation** (diff + available files).
|
| 97 |
+
- The **Trained Model** processes the observation and generates a **Verdict** action.
|
| 98 |
+
- The Gymnasium calculates the **Reward/Verdict** based on ground truth.
|
| 99 |
+
4. **The Verdict returns:** The hook receives `is_vulnerable: true`.
|
| 100 |
+
5. **The Commit is blocked:** The developer sees the exploit sketch in their terminal and the code never hits the repo.
|
| 101 |
+
|
| 102 |
+
---
|
| 103 |
+
|
| 104 |
+
## 🎥 Demonstration Tips for Judges
|
| 105 |
+
For the demo video, use a **split-screen** view:
|
| 106 |
+
- **Left Side:** The Hugging Face Space UI showing the "Gymnasium" state updating.
|
| 107 |
+
- **Right Side:** Your local Terminal showing the `git commit` being blocked by the plugin.
|
| 108 |
+
- **Outcome:** This proves that your RL agent has learned to use the Gymnasium's verifiable rewards to protect a real developer workflow.
|
docs/prd.md
ADDED
|
@@ -0,0 +1,381 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# CommitGuard Product Requirements Document
|
| 2 |
+
|
| 3 |
+
**Project:** CommitGuard
|
| 4 |
+
**Owner:** Niti (Inmodel Labs)
|
| 5 |
+
**Team:** Niti, Deepak, Divyank
|
| 6 |
+
**Submission deadline:** Sunday 5:00 PM IST
|
| 7 |
+
**Hackathon:** Meta OpenEnv Hackathon (PyTorch + Hugging Face + Scaler)
|
| 8 |
+
**Document status:** Locked. Scope freeze at midnight Saturday.
|
| 9 |
+
|
| 10 |
+
---
|
| 11 |
+
|
| 12 |
+
## 1. Executive Summary
|
| 13 |
+
|
| 14 |
+
CommitGuard is a Reinforcement Learning environment built on Meta OpenEnv that trains LLM agents to detect exploitable vulnerabilities in code commits. The submission demonstrates that AI-paced security review is feasible — that an agent trained on commit-level reasoning can match the velocity at which AI coding agents are now shipping production code.
|
| 15 |
+
|
| 16 |
+
The deliverable is a runnable HF Space hosting the env, a training notebook that produces a measurable learning curve on Llama-3.2-3B-Instruct, a demo video showing the qualitative shift from untrained to trained behavior, and a README that tells the story.
|
| 17 |
+
|
| 18 |
+
---
|
| 19 |
+
|
| 20 |
+
## 2. Problem Statement
|
| 21 |
+
|
| 22 |
+
### 2.1 The shift in software development
|
| 23 |
+
|
| 24 |
+
Until recently, code was written by humans at human velocity. Security review processes were designed around this assumption — periodic pentests every 3 to 6 months, with manual code review at PR time. The cycle worked because the codebase changed slowly enough that periodic deep review caught most issues before they reached production.
|
| 25 |
+
|
| 26 |
+
This assumption has broken. Code is now being written and shipped by AI coding agents — Claude Code, Cursor, autonomous coding agents — at 10 to 100 times human velocity. Companies push to production daily, sometimes hourly. A pentest report from six months ago describes a codebase that no longer exists.
|
| 27 |
+
|
| 28 |
+
### 2.2 The asymmetry
|
| 29 |
+
|
| 30 |
+
The same class of LLM that writes the code can be weaponized to attack it. An adversary equipped with autonomous coding tooling, given repository access or even just leaked commits, can pentest at the same velocity defenders ship. Defense runs on human time. Offense runs on AI time. **This asymmetry is unsustainable for any organization shipping AI-generated code at scale.**
|
| 31 |
+
|
| 32 |
+
### 2.3 Why this is a frontier problem
|
| 33 |
+
|
| 34 |
+
AI red-teaming today is overwhelmingly a manual, human-bottlenecked discipline. Researchers at Anthropic, OpenAI, and Meta craft attacks one at a time. There is no automated equivalent of Metasploit for AI-generated code. Closing that gap is an open research problem that frontier labs are actively investing in.
|
| 35 |
+
|
| 36 |
+
---
|
| 37 |
+
|
| 38 |
+
## 3. Goals and Non-Goals
|
| 39 |
+
|
| 40 |
+
### 3.1 Goals (in scope for this submission)
|
| 41 |
+
|
| 42 |
+
- Deliver a working OpenEnv environment that takes a code commit as input and rewards an agent for correctly identifying vulnerabilities, the CWE class, and a plausible exploit
|
| 43 |
+
- Train a small Llama variant (Llama-3.2-3B-Instruct) on the env using GRPO via TRL + Unsloth
|
| 44 |
+
- Demonstrate measurable learning baseline vs. trained accuracy with reward curves
|
| 45 |
+
- Ship a complete submission package: HF Space, training notebook, README, demo video, optional HF blog post
|
| 46 |
+
- Frame the work in language a Meta researcher recognizes: RLVR (Reinforcement Learning from Verifiable Rewards), commit-time security, AI-paced defense
|
| 47 |
+
|
| 48 |
+
### 3.2 Non-goals (explicitly out of scope)
|
| 49 |
+
|
| 50 |
+
- Production-ready security tool this is a research environment, not a CI plugin
|
| 51 |
+
- Real-time exploit execution against arbitrary code the v1 reward uses pattern matching, not sandboxed execution
|
| 52 |
+
- Multi-file / repo-level reasoning v1 operates on single-file commits up to 80 lines
|
| 53 |
+
- Multi-agent self-play listed in Future Work
|
| 54 |
+
- Pentesting beyond static code analysis no network attacks, social engineering, or runtime probing
|
| 55 |
+
- Coverage of all CWEs v1 focuses on the top 10 CWEs in Devign
|
| 56 |
+
|
| 57 |
+
### 3.3 Non-goals from the rubric perspective
|
| 58 |
+
|
| 59 |
+
The rubric rewards ambition and storytelling more heavily than engineering polish. Therefore: not pursuing exhaustive test coverage, not optimizing for inference latency, not building a fancy frontend. The HF Space's default web UI is sufficient.
|
| 60 |
+
|
| 61 |
+
---
|
| 62 |
+
|
| 63 |
+
## 4. Target Users and Stakeholders
|
| 64 |
+
|
| 65 |
+
| Stakeholder | Role | What they care about |
|
| 66 |
+
|---|---|---|
|
| 67 |
+
| Hackathon judges (Meta partner engineers) | Primary audience | Innovation, story, training evidence, reward design |
|
| 68 |
+
| Meta Superintelligence Labs researchers | Aspirational audience | Frontier framing, RLVR alignment, paper-worthiness |
|
| 69 |
+
| HF community | Discovery audience | Reproducibility, runnable Space, clean README |
|
| 70 |
+
| Future contributors | Builder audience | Code clarity, extensibility hooks for v2 |
|
| 71 |
+
|
| 72 |
+
---
|
| 73 |
+
|
| 74 |
+
## 5. Solution Overview
|
| 75 |
+
|
| 76 |
+
### 5.1 The environment
|
| 77 |
+
|
| 78 |
+
CommitGuard is an OpenEnv environment where an agent investigates code commits and decides whether they introduce exploitable vulnerabilities. The agent has limited investigation budget (5 steps maximum per episode), forcing it to reason efficiently rather than brute-forcing context.
|
| 79 |
+
|
| 80 |
+
### 5.2 The agent loop
|
| 81 |
+
|
| 82 |
+
1. `reset()` — env loads a commit (a `code_before`/`code_after` pair plus metadata) from a preprocessed Devign-derived dataset, returns the diff and the list of available files in the repo
|
| 83 |
+
2. `step(action)` — agent emits one of three action types:
|
| 84 |
+
   - `request_context(file_path)` — pull surrounding code (small reward penalty, encourages efficiency)
|
| 85 |
+
   - `analyze(reasoning)` — write chain-of-thought, no reward effect, logged for traces
|
| 86 |
+
   - `verdict(is_vulnerable, vuln_type, exploit_sketch)` — terminate the episode with a judgment
|
| 87 |
+
3. Reward fires on verdict, computed server-side against ground truth the agent never sees
|
| 88 |
+
|
| 89 |
+
### 5.3 Reward design (RLVR philosophy)
|
| 90 |
+
|
| 91 |
+
The reward is tiered and grounded in dataset truth, not in another LLM's opinion. This is deliberate — it follows the RLVR tradition (verifiable rewards from ground truth or executable checks) and prevents the reward hacking that plagues LLM-as-judge setups.
|
| 92 |
+
|
| 93 |
+
| Signal | Reward |
|
| 94 |
+
|---|---|
|
| 95 |
+
| Correct binary verdict (vulnerable vs. safe) | +1.0 |
|
| 96 |
+
| Correct CWE classification (when vulnerable) | +0.5 |
|
| 97 |
+
| Plausible exploit sketch (CWE-keyword match) | +0.5 |
|
| 98 |
+
| False positive (safe flagged as vulnerable) | -1.0 |
|
| 99 |
+
| False negative (real vuln missed) | -0.5 |
|
| 100 |
+
| Per-step context request | -0.05 |
|
| 101 |
+
| Episode step cap | 5 steps |
|
| 102 |
+
|
| 103 |
+
The shape is hard to game — flagging everything is punished by false positives; never investigating means no exploit sketch bonus.
|
| 104 |
+
|
| 105 |
+
---
|
| 106 |
+
|
| 107 |
+
## 6. Technical Architecture
|
| 108 |
+
|
| 109 |
+
### 6.1 System diagram
|
| 110 |
+
|
| 111 |
+
```
|
| 112 |
+
HTTP/JSON
|
| 113 |
+
TRL + Unsloth HF Space
|
| 114 |
+
Llama-3.2-3B reset/step FastAPI server
|
| 115 |
+
GRPO trainer /state (Docker)
|
| 116 |
+
(HF Jobs A10G)
|
| 117 |
+
|
| 118 |
+
Devign
|
| 119 |
+
JSONL
|
| 120 |
+
|
| 121 |
+
|
| 122 |
+
Reward
|
| 123 |
+
function
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
```
|
| 127 |
+
|
| 128 |
+
### 6.2 Component breakdown
|
| 129 |
+
|
| 130 |
+
**Env server** (Python, FastAPI, Docker, OpenEnv 0.2.3+)
|
| 131 |
+
- `models.py` Action, Observation, State dataclasses (extends OpenEnv base classes)
|
| 132 |
+
- `environment.py` `reset()`, `step()`, `state()` methods on the `CommitGuardEnvironment` class
|
| 133 |
+
- `reward.py` pure function `compute_reward(action, ground_truth, cwe_keywords) -> float`
|
| 134 |
+
- `parse_action.py` XML-tag parser, robust to malformed model output
|
| 135 |
+
- `data/devign_filtered.jsonl` preprocessed dataset, shipped in image
|
| 136 |
+
- `data/cwe_keywords.json` top-10 CWE exploit-pattern keyword map
|
| 137 |
+
|
| 138 |
+
**Env client** (auto-generated by OpenEnv CLI)
|
| 139 |
+
- `client.py` `HTTPEnvClient` subclass, used by training notebook
|
| 140 |
+
- Installable via `pip install git+https://huggingface.co/spaces/<user>/commitguard`
|
| 141 |
+
|
| 142 |
+
**Training pipeline** (Python, TRL, Unsloth, PEFT, Wandb)
|
| 143 |
+
- `train_grpo.py` GRPOTrainer config + main loop
|
| 144 |
+
- `agent_prompt.py` system prompt template with XML-tag action format
|
| 145 |
+
- `evaluate.py` runs N samples through a model, returns accuracy stats
|
| 146 |
+
|
| 147 |
+
**Storytelling artifacts**
|
| 148 |
+
- `README.md` pitch + results + links
|
| 149 |
+
- `demo_video.mp4` 60-90 second before/after, hosted on YouTube unlisted
|
| 150 |
+
- `commitguard_hf_blog.md` optional HF Hub blog post (page 26 bonus)
|
| 151 |
+
- `plots/` reward_curve.png, baseline_vs_trained.png, per_cwe.png
|
| 152 |
+
|
| 153 |
+
### 6.3 Data flow
|
| 154 |
+
|
| 155 |
+
1. Preprocess Devign once at build time `data/devign_filtered.jsonl` (~5000 samples, balanced, filtered to <80 LOC)
|
| 156 |
+
2. Build Docker image with JSONL embedded
|
| 157 |
+
3. `openenv push` deploys to HF Space
|
| 158 |
+
4. Training notebook connects to HF Space URL via the OpenEnv HTTP client
|
| 159 |
+
5. Each training step: GRPO generates 4 completions per prompt each runs a full episode in the env rewards collected policy updated via LoRA
|
| 160 |
+
6. Wandb logs reward curves, training loss, checkpoints saved every 50 steps
|
| 161 |
+
7. Final LoRA adapter saved to HF Hub for evaluation and demo
|
| 162 |
+
|
| 163 |
+
### 6.4 Cheating prevention
|
| 164 |
+
|
| 165 |
+
The agent must never see ground truth. Enforced by architecture:
|
| 166 |
+
|
| 167 |
+
- Ground truth lives only on the server, in the JSONL file the env loads from
|
| 168 |
+
- The Observation dataclass schema explicitly excludes `is_vulnerable`, `cwe_type`, and `target_file_with_label`
|
| 169 |
+
- A unit test (`test_no_leak.py`) asserts no observation contains forbidden fields
|
| 170 |
+
- The server returns only `reward` (a scalar) on each step, never the label that produced it
|
| 171 |
+
|
| 172 |
+
---
|
| 173 |
+
|
| 174 |
+
## 7. Stack and Dependencies
|
| 175 |
+
|
| 176 |
+
### 7.1 Locked technical decisions
|
| 177 |
+
|
| 178 |
+
| Decision | Choice | Rationale |
|
| 179 |
+
|---|---|---|
|
| 180 |
+
| Env framework | Meta OpenEnv 0.2.3+ | Mandatory per submission rules |
|
| 181 |
+
| Server runtime | FastAPI in Docker | OpenEnv default, lowest friction |
|
| 182 |
+
| Hosting | HF Space | Mandatory per submission rules, three-in-one (server + repo + registry) |
|
| 183 |
+
| Data source | Devign (DetectBERT subset) | Already on disk, real CWE labels, manageable size |
|
| 184 |
+
| Model | Llama-3.2-3B-Instruct | Meta-branded for the Meta hackathon, fits A10G with GRPO |
|
| 185 |
+
| Training framework | TRL with GRPO | Native OpenEnv integration via `reward_funcs` callback |
|
| 186 |
+
| Training optimization | Unsloth 4-bit + LoRA r=8 | 70% memory reduction, 2x speed (page 75 of opening deck) |
|
| 187 |
+
| Training infra | HF Jobs A10G | $0.40-1.50/hr, runs unattended, integrates with HF ecosystem |
|
| 188 |
+
| Dev infra | GCP VM with T4 | Stable, no Colab disconnects, leverages 24,000 GCP credit |
|
| 189 |
+
| Action serialization | XML-tag free-text | Robust to small-model output variance, easier than JSON-mode |
|
| 190 |
+
| Logging | Wandb | TRL native, judges can view runs |
|
| 191 |
+
|
| 192 |
+
### 7.2 Fallback decisions (pre-approved, no debate when triggered)
|
| 193 |
+
|
| 194 |
+
| If this fails | Fall back to | Trigger |
|
| 195 |
+
|---|---|---|
|
| 196 |
+
| Llama-3.2-3B OOM on A10G | Qwen2.5-1.5B-Instruct | First test step crashes |
|
| 197 |
+
| HF Jobs queue full | GCP A10G on-demand | Job queues for >30 min |
|
| 198 |
+
| 3-action env doesn't ship by midnight | 2-action env (analyze + verdict) | Niti's checkpoint red |
|
| 199 |
+
| Tiered reward buggy | Binary correct/incorrect reward | Deepak's checkpoint red |
|
| 200 |
+
| Training curve flat | Ship with qualitative comparison only | Curve still flat at 10 AM Sunday |
|
| 201 |
+
| Demo video can't be cleanly recorded | Side-by-side text trace in README | Recording fails twice |
|
| 202 |
+
|
| 203 |
+
---
|
| 204 |
+
|
| 205 |
+
## 8. Functional Requirements
|
| 206 |
+
|
| 207 |
+
### 8.1 Environment functional requirements
|
| 208 |
+
|
| 209 |
+
| ID | Requirement | Priority |
|
| 210 |
+
|---|---|---|
|
| 211 |
+
| F-1 | Env exposes `/health`, `/reset`, `/step`, `/state`, `/docs` endpoints | P0 |
|
| 212 |
+
| F-2 | `reset()` returns a random commit observation, never the same one twice in a single episode | P0 |
|
| 213 |
+
| F-3 | `step()` accepts XML-tagged action strings and parses them robustly | P0 |
|
| 214 |
+
| F-4 | `step()` returns reward, observation, and done flag | P0 |
|
| 215 |
+
| F-5 | Episode terminates on `verdict` action OR after 5 steps | P0 |
|
| 216 |
+
| F-6 | Observation never contains ground-truth labels | P0 |
|
| 217 |
+
| F-7 | Env handles malformed actions gracefully (returns -0.5 reward, doesn't crash) | P1 |
|
| 218 |
+
| F-8 | Env supports concurrent episodes (multiple training generations in parallel) | P1 |
|
| 219 |
+
| F-9 | Web UI on HF Space allows manual interaction for demo recording | P2 |
|
| 220 |
+
|
| 221 |
+
### 8.2 Training functional requirements
|
| 222 |
+
|
| 223 |
+
| ID | Requirement | Priority |
|
| 224 |
+
|---|---|---|
|
| 225 |
+
| T-1 | Training notebook runs end-to-end on a single A10G | P0 |
|
| 226 |
+
| T-2 | Reward curve, training loss, and completions logged to Wandb | P0 |
|
| 227 |
+
| T-3 | LoRA adapter saved every 50 steps for resumability | P0 |
|
| 228 |
+
| T-4 | Baseline (untrained) evaluation on 100 held-out samples completes in <10 min | P0 |
|
| 229 |
+
| T-5 | Trained model evaluation produces per-CWE accuracy breakdown | P1 |
|
| 230 |
+
| T-6 | Notebook runnable from Colab via "Open in Colab" badge in README | P1 |
|
| 231 |
+
|
| 232 |
+
### 8.3 Storytelling functional requirements
|
| 233 |
+
|
| 234 |
+
| ID | Requirement | Priority |
|
| 235 |
+
|---|---|---|
|
| 236 |
+
| S-1 | README explains problem, env, results, and motivation in <5 min read | P0 |
|
| 237 |
+
| S-2 | All plot PNGs committed to repo (not Wandb-only) | P0 |
|
| 238 |
+
| S-3 | Demo video 60-90 sec, before/after on a single SQL injection example | P0 |
|
| 239 |
+
| S-4 | Wandb run URL linked in README | P1 |
|
| 240 |
+
| S-5 | HF Hub blog post published and linked | P2 |
|
| 241 |
+
|
| 242 |
+
---
|
| 243 |
+
|
| 244 |
+
## 9. Non-Functional Requirements
|
| 245 |
+
|
| 246 |
+
| Aspect | Requirement |
|
| 247 |
+
|---|---|
|
| 248 |
+
| Performance | Single `step()` call returns in <2 seconds on HF Space free tier |
|
| 249 |
+
| Reliability | Env survives 100 random episodes without crash |
|
| 250 |
+
| Reproducibility | Training notebook produces a measurable learning curve when re-run with same seed |
|
| 251 |
+
| Discoverability | HF Space tagged with `openenv`, `rl`, `security`, `code` |
|
| 252 |
+
| Documentation | README is self-contained judge can understand without reading source |
|
| 253 |
+
| Licensing | Code MIT-licensed, dataset attribution to Devign authors |
|
| 254 |
+
|
| 255 |
+
---
|
| 256 |
+
|
| 257 |
+
## 10. Success Metrics
|
| 258 |
+
|
| 259 |
+
### 10.1 Submission completeness (binary, must-pass)
|
| 260 |
+
|
| 261 |
+
- [ ] HF Space deployed and `/health` returns 200 OK
|
| 262 |
+
- [ ] Training notebook runs without crashes on a fresh Colab/VM
|
| 263 |
+
- [ ] README has all required links (HF Space, notebook, video, GitHub)
|
| 264 |
+
- [ ] At least one reward curve plot committed
|
| 265 |
+
- [ ] Demo video accessible via public URL
|
| 266 |
+
|
| 267 |
+
### 10.2 Quality metrics (graded by rubric)
|
| 268 |
+
|
| 269 |
+
| Metric | Target | Stretch |
|
| 270 |
+
|---|---|---|
|
| 271 |
+
| Innovation framing recognized by mentor | "this is an interesting angle" feedback | "this is paper-worthy" feedback |
|
| 272 |
+
| Baseline accuracy (untrained Llama-3.2-3B) | Establishes a floor (likely 30-45%) | |
|
| 273 |
+
| Trained accuracy (after 300 GRPO steps) | Beats baseline by 10pp absolute | Beats baseline by 20pp |
|
| 274 |
+
| Reward curve | Bends upward visibly | Smooth monotonic increase |
|
| 275 |
+
| Per-CWE breakdown | At least 3 CWEs show improvement | All top-5 CWEs show improvement |
|
| 276 |
+
| Storytelling | Mentor at Round 3 can repeat the pitch back | Mentor offers to share with Meta team |
|
| 277 |
+
|
| 278 |
+
### 10.3 Anti-metrics (things we explicitly don't optimize for)
|
| 279 |
+
|
| 280 |
+
- Number of features
|
| 281 |
+
- Number of CWEs covered (more is not better depth beats breadth here)
|
| 282 |
+
- Lines of code
|
| 283 |
+
- Model size (going larger doesn't make a stronger submission, just slower training)
|
| 284 |
+
|
| 285 |
+
---
|
| 286 |
+
|
| 287 |
+
## 11. Risks and Mitigations
|
| 288 |
+
|
| 289 |
+
| Risk | Likelihood | Impact | Mitigation |
|
| 290 |
+
|---|---|---|---|
|
| 291 |
+
| Training run produces flat curve | Medium | High | Pre-approved pivot to qualitative-comparison narrative; baseline already establishes a contrast |
|
| 292 |
+
| HF Space deployment fails at 4 AM | Low | High | Fallback to Docker image with `docker run` instructions in README |
|
| 293 |
+
| Llama-3.2 license approval delayed | Low | Medium | Submit license request immediately at GCP setup; Qwen-1.5B fallback ready |
|
| 294 |
+
| Devign data has bad CWE labels | Medium | Medium | Filter aggressively; if too noisy, drop to top-5 cleanest CWEs only |
|
| 295 |
+
| One teammate falls behind their phase | Medium | High | Sync points at midnight, 9 AM, 3 PM allow scope cuts; mock-env pattern means training isn't blocked |
|
| 296 |
+
| Niti exhausted at Mentor Round 3 | High if no sleep | High | Mandatory sleep schedule 12:30 AM–5:00 AM, non-negotiable |
|
| 297 |
+
| Demo video can't be cleanly recorded | Medium | Medium | Cherry-pick the best example; fall back to text trace if recording fails twice |
|
| 298 |
+
| HF Space rate limits during training | Low | Medium | Run training on local Docker if HF Space hits limits |
|
| 299 |
+
|
| 300 |
+
---
|
| 301 |
+
|
| 302 |
+
## 12. Timeline and Milestones
|
| 303 |
+
|
| 304 |
+
| Time (IST) | Milestone | Owner |
|
| 305 |
+
|---|---|---|
|
| 306 |
+
| Sat 9:30 PM | Phase 1 starts env scaffolding, data prep, training scaffolding in parallel | All |
|
| 307 |
+
| Sat 8:00 PM | Mentor Round 2 pitch validation | Niti |
|
| 308 |
+
| Sat 11:59 PM | Phase 1 checkpoint env runs, data ready, mock training works | All |
|
| 309 |
+
| Sun 12:00 AM | **Scope freeze** no new features after this point | All |
|
| 310 |
+
| Sun 12:30 AM | Niti sleep starts | Niti |
|
| 311 |
+
| Sun 3:00 AM | HF Space live, Deepak sleep starts | Deepak |
|
| 312 |
+
| Sun 5:30 AM | Real training run launched on HF Jobs, Divyank sleep starts | Divyank |
|
| 313 |
+
| Sun 5:00 AM | Niti wakes, watches training | Niti |
|
| 314 |
+
| Sun 9:00 AM | Team sync training results, plot status | All |
|
| 315 |
+
| Sun 10:00 AM | Mentor Round 3 final sharpening | Niti |
|
| 316 |
+
| Sun 11:30 AM | Demo video recorded and uploaded | Divyank |
|
| 317 |
+
| Sun 1:00 PM | README finalized | Niti |
|
| 318 |
+
| Sun 3:00 PM | **Feature freeze** 2-hour reminder, no more changes | All |
|
| 319 |
+
| Sun 4:30 PM | Submission packaged | Niti |
|
| 320 |
+
| Sun 5:00 PM | **Submission deadline** | |
|
| 321 |
+
|
| 322 |
+
---
|
| 323 |
+
|
| 324 |
+
## 13. Open Questions and Assumptions
|
| 325 |
+
|
| 326 |
+
### 13.1 Assumptions
|
| 327 |
+
|
| 328 |
+
- Devign dataset is on disk locally (or downloadable in <30 min) to be verified by Deepak at Phase 1 start
|
| 329 |
+
- HF Space free tier is sufficient for env hosting during the hackathon backup plan: $9/mo upgrade if rate limited
|
| 330 |
+
- Llama-3.2-3B-Instruct license approval lands within 1 hour of request Qwen fallback ready if not
|
| 331 |
+
- HF Jobs A10G availability at 5 AM Sunday GCP A10G fallback if queued
|
| 332 |
+
|
| 333 |
+
### 13.2 Open questions (to resolve during execution)
|
| 334 |
+
|
| 335 |
+
- Exact number of training steps to maximize curve visibility within budget — answered empirically by 9 AM Sunday based on observed loss
|
| 336 |
+
- Whether to ship a Colab-runnable notebook AND an HF Jobs notebook, or just one — defer to Divyank's call at Phase 2
|
| 337 |
+
- Whether to include a comparison against a non-RL baseline (pure SFT or zero-shot) — stretch only
|
| 338 |
+
|
| 339 |
+
---
|
| 340 |
+
|
| 341 |
+
## 14. Future Work (Post-Hackathon)
|
| 342 |
+
|
| 343 |
+
This section becomes part of the README's "What's Next" pitch explicitly signals to judges that we understand the limitations and have a roadmap.
|
| 344 |
+
|
| 345 |
+
- **Sandboxed exploit execution** replace pattern-match reward with actual exploit runs against compiled code in a Docker sandbox
|
| 346 |
+
- **Multi-file commit reasoning** extend the env to support diffs spanning multiple files, with a context budget
|
| 347 |
+
- **Self-play loop** pair CommitGuard with a code-generation agent; defender and attacker train against each other (the AlphaGo pattern for security)
|
| 348 |
+
- **Agentic harness integration** wire into real CI pipelines via the OpenEnv MCP layer, enabling commit-time security review at PR open
|
| 349 |
+
- **Real CVE corpus** extend beyond Devign to recent CVE-tagged commits from major open-source repos
|
| 350 |
+
- **Multi-language support** current env is C-focused via Devign; extend to Python, JavaScript, Go
|
| 351 |
+
- **Reward shape ablations** formal study of how reward composition affects which vulnerability types the model learns fastest
|
| 352 |
+
|
| 353 |
+
---
|
| 354 |
+
|
| 355 |
+
## 15. Appendix
|
| 356 |
+
|
| 357 |
+
### 15.1 Key reference URLs (for the team to bookmark)
|
| 358 |
+
|
| 359 |
+
- OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
|
| 360 |
+
- OpenEnv Scaler intro: https://tinyurl.com/openenv-scaler
|
| 361 |
+
- TRL OpenEnv docs: https://huggingface.co/docs/trl/en/openenv
|
| 362 |
+
- TRL Sudoku GRPO example: https://github.com/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb
|
| 363 |
+
- TRL Wordle GRPO example: https://github.com/huggingface/trl/blob/main/examples/notebooks/openenv_wordle_grpo.ipynb
|
| 364 |
+
- Unsloth 2048 example: https://github.com/meta-pytorch/OpenEnv/blob/main/tutorial/examples/unsloth_2048.ipynb
|
| 365 |
+
- Llama-3.2-3B model card: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
|
| 366 |
+
- HF Jobs docs: https://huggingface.co/docs/hub/jobs
|
| 367 |
+
- Cursor credits: https://tinyurl.com/sclr-openenv-dashboard
|
| 368 |
+
- HF $30 credits: https://huggingface.co/coupons/claim/hf-openenv-community
|
| 369 |
+
|
| 370 |
+
### 15.2 Document version
|
| 371 |
+
|
| 372 |
+
- v1.0 Saturday evening, Bangalore venue. Locked at midnight Saturday.
|
| 373 |
+
- Changes after lock require explicit team-wide sign-off and a documented rationale.
|
| 374 |
+
|
| 375 |
+
---
|
| 376 |
+
|
| 377 |
+
## 16. The 30-Second Pitch (For Mentor Rounds, Memorize This)
|
| 378 |
+
|
| 379 |
+
> "AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it — defense is on human time, offense is on AI time, and that asymmetry breaks the security model.
|
| 380 |
+
>
|
| 381 |
+
> CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR — verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."
|
docs/testprojects.md
ADDED
|
@@ -0,0 +1,80 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 🧪 CommitGuard — Test Projects & Penetration Testing Targets
|
| 2 |
+
|
| 3 |
+
This document serves as a catalog of vulnerable applications and datasets used to benchmark, train, and penetration-test the CommitGuard agent. These projects provide the ground-truth "exploit targets" required to verify the agent's detection accuracy across various CWE categories.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Tier 1 — Purpose-Built Vulnerable Python Apps
|
| 8 |
+
*Best for: Controlled training, unit testing, and reward model validation.*
|
| 9 |
+
|
| 10 |
+
These projects are intentionally designed with security loopholes, making them ideal for verifying that CommitGuard’s reward model correctly identifies specific CWEs.
|
| 11 |
+
|
| 12 |
+
### 1. Checkmarx c{api}tal
|
| 13 |
+
- **GitHub:** [Checkmarx/capital](https://github.com/Checkmarx/capital)
|
| 14 |
+
- **Tech Stack:** FastAPI, Pydantic, Alembic, React.
|
| 15 |
+
- **Vulnerabilities:** 10 challenges mapping to OWASP Top 10 API risks.
|
| 16 |
+
- **CWEs Covered:** Broken Object Level Auth (BOLA), Mass Assignment, Broken Authentication, SSRF.
|
| 17 |
+
- **CommitGuard Fit:** Matches the modern Python backend stack (FastAPI + Pydantic) that the agent is optimized to audit.
|
| 18 |
+
|
| 19 |
+
### 2. vulnpy by Contrast Security
|
| 20 |
+
- **GitHub:** [Contrast-Security-OSS/vulnpy](https://github.com/Contrast-Security-OSS/vulnpy)
|
| 21 |
+
- **Tech Stack:** FastAPI, Flask, Django support.
|
| 22 |
+
- **Vulnerabilities:** Purposely-vulnerable functions that can be mounted as routes.
|
| 23 |
+
- **CWEs Covered:** SQLi, Path Traversal, Command Injection, XSS, SSRF, Deserialization.
|
| 24 |
+
- **CommitGuard Fit:** Isolated, clean diff-level code units—perfect for the granularity of the RL environment.
|
| 25 |
+
|
| 26 |
+
### 3. OWASP BenchmarkPython
|
| 27 |
+
- **GitHub:** [OWASP-Benchmark/BenchmarkPython](https://github.com/OWASP-Benchmark/BenchmarkPython)
|
| 28 |
+
- **Purpose:** Verifying SAST/DAST/IAST accuracy.
|
| 29 |
+
- **CommitGuard Fit:** Provides a standardized scorecard to measure CommitGuard's accuracy against established tools like Bandit or ZAP.
|
| 30 |
+
|
| 31 |
+
### 4. python-insecure-app by trottomv
|
| 32 |
+
- **GitHub:** [trottomv/python-insecure-app](https://github.com/trottomv/python-insecure-app)
|
| 33 |
+
- **CWEs Covered:** CWE-798 (Hardcoded Credentials), CWE-94 (SSTI/Code Injection), CWE-937 (Vulnerable Dependencies).
|
| 34 |
+
- **CommitGuard Fit:** Demonstrates "shift-left" security by targeting insecure dependencies and secrets in FastAPI.
|
| 35 |
+
|
| 36 |
+
### 5. Intentionally-Vulnerable-Python-Application
|
| 37 |
+
- **GitHub:** [mukxl/Intentionally-Vulnerable-Python-Application](https://github.com/mukxl/Intentionally-Vulnerable-Python-Application)
|
| 38 |
+
- **Purpose:** Designed for SCA, SAST, and DAST analysis.
|
| 39 |
+
- **CommitGuard Fit:** Excellent regression test target to confirm the agent catches what conventional scanners catch—and identifies what they miss.
|
| 40 |
+
|
| 41 |
+
### 6. Vulnerable-API by michealkeines
|
| 42 |
+
- **GitHub:** [michealkeines/Vulnerable-API](https://github.com/michealkeines/Vulnerable-API)
|
| 43 |
+
- **Tech Stack:** Flask, Jinja, SQLite3.
|
| 44 |
+
- **CWEs Covered:** CWE-89 (SQLi), CWE-79 (XSS), CWE-73 (LFI/RFI), CWE-94 (SSTI).
|
| 45 |
+
- **CommitGuard Fit:** Specifically designed to test automated API scanners with injection-heavy payloads.
|
| 46 |
+
|
| 47 |
+
---
|
| 48 |
+
|
| 49 |
+
## Tier 2 — Real-World Projects with Known CVEs
|
| 50 |
+
*Best for: Advanced agent training and "In-the-Wild" performance testing.*
|
| 51 |
+
|
| 52 |
+
These production-grade projects provide non-synthetic, complex commit diffs derived from the National Vulnerability Database (NVD).
|
| 53 |
+
|
| 54 |
+
| Project | Stack | CWE / Vulnerability Type | Relevance |
|
| 55 |
+
| :--- | :--- | :--- | :--- |
|
| 56 |
+
| **Django** | Django + ORM | SQL Injection (CWE-89), Open Redirect (CWE-601), ReDoS (CWE-400) | High-signal real-world diffs |
|
| 57 |
+
| **Pillow** | Python Imaging | Buffer overflows, Path Traversal, ACE | Tests non-web CWE detection |
|
| 58 |
+
| **Requests** | Python HTTP | SSRF, Header Injection | Header-level vuln detection |
|
| 59 |
+
| **Paramiko** | SSH/Crypto | Auth bypass (CVE-2018-7750), Weak crypto | Crypto CWE training data |
|
| 60 |
+
| **PyYAML** | Config Parsing | Deserialization ACE (CWE-502) | Classic commit-diff CWE |
|
| 61 |
+
|
| 62 |
+
**Recommended Dataset:** [CVEFixes (ZeoVan/CVEfixes)](https://github.com/ZeoVan/CVEfixes)
|
| 63 |
+
A multi-language dataset providing the exact fixing commits for vulnerabilities, annotated at function and file levels. Directly usable for CommitGuard's diff-based pipeline.
|
| 64 |
+
|
| 65 |
+
---
|
| 66 |
+
|
| 67 |
+
## Tier 3 — Pydantic & Type-Safety Specific Targets
|
| 68 |
+
*Best for: Specialized auditing of type-driven Python applications.*
|
| 69 |
+
|
| 70 |
+
1. **Pydantic v1 Model Misuse Patterns:** Common vulnerabilities involving `model.dict()`, validator skipping, or internal field exposure. The **Checkmarx c{api}tal** project is the primary reference for this.
|
| 71 |
+
2. **Type-Safety "Escape Hatches":** Targets projects with complex type annotations, union types, and `Any` usage where type-safety bugs often hide.
|
| 72 |
+
- **Targets:** Django REST Framework and FastAPI projects with permissive `Optional` fields or loose Pydantic validation.
|
| 73 |
+
|
| 74 |
+
---
|
| 75 |
+
|
| 76 |
+
## 📈 Benchmarking Strategy
|
| 77 |
+
To verify CommitGuard's performance on these projects:
|
| 78 |
+
1. **Extraction:** Use `scratch/extract_sample.py` to pull vulnerable diffs from the projects above.
|
| 79 |
+
2. **Evaluation:** Run `scripts/evaluate.py` using these samples as the test set.
|
| 80 |
+
3. **Comparison:** Compare results against the `eval_baseline.json` to visualize the delta in detection capabilities.
|
docs/usecase.md
ADDED
|
@@ -0,0 +1,60 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# CommitGuard — Use Cases & Test Scenarios
|
| 2 |
+
|
| 3 |
+
This document outlines the primary use cases and associated test scenarios for running CommitGuard as a standalone Command Line Interface (CLI) tool and as an integrated Plugin (e.g., CI/CD Pipeline or IDE Extension).
|
| 4 |
+
|
| 5 |
+
## 1. CommitGuard as a CLI (Standalone Workflow)
|
| 6 |
+
This use case is for security researchers, data scientists, and ML engineers training or evaluating the model locally or on a dedicated VM.
|
| 7 |
+
|
| 8 |
+
### 1.1 Data Preprocessing
|
| 9 |
+
- **Scenario:** Convert raw Devign JSON into a filtered, balanced, 5000-sample JSONL file.
|
| 10 |
+
- **Action:** Run `python scripts/preprocess_devign.py --limit 5000`
|
| 11 |
+
- **Expected Result:** `data/devign_filtered.jsonl` is created with clean, XML-ready code diffs and valid `cwe` labels.
|
| 12 |
+
|
| 13 |
+
### 1.2 Environment Server (OpenEnv)
|
| 14 |
+
- **Scenario:** Start the RLVR training environment.
|
| 15 |
+
- **Action:** Run `python -m commitguard_env.server`
|
| 16 |
+
- **Expected Result:** Server starts on port 8000. `curl http://localhost:8000/health` returns `{"status": "healthy"}`. `tests/test_no_leak.py` confirms no label leakage in `/reset` or `/state`.
|
| 17 |
+
|
| 18 |
+
### 1.3 Model Training (GRPO)
|
| 19 |
+
- **Scenario:** Train the Llama-3.2-3B model using the live RLVR environment.
|
| 20 |
+
- **Action:** Run `python scripts/train_grpo.py --live --steps 500`
|
| 21 |
+
- **Expected Result:** Model trains using 4-bit quantization and LoRA. Training curve uploads to WandB. Checkpoints save every 50 steps.
|
| 22 |
+
|
| 23 |
+
### 1.4 Agentic Evaluation
|
| 24 |
+
- **Scenario:** Evaluate the trained LoRA adapter on 100 held-out test samples.
|
| 25 |
+
- **Action:** Run `python scripts/evaluate.py --adapter_path ./outputs/commitguard-final`
|
| 26 |
+
- **Expected Result:** The agent executes a 5-step loop (request_context -> analyze -> verdict). A detailed `eval_results.json` report is generated showing accuracy per CWE.
|
| 27 |
+
|
| 28 |
+
### 1.5 Visualization
|
| 29 |
+
- **Scenario:** Generate performance plots for reporting.
|
| 30 |
+
- **Action:** Run `python plots/plot_baseline_vs_trained.py`
|
| 31 |
+
- **Expected Result:** A PNG bar chart is saved showing the clear accuracy delta between baseline and trained model.
|
| 32 |
+
|
| 33 |
+
---
|
| 34 |
+
|
| 35 |
+
## 2. CommitGuard as a Plugin (Developer Workflow)
|
| 36 |
+
This use case is for software engineers interacting with the trained model during their daily development cycle to prevent vulnerabilities from reaching production.
|
| 37 |
+
|
| 38 |
+
### 2.1 Git Pre-Commit Hook (Local Plugin)
|
| 39 |
+
- **Scenario:** A developer attempts to commit code containing an SQL injection (e.g., `CWE-89`).
|
| 40 |
+
- **Action:** Developer runs `git commit -m "Update user query"`. The hook captures the local diff and invokes the CommitGuard agent API.
|
| 41 |
+
- **Expected Result:**
|
| 42 |
+
- The agent detects the vulnerability before the commit executes.
|
| 43 |
+
- The commit is **blocked** (exit code 1).
|
| 44 |
+
- The terminal outputs the agent's XML `exploit_sketch`: `"SQL injection in user_id via f-string construction."`
|
| 45 |
+
|
| 46 |
+
### 2.2 CI/CD Pull Request Reviewer (GitHub Action)
|
| 47 |
+
- **Scenario:** A developer opens a Pull Request with a new feature.
|
| 48 |
+
- **Action:** GitHub Actions triggers a CommitGuard workflow container. The agent runs a full evaluation loop over the PR's diff patch.
|
| 49 |
+
- **Expected Result:**
|
| 50 |
+
- The agent posts an automated review comment directly on the PR.
|
| 51 |
+
- If vulnerable, it flags the specific line and provides a remediation suggestion.
|
| 52 |
+
- The PR status check turns **Red (Failed)** if a severe vulnerability is detected, preventing a merge to the main branch.
|
| 53 |
+
|
| 54 |
+
### 2.3 IDE Extension (VS Code / Cursor Integration)
|
| 55 |
+
- **Scenario:** Real-time vulnerability detection while typing.
|
| 56 |
+
- **Action:** Developer saves a file (`Ctrl+S`). The IDE plugin sends the local file diff to a hosted CommitGuard backend.
|
| 57 |
+
- **Expected Result:**
|
| 58 |
+
- The agent identifies an issue using its `analyze` action step.
|
| 59 |
+
- A diagnostic warning (red squiggly line) appears under the vulnerable code snippet in the editor.
|
| 60 |
+
- Hovering shows the agent's `<reasoning>` and suggested safe implementation.
|
docs/vulnerabilities.md
ADDED
|
@@ -0,0 +1,90 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 🛡️ CommitGuard — Vulnerability Catalog & Test Cases
|
| 2 |
+
|
| 3 |
+
This document details the specific security loopholes and code-level vulnerabilities that CommitGuard is trained to detect. Each category includes the "loophole" (the technical flaw), the "exploit" (how it’s abused), and the "test case" (the diff the model must analyze).
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## 1. SQL Injection (CWE-89)
|
| 8 |
+
**The Loophole:** Using untrusted user input directly in a database query string without parameterization or escaping.
|
| 9 |
+
|
| 10 |
+
- **The Attack:** An attacker provides input like `' OR 1=1 --` to bypass authentication or dump the entire database.
|
| 11 |
+
- **CommitGuard Test Case:**
|
| 12 |
+
```diff
|
| 13 |
+
- cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
|
| 14 |
+
+ cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
|
| 15 |
+
```
|
| 16 |
+
- **Agentic reasoning:** The model should recognize that replacing a parameterized query (`%s`) with an f-string is a high-severity regression.
|
| 17 |
+
|
| 18 |
+
## 2. Buffer Overflow (CWE-120 / CWE-787)
|
| 19 |
+
**The Loophole:** Copying data into a fixed-length buffer without checking the size of the source data.
|
| 20 |
+
|
| 21 |
+
- **The Attack:** An attacker sends more data than the buffer can hold, overwriting adjacent memory to execute arbitrary code (Return-Oriented Programming).
|
| 22 |
+
- **CommitGuard Test Case:**
|
| 23 |
+
```diff
|
| 24 |
+
- strncpy(dest, src, sizeof(dest) - 1);
|
| 25 |
+
+ strcpy(dest, src);
|
| 26 |
+
```
|
| 27 |
+
- **Agentic reasoning:** The model must identify that `strcpy` is inherently unsafe compared to the bound-checked `strncpy`.
|
| 28 |
+
|
| 29 |
+
## 3. Path Traversal (CWE-22)
|
| 30 |
+
**The Loophole:** Constructing a file path using user input without neutralizing `../` sequences.
|
| 31 |
+
|
| 32 |
+
- **The Attack:** An attacker provides input like `../../../../etc/passwd` to read sensitive system files.
|
| 33 |
+
- **CommitGuard Test Case:**
|
| 34 |
+
```diff
|
| 35 |
+
- filename = os.path.basename(user_input)
|
| 36 |
+
- path = os.path.join("/safe/dir", filename)
|
| 37 |
+
+ path = os.path.join("/safe/dir", user_input)
|
| 38 |
+
```
|
| 39 |
+
- **Agentic reasoning:** The model should flag the removal of `os.path.basename()` as it allows the user to break out of the intended directory.
|
| 40 |
+
|
| 41 |
+
## 4. Integer Overflow to Buffer Overflow (CWE-190)
|
| 42 |
+
**The Loophole:** A calculation used for memory allocation overflows, resulting in a much smaller buffer than required.
|
| 43 |
+
|
| 44 |
+
- **The Attack:** An attacker provides a large integer that causes an addition or multiplication to wrap around to a small value, leading to a heap overflow.
|
| 45 |
+
- **CommitGuard Test Case:**
|
| 46 |
+
```diff
|
| 47 |
+
- size_t total_size = num_items * item_size;
|
| 48 |
+
- if (num_items > MAX_ITEMS) return ERROR;
|
| 49 |
+
+ size_t total_size = num_items * item_size;
|
| 50 |
+
+ // Removed bounds check to support larger datasets
|
| 51 |
+
```
|
| 52 |
+
- **Agentic reasoning:** The model identifies that removing the `MAX_ITEMS` check makes the `total_size` calculation susceptible to wrapping.
|
| 53 |
+
|
| 54 |
+
## 5. Use-After-Free (CWE-416)
|
| 55 |
+
**The Loophole:** Referencing memory after it has been freed.
|
| 56 |
+
|
| 57 |
+
- **The Attack:** An attacker triggers a free and then influences the program to use that pointer, potentially leading to arbitrary code execution if the memory has been re-allocated.
|
| 58 |
+
- **CommitGuard Test Case:**
|
| 59 |
+
```diff
|
| 60 |
+
free(buffer);
|
| 61 |
+
+ printf("Log: %s", buffer); // Debugging line added
|
| 62 |
+
```
|
| 63 |
+
- **Agentic reasoning:** The model flags the `printf` call because it accesses `buffer` immediately after `free()`.
|
| 64 |
+
|
| 65 |
+
## 6. Command Injection (CWE-78)
|
| 66 |
+
**The Loophole:** Passing unsanitized input to a system shell command.
|
| 67 |
+
|
| 68 |
+
- **The Attack:** An attacker provides input like `; rm -rf /` to execute arbitrary system commands.
|
| 69 |
+
- **CommitGuard Test Case:**
|
| 70 |
+
```diff
|
| 71 |
+
- subprocess.run(["ls", folder_name])
|
| 72 |
+
+ os.system("ls " + folder_name)
|
| 73 |
+
```
|
| 74 |
+
- **Agentic reasoning:** The model recognizes that `os.system` invokes a shell and is vulnerable to concatenation-based injection, unlike the list-based `subprocess.run`.
|
| 75 |
+
|
| 76 |
+
## 7. Hardcoded Credentials (CWE-798)
|
| 77 |
+
**The Loophole:** Storing secrets (API keys, passwords) in the source code.
|
| 78 |
+
|
| 79 |
+
- **The Attack:** An attacker reads the leaked key from the git history and gains unauthorized access to external services.
|
| 80 |
+
- **CommitGuard Test Case:**
|
| 81 |
+
```diff
|
| 82 |
+
- api_key = os.environ.get("STRIPE_KEY")
|
| 83 |
+
+ api_key = "sk_test_4eC39HqLyjWDarjtT1zdp7dc"
|
| 84 |
+
```
|
| 85 |
+
- **Agentic reasoning:** The model flags the change from an environment variable to a plaintext string as a security risk.
|
| 86 |
+
|
| 87 |
+
---
|
| 88 |
+
|
| 89 |
+
## 📈 Summary of Coverage
|
| 90 |
+
CommitGuard's RL environment is specifically designed to stress-test an agent's ability to see these patterns in **diff format**. Unlike static analysis tools (SAST) which look at the whole file, CommitGuard forces the agent to understand **what changed** and whether that change introduced one of the loopholes listed above.
|
gitlab-ci-template.yml
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
.commitguard-scan:
|
| 2 |
+
image: python:3.12-slim
|
| 3 |
+
stage: test
|
| 4 |
+
variables:
|
| 5 |
+
COMMITGUARD_MODEL: "inmodel-labs/commitguard-llama-3b"
|
| 6 |
+
FAIL_ON_VULNERABLE: "true"
|
| 7 |
+
before_script:
|
| 8 |
+
- apt-get update && apt-get install -y git
|
| 9 |
+
- pip install commitguard[scan] # Assuming published to PyPI, or pip install git+...
|
| 10 |
+
script:
|
| 11 |
+
- |
|
| 12 |
+
FAIL_ARG=""
|
| 13 |
+
if [ "$FAIL_ON_VULNERABLE" = "true" ]; then
|
| 14 |
+
FAIL_ARG="--fail-on-vulnerable"
|
| 15 |
+
fi
|
| 16 |
+
commitguard scan --commit HEAD --format text $FAIL_ARG --model $COMMITGUARD_MODEL
|
notebooks/train_commitguard.ipynb
ADDED
|
@@ -0,0 +1,604 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"cells": [
|
| 3 |
+
{
|
| 4 |
+
"cell_type": "markdown",
|
| 5 |
+
"metadata": {},
|
| 6 |
+
"source": [
|
| 7 |
+
"# CommitGuard GRPO Training Notebook\n",
|
| 8 |
+
"\n",
|
| 9 |
+
"Train Llama-3.2-3B-Instruct to detect exploitable vulnerabilities in code commits using GRPO (Group Relative Policy Optimization).\n",
|
| 10 |
+
"\n",
|
| 11 |
+
"**Requirements:** NVIDIA GPU with 16 GB VRAM (L4/A100/T4). Run this notebook on a GCP VM with GPU attached.\n",
|
| 12 |
+
"\n",
|
| 13 |
+
"## Setup\n",
|
| 14 |
+
"Connect to this notebook via SSH tunnel:\n",
|
| 15 |
+
"```bash\n",
|
| 16 |
+
"# On GCP VM:\n",
|
| 17 |
+
"jupyter notebook --no-browser --port=8888\n",
|
| 18 |
+
"\n",
|
| 19 |
+
"# On your local machine:\n",
|
| 20 |
+
"gcloud compute ssh commitguard-train --zone=us-central1-a -- -NL 8888:localhost:8888\n",
|
| 21 |
+
"# Then open http://localhost:8888 in browser\n",
|
| 22 |
+
"```"
|
| 23 |
+
]
|
| 24 |
+
},
|
| 25 |
+
{
|
| 26 |
+
"cell_type": "markdown",
|
| 27 |
+
"metadata": {},
|
| 28 |
+
"source": []
|
| 29 |
+
},
|
| 30 |
+
{
|
| 31 |
+
"cell_type": "markdown",
|
| 32 |
+
"metadata": {},
|
| 33 |
+
"source": [
|
| 34 |
+
"## Cell 1 Install Dependencies"
|
| 35 |
+
]
|
| 36 |
+
},
|
| 37 |
+
{
|
| 38 |
+
"cell_type": "code",
|
| 39 |
+
"execution_count": 3,
|
| 40 |
+
"metadata": {},
|
| 41 |
+
"outputs": [
|
| 42 |
+
{
|
| 43 |
+
"name": "stderr",
|
| 44 |
+
"output_type": "stream",
|
| 45 |
+
"text": [
|
| 46 |
+
"<3>WSL (3364 - Relay) ERROR: CreateProcessCommon:800: execvpe(/bin/bash) failed: No such file or directory\n"
|
| 47 |
+
]
|
| 48 |
+
},
|
| 49 |
+
{
|
| 50 |
+
"ename": "CalledProcessError",
|
| 51 |
+
"evalue": "Command 'b'# Install uv for fast, reliable dependency resolution\\ncurl -LsSf https://astral.sh/uv/install.sh | sh\\nexport PATH=\"$HOME/.local/bin:$PATH\"\\n\\nuv pip install -q \\\\\\n \"unsloth[cu124-torch240]\" \\\\\\n \"trl>=0.12\" \\\\\\n \"peft>=0.13\" \\\\\\n \"bitsandbytes>=0.44\" \\\\\\n \"transformers>=4.46\" \\\\\\n \"datasets>=3.0\" \\\\\\n \"accelerate>=1.0\" \\\\\\n \"wandb\" \\\\\\n \"fastapi\" \\\\\\n \"uvicorn[standard]\" \\\\\\n \"requests\" \\\\\\n \"matplotlib\"\\n'' returned non-zero exit status 1.",
|
| 52 |
+
"output_type": "error",
|
| 53 |
+
"traceback": [
|
| 54 |
+
"\u001b[31m---------------------------------------------------------------------------\u001b[39m",
|
| 55 |
+
"\u001b[31mCalledProcessError\u001b[39m Traceback (most recent call last)",
|
| 56 |
+
"\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[3]\u001b[39m\u001b[32m, line 1\u001b[39m\n\u001b[32m----> \u001b[39m\u001b[32m1\u001b[39m get_ipython().run_cell_magic(\u001b[33m'bash'\u001b[39m, \u001b[33m''\u001b[39m, \u001b[33m'# Install uv for fast, reliable dependency resolution\\ncurl -LsSf https://astral.sh/uv/install.sh | sh\\nexport PATH=\"$HOME/.local/bin:$PATH\"\\n\\nuv pip install -q \\\\\\n \"unsloth[cu124-torch240]\" \\\\\\n \"trl>=0.12\" \\\\\\n \"peft>=0.13\" \\\\\\n \"bitsandbytes>=0.44\" \\\\\\n \"transformers>=4.46\" \\\\\\n \"datasets>=3.0\" \\\\\\n \"accelerate>=1.0\" \\\\\\n \"wandb\" \\\\\\n \"fastapi\" \\\\\\n \"uvicorn[standard]\" \\\\\\n \"requests\" \\\\\\n \"matplotlib\"\\n'\u001b[39m)\n",
|
| 57 |
+
"\u001b[31mCalledProcessError\u001b[39m: Command 'b'# Install uv for fast, reliable dependency resolution\\ncurl -LsSf https://astral.sh/uv/install.sh | sh\\nexport PATH=\"$HOME/.local/bin:$PATH\"\\n\\nuv pip install -q \\\\\\n \"unsloth[cu124-torch240]\" \\\\\\n \"trl>=0.12\" \\\\\\n \"peft>=0.13\" \\\\\\n \"bitsandbytes>=0.44\" \\\\\\n \"transformers>=4.46\" \\\\\\n \"datasets>=3.0\" \\\\\\n \"accelerate>=1.0\" \\\\\\n \"wandb\" \\\\\\n \"fastapi\" \\\\\\n \"uvicorn[standard]\" \\\\\\n \"requests\" \\\\\\n \"matplotlib\"\\n'' returned non-zero exit status 1."
|
| 58 |
+
]
|
| 59 |
+
}
|
| 60 |
+
],
|
| 61 |
+
"source": [
|
| 62 |
+
"!pip install -q unsloth\n",
|
| 63 |
+
"!pip uninstall unsloth -y && pip install -q --upgrade --no-cache-dir \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\"\n",
|
| 64 |
+
"!pip install -q trl>=0.12 peft bitsandbytes transformers datasets accelerate wandb fastapi uvicorn[standard] requests matplotlib"
|
| 65 |
+
]
|
| 66 |
+
},
|
| 67 |
+
{
|
| 68 |
+
"cell_type": "markdown",
|
| 69 |
+
"metadata": {},
|
| 70 |
+
"source": [
|
| 71 |
+
"## Cell 2 Verify GPU"
|
| 72 |
+
]
|
| 73 |
+
},
|
| 74 |
+
{
|
| 75 |
+
"cell_type": "code",
|
| 76 |
+
"execution_count": null,
|
| 77 |
+
"metadata": {},
|
| 78 |
+
"outputs": [],
|
| 79 |
+
"source": [
|
| 80 |
+
"import torch\n",
|
| 81 |
+
"print(f\"PyTorch: {torch.__version__}\")\n",
|
| 82 |
+
"print(f\"CUDA: {torch.cuda.is_available()}\")\n",
|
| 83 |
+
"if torch.cuda.is_available():\n",
|
| 84 |
+
" print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n",
|
| 85 |
+
" print(f\"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB\")\n",
|
| 86 |
+
" print(f\"BF16: {torch.cuda.is_bf16_supported()}\")\n",
|
| 87 |
+
"else:\n",
|
| 88 |
+
" raise RuntimeError(\"No GPU detected this notebook requires a CUDA GPU.\")"
|
| 89 |
+
]
|
| 90 |
+
},
|
| 91 |
+
{
|
| 92 |
+
"cell_type": "markdown",
|
| 93 |
+
"metadata": {},
|
| 94 |
+
"source": [
|
| 95 |
+
"## Cell 3 Clone Repo & Start Env Server"
|
| 96 |
+
]
|
| 97 |
+
},
|
| 98 |
+
{
|
| 99 |
+
"cell_type": "code",
|
| 100 |
+
"execution_count": null,
|
| 101 |
+
"metadata": {},
|
| 102 |
+
"outputs": [],
|
| 103 |
+
"source": [
|
| 104 |
+
"import os, subprocess, time, requests, sys\n",
|
| 105 |
+
"\n",
|
| 106 |
+
"# Check if running in Google Colab\n",
|
| 107 |
+
"if \"google.colab\" in sys.modules:\n",
|
| 108 |
+
" print(\"Running in Google Colab.\")\n",
|
| 109 |
+
" # Reset to base directory in case cell is run multiple times\n",
|
| 110 |
+
" os.chdir(\"/content\")\n",
|
| 111 |
+
" \n",
|
| 112 |
+
" if not os.path.exists(\"/content/project.zip\"):\n",
|
| 113 |
+
" from google.colab import files\n",
|
| 114 |
+
" print(\"\\n--- WE NEED YOUR PROJECT.ZIP ---\")\n",
|
| 115 |
+
" print(\"Please click 'Choose Files' below and select project.zip from your computer:\\n\")\n",
|
| 116 |
+
" uploaded = files.upload()\n",
|
| 117 |
+
" \n",
|
| 118 |
+
" if os.path.exists(\"/content/project.zip\"):\n",
|
| 119 |
+
" print(\"Extracting project.zip...\")\n",
|
| 120 |
+
" !unzip -q -o /content/project.zip -d /content/commitguard\n",
|
| 121 |
+
" else:\n",
|
| 122 |
+
" print(\"\\n*** ERROR: project.zip still not found! ***\\n\")\n",
|
| 123 |
+
" sys.exit(1)\n",
|
| 124 |
+
" \n",
|
| 125 |
+
" os.chdir(\"/content/commitguard\")\n",
|
| 126 |
+
" REPO_DIR = os.getcwd()\n",
|
| 127 |
+
"else:\n",
|
| 128 |
+
" if os.path.basename(os.getcwd()) == \"notebooks\":\n",
|
| 129 |
+
" REPO_DIR = os.path.abspath(\"..\")\n",
|
| 130 |
+
" else:\n",
|
| 131 |
+
" REPO_DIR = os.getcwd()\n",
|
| 132 |
+
" os.chdir(REPO_DIR)\n",
|
| 133 |
+
"\n",
|
| 134 |
+
"print(f\"Using REPO_DIR: {REPO_DIR}\")\n",
|
| 135 |
+
"\n",
|
| 136 |
+
"# 2. Install current project in editable mode\n",
|
| 137 |
+
"!pip install -e . -q\n",
|
| 138 |
+
"\n",
|
| 139 |
+
"# 3. Start env server in background\n",
|
| 140 |
+
"server_proc = subprocess.Popen(\n",
|
| 141 |
+
" [sys.executable, \"-m\", \"commitguard_env.server\"],\n",
|
| 142 |
+
" stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True\n",
|
| 143 |
+
")\n",
|
| 144 |
+
"time.sleep(5)\n",
|
| 145 |
+
"\n",
|
| 146 |
+
"try:\n",
|
| 147 |
+
" r = requests.get(\"http://localhost:8000/health\")\n",
|
| 148 |
+
" print(f\"Env server: {r.json()}\")\n",
|
| 149 |
+
"except Exception as e:\n",
|
| 150 |
+
" print(f\"Server failed to start: {e}\")\n",
|
| 151 |
+
" stdout, stderr = server_proc.communicate(timeout=1)\n",
|
| 152 |
+
" print(f\"STDOUT: {stdout}\")\n",
|
| 153 |
+
" print(f\"STDERR: {stderr}\")\n",
|
| 154 |
+
"\n",
|
| 155 |
+
"# Quick sanity reset + step\n",
|
| 156 |
+
"r = requests.post(\"http://localhost:8000/reset\", json={})\n",
|
| 157 |
+
"obs = r.json()[\"observation\"]\n",
|
| 158 |
+
"print(f\"Sample diff length: {len(obs['diff'])} chars, files: {obs['available_files']}\")\n"
|
| 159 |
+
]
|
| 160 |
+
},
|
| 161 |
+
{
|
| 162 |
+
"cell_type": "markdown",
|
| 163 |
+
"metadata": {},
|
| 164 |
+
"source": [
|
| 165 |
+
"## Cell 4 HuggingFace Login (for gated Llama model)"
|
| 166 |
+
]
|
| 167 |
+
},
|
| 168 |
+
{
|
| 169 |
+
"cell_type": "code",
|
| 170 |
+
"execution_count": null,
|
| 171 |
+
"metadata": {},
|
| 172 |
+
"outputs": [],
|
| 173 |
+
"source": [
|
| 174 |
+
"from huggingface_hub import login\n",
|
| 175 |
+
"\n",
|
| 176 |
+
"HF_TOKEN = os.getenv(\"HF_TOKEN\")\n",
|
| 177 |
+
"if HF_TOKEN:\n",
|
| 178 |
+
" login(token=HF_TOKEN)\n",
|
| 179 |
+
" print(\"Logged in via token.\")\n",
|
| 180 |
+
"else:\n",
|
| 181 |
+
" login()\n" ]
|
| 182 |
+
},
|
| 183 |
+
{
|
| 184 |
+
"cell_type": "markdown",
|
| 185 |
+
"metadata": {},
|
| 186 |
+
"source": [
|
| 187 |
+
"## Cell 5 Wandb Login (optional but recommended)"
|
| 188 |
+
]
|
| 189 |
+
},
|
| 190 |
+
{
|
| 191 |
+
"cell_type": "code",
|
| 192 |
+
"execution_count": null,
|
| 193 |
+
"metadata": {},
|
| 194 |
+
"outputs": [],
|
| 195 |
+
"source": [
|
| 196 |
+
"import wandb\n",
|
| 197 |
+
"\n",
|
| 198 |
+
"USE_WANDB = False\n",
|
| 199 |
+
"os.environ[\"WANDB_DISABLED\"] = \"true\"\n",
|
| 200 |
+
"print(\"Wandb disabled.\")\n"
|
| 201 |
+
]
|
| 202 |
+
},
|
| 203 |
+
{
|
| 204 |
+
"cell_type": "markdown",
|
| 205 |
+
"metadata": {},
|
| 206 |
+
"source": [
|
| 207 |
+
"## Cell 6 Load Model with Unsloth (4-bit LoRA)"
|
| 208 |
+
]
|
| 209 |
+
},
|
| 210 |
+
{
|
| 211 |
+
"cell_type": "code",
|
| 212 |
+
"execution_count": null,
|
| 213 |
+
"metadata": {},
|
| 214 |
+
"outputs": [],
|
| 215 |
+
"source": [
|
| 216 |
+
"from unsloth import FastLanguageModel, PatchFastRL\n",
|
| 217 |
+
"from trl import GRPOConfig, GRPOTrainer\n",
|
| 218 |
+
"\n",
|
| 219 |
+
"PatchFastRL(\"GRPO\", FastLanguageModel)\n",
|
| 220 |
+
"\n",
|
| 221 |
+
"MODEL_NAME = \"meta-llama/Llama-3.2-3B-Instruct\"\n",
|
| 222 |
+
"\n",
|
| 223 |
+
"print(f\"Loading {MODEL_NAME} in 4-bit...\")\n",
|
| 224 |
+
"model, tokenizer = FastLanguageModel.from_pretrained(\n",
|
| 225 |
+
" model_name=MODEL_NAME,\n",
|
| 226 |
+
" max_seq_length=2048,\n",
|
| 227 |
+
" load_in_4bit=True,\n",
|
| 228 |
+
" fast_inference=False,\n",
|
| 229 |
+
" max_lora_rank=16,\n",
|
| 230 |
+
")\n",
|
| 231 |
+
"\n",
|
| 232 |
+
"model = FastLanguageModel.get_peft_model(\n",
|
| 233 |
+
" model,\n",
|
| 234 |
+
" r=8,\n",
|
| 235 |
+
" target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n",
|
| 236 |
+
" \"gate_proj\", \"up_proj\", \"down_proj\"],\n",
|
| 237 |
+
" lora_alpha=16,\n",
|
| 238 |
+
" lora_dropout=0,\n",
|
| 239 |
+
" bias=\"none\",\n",
|
| 240 |
+
" use_gradient_checkpointing=\"unsloth\",\n",
|
| 241 |
+
" random_state=3407,\n",
|
| 242 |
+
")\n",
|
| 243 |
+
"\n",
|
| 244 |
+
"print(f\"Model loaded. Trainable params: {model.print_trainable_parameters()}\")"
|
| 245 |
+
]
|
| 246 |
+
},
|
| 247 |
+
{
|
| 248 |
+
"cell_type": "markdown",
|
| 249 |
+
"metadata": {},
|
| 250 |
+
"source": [
|
| 251 |
+
"## Cell 7 Build Training Dataset from Env"
|
| 252 |
+
]
|
| 253 |
+
},
|
| 254 |
+
{
|
| 255 |
+
"cell_type": "code",
|
| 256 |
+
"execution_count": null,
|
| 257 |
+
"metadata": {},
|
| 258 |
+
"outputs": [],
|
| 259 |
+
"source": [
|
| 260 |
+
"import sys, requests\n",
|
| 261 |
+
"from datasets import Dataset\n",
|
| 262 |
+
"\n",
|
| 263 |
+
"sys.path.insert(0, os.path.join(REPO_DIR, \"scripts\"))\n",
|
| 264 |
+
"from agent_prompt import SYSTEM_PROMPT, get_agent_prompt\n",
|
| 265 |
+
"\n",
|
| 266 |
+
"ENV_URL = \"http://localhost:8000\"\n",
|
| 267 |
+
"N_SAMPLES = 200 # Number of training prompts (updated)\n",
|
| 268 |
+
"\n",
|
| 269 |
+
"samples = []\n",
|
| 270 |
+
"for i in range(N_SAMPLES):\n",
|
| 271 |
+
" r = requests.post(f\"{ENV_URL}/reset\", json={}, timeout=10)\n",
|
| 272 |
+
" if r.status_code != 200:\n",
|
| 273 |
+
" continue\n",
|
| 274 |
+
" obs = r.json()[\"observation\"]\n",
|
| 275 |
+
" state_r = requests.get(f\"{ENV_URL}/state\").json()\n",
|
| 276 |
+
" current_sample_id = state_r.get(\"state\", {}).get(\"current_sample_id\", \"unknown\")\n",
|
| 277 |
+
" user_msg = get_agent_prompt(obs[\"diff\"], obs[\"available_files\"], obs.get(\"step_idx\", 0))\n",
|
| 278 |
+
" samples.append({\n",
|
| 279 |
+
" \"prompt\": [\n",
|
| 280 |
+
" {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
|
| 281 |
+
" {\"role\": \"user\", \"content\": user_msg},\n",
|
| 282 |
+
" ],\n",
|
| 283 |
+
" \"sample_id\": current_sample_id,\n",
|
| 284 |
+
" })\n",
|
| 285 |
+
" if (i + 1) % 50 == 0:\n",
|
| 286 |
+
" print(f\" fetched {i + 1}/{N_SAMPLES}\")\n",
|
| 287 |
+
"\n",
|
| 288 |
+
"dataset = Dataset.from_list(samples)\n",
|
| 289 |
+
"print(f\"\\nDataset ready: {len(dataset)} samples\")\n",
|
| 290 |
+
"print(f\"Sample prompt preview: {str(dataset[0]['prompt'][1]['content'])[:200]}...\")"
|
| 291 |
+
]
|
| 292 |
+
},
|
| 293 |
+
{
|
| 294 |
+
"cell_type": "markdown",
|
| 295 |
+
"metadata": {},
|
| 296 |
+
"source": [
|
| 297 |
+
"## Cell 8 Define Reward Function"
|
| 298 |
+
]
|
| 299 |
+
},
|
| 300 |
+
{
|
| 301 |
+
"cell_type": "code",
|
| 302 |
+
"execution_count": null,
|
| 303 |
+
"metadata": {},
|
| 304 |
+
"outputs": [],
|
| 305 |
+
"source": [
|
| 306 |
+
"def get_reward_from_env(prompts, completions, sample_id, **kwargs) -> list[float]:\n",
|
| 307 |
+
" \"\"\"Send each completion to the env as an action, collect reward.\"\"\"\n",
|
| 308 |
+
" rewards = []\n",
|
| 309 |
+
" for p_id, completion in zip(sample_id, completions):\n",
|
| 310 |
+
" try:\n",
|
| 311 |
+
" requests.post(f\"{ENV_URL}/reset\", json={\"sample_id\": p_id}, timeout=10)\n",
|
| 312 |
+
" text = completion[-1][\"content\"] if isinstance(completion, list) else str(completion)\n",
|
| 313 |
+
" r = requests.post(f\"{ENV_URL}/step\", json={\"action\": text}, timeout=10)\n",
|
| 314 |
+
" if r.status_code == 200:\n",
|
| 315 |
+
" rewards.append(float(r.json().get(\"reward\", 0.0)))\n",
|
| 316 |
+
" else:\n",
|
| 317 |
+
" rewards.append(-0.5)\n",
|
| 318 |
+
" except Exception:\n",
|
| 319 |
+
" rewards.append(-1.0)\n",
|
| 320 |
+
" return rewards\n",
|
| 321 |
+
"\n",
|
| 322 |
+
"# Quick test\n",
|
| 323 |
+
"test_r = get_reward_from_env(\n",
|
| 324 |
+
" [\"test\"],\n",
|
| 325 |
+
" [\"<action><action_type>verdict</action_type><is_vulnerable>true</is_vulnerable><vuln_type>CWE-119</vuln_type><exploit_sketch>buffer overflow</exploit_sketch></action>\"],\n",
|
| 326 |
+
" [\"test_id\"]\n",
|
| 327 |
+
")\n",
|
| 328 |
+
"print(f\"Reward function test: {test_r}\")"
|
| 329 |
+
]
|
| 330 |
+
},
|
| 331 |
+
{
|
| 332 |
+
"cell_type": "markdown",
|
| 333 |
+
"metadata": {},
|
| 334 |
+
"source": [
|
| 335 |
+
"## Cell 9 Configure & Launch GRPO Training\n",
|
| 336 |
+
"\n",
|
| 337 |
+
"This is the main training loop. ~2-3 hours on L4 for 300 steps."
|
| 338 |
+
]
|
| 339 |
+
},
|
| 340 |
+
{
|
| 341 |
+
"cell_type": "code",
|
| 342 |
+
"execution_count": null,
|
| 343 |
+
"metadata": {},
|
| 344 |
+
"outputs": [],
|
| 345 |
+
"source": [
|
| 346 |
+
"OUTPUT_DIR = \"outputs/commitguard-llama-3b\"\n",
|
| 347 |
+
"\n",
|
| 348 |
+
"training_args = GRPOConfig(\n",
|
| 349 |
+
" output_dir=OUTPUT_DIR,\n",
|
| 350 |
+
" num_generations=4,\n",
|
| 351 |
+
" max_completion_length=512,\n",
|
| 352 |
+
" per_device_train_batch_size=1,\n",
|
| 353 |
+
" gradient_accumulation_steps=4,\n",
|
| 354 |
+
" learning_rate=5e-6,\n",
|
| 355 |
+
" logging_steps=1,\n",
|
| 356 |
+
" save_steps=50,\n",
|
| 357 |
+
" max_steps=300,\n",
|
| 358 |
+
" report_to=\"wandb\" if USE_WANDB else \"none\",\n",
|
| 359 |
+
" bf16=torch.cuda.is_bf16_supported(),\n",
|
| 360 |
+
" fp16=not torch.cuda.is_bf16_supported(),\n",
|
| 361 |
+
")\n",
|
| 362 |
+
"\n",
|
| 363 |
+
"trainer = GRPOTrainer(\n",
|
| 364 |
+
" model=model,\n",
|
| 365 |
+
" processing_class=tokenizer,\n",
|
| 366 |
+
" reward_funcs=[get_reward_from_env],\n",
|
| 367 |
+
" args=training_args,\n",
|
| 368 |
+
" train_dataset=dataset,\n",
|
| 369 |
+
")\n",
|
| 370 |
+
"\n",
|
| 371 |
+
"print(\"Starting GRPO training...\")\n",
|
| 372 |
+
"print(f\" Steps: {training_args.max_steps}\")\n",
|
| 373 |
+
"print(f\" Generations per prompt: {training_args.num_generations}\")\n",
|
| 374 |
+
"print(f\" Save every: {training_args.save_steps} steps\")\n",
|
| 375 |
+
"print(f\" Output: {OUTPUT_DIR}\")\n",
|
| 376 |
+
"print(\"=\"*50)\n",
|
| 377 |
+
"\n",
|
| 378 |
+
"trainer.train()"
|
| 379 |
+
]
|
| 380 |
+
},
|
| 381 |
+
{
|
| 382 |
+
"cell_type": "markdown",
|
| 383 |
+
"metadata": {},
|
| 384 |
+
"source": [
|
| 385 |
+
"## Cell 10 Save Final LoRA Adapter"
|
| 386 |
+
]
|
| 387 |
+
},
|
| 388 |
+
{
|
| 389 |
+
"cell_type": "code",
|
| 390 |
+
"execution_count": null,
|
| 391 |
+
"metadata": {},
|
| 392 |
+
"outputs": [],
|
| 393 |
+
"source": [
|
| 394 |
+
"FINAL_DIR = f\"{OUTPUT_DIR}/final\"\n",
|
| 395 |
+
"model.save_pretrained_merged(FINAL_DIR, tokenizer, save_method=\"lora\")\n",
|
| 396 |
+
"print(f\"LoRA adapter saved to {FINAL_DIR}\")\n",
|
| 397 |
+
"\n",
|
| 398 |
+
"# List saved files\n",
|
| 399 |
+
"for f in sorted(os.listdir(FINAL_DIR)):\n",
|
| 400 |
+
" size_mb = os.path.getsize(os.path.join(FINAL_DIR, f)) / 1024**2\n",
|
| 401 |
+
" print(f\" {f}: {size_mb:.1f} MB\")"
|
| 402 |
+
]
|
| 403 |
+
},
|
| 404 |
+
{
|
| 405 |
+
"cell_type": "markdown",
|
| 406 |
+
"metadata": {},
|
| 407 |
+
"source": [
|
| 408 |
+
"## Cell 11 Quick Evaluation (Baseline vs Trained)"
|
| 409 |
+
]
|
| 410 |
+
},
|
| 411 |
+
{
|
| 412 |
+
"cell_type": "code",
|
| 413 |
+
"execution_count": null,
|
| 414 |
+
"metadata": {},
|
| 415 |
+
"outputs": [],
|
| 416 |
+
"source": [
|
| 417 |
+
"import json\n",
|
| 418 |
+
"\n",
|
| 419 |
+
"# Load test set\n",
|
| 420 |
+
"test_path = os.path.join(REPO_DIR, \"data\", \"devign_test.jsonl\")\n",
|
| 421 |
+
"with open(test_path) as f:\n",
|
| 422 |
+
" test_samples = [json.loads(l) for l in f if l.strip()]\n",
|
| 423 |
+
"\n",
|
| 424 |
+
"print(f\"Evaluating on {len(test_samples)} held-out samples...\")\n",
|
| 425 |
+
"\n",
|
| 426 |
+
"# Run trained model on test set\n",
|
| 427 |
+
"FastLanguageModel.for_inference(model)\n",
|
| 428 |
+
"\n",
|
| 429 |
+
"correct = 0\n",
|
| 430 |
+
"results = []\n",
|
| 431 |
+
"\n",
|
| 432 |
+
"for i, sample in enumerate(test_samples):\n",
|
| 433 |
+
" user_msg = get_agent_prompt(sample[\"diff\"], sample[\"available_files\"], 0)\n",
|
| 434 |
+
" messages = [\n",
|
| 435 |
+
" {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
|
| 436 |
+
" {\"role\": \"user\", \"content\": user_msg},\n",
|
| 437 |
+
" ]\n",
|
| 438 |
+
" inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\", add_generation_prompt=True).to(model.device)\n",
|
| 439 |
+
" with torch.no_grad():\n",
|
| 440 |
+
" output = model.generate(inputs, max_new_tokens=512, temperature=0.1, do_sample=True)\n",
|
| 441 |
+
" response = tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)\n",
|
| 442 |
+
"\n",
|
| 443 |
+
" # Parse verdict\n",
|
| 444 |
+
" sys.path.insert(0, os.path.join(REPO_DIR, \"commitguard_env\"))\n",
|
| 445 |
+
" from commitguard_env.parse_action import parse_action\n",
|
| 446 |
+
" action = parse_action(response)\n",
|
| 447 |
+
"\n",
|
| 448 |
+
" pred_vuln = bool(action.is_vulnerable) if action.is_vulnerable is not None else False\n",
|
| 449 |
+
" truth_vuln = sample[\"is_vulnerable\"]\n",
|
| 450 |
+
"\n",
|
| 451 |
+
" if pred_vuln == truth_vuln:\n",
|
| 452 |
+
" correct += 1\n",
|
| 453 |
+
"\n",
|
| 454 |
+
" results.append({\n",
|
| 455 |
+
" \"sample_id\": sample[\"sample_id\"],\n",
|
| 456 |
+
" \"pred\": pred_vuln,\n",
|
| 457 |
+
" \"truth\": truth_vuln,\n",
|
| 458 |
+
" \"cwe\": sample.get(\"cwe\"),\n",
|
| 459 |
+
" \"vuln_type\": action.vuln_type,\n",
|
| 460 |
+
" })\n",
|
| 461 |
+
"\n",
|
| 462 |
+
" if (i + 1) % 20 == 0:\n",
|
| 463 |
+
" print(f\" {i+1}/{len(test_samples)} running accuracy: {100*correct/(i+1):.1f}%\")\n",
|
| 464 |
+
"\n",
|
| 465 |
+
"accuracy = 100 * correct / len(test_samples)\n",
|
| 466 |
+
"print(f\"\\nFinal trained accuracy: {accuracy:.1f}%\")\n",
|
| 467 |
+
"\n",
|
| 468 |
+
"with open(os.path.join(REPO_DIR, \"eval_trained.json\"), \"w\") as f:\n",
|
| 469 |
+
" json.dump(results, f, indent=2)\n",
|
| 470 |
+
"print(\"Results saved to eval_trained.json\")"
|
| 471 |
+
]
|
| 472 |
+
},
|
| 473 |
+
{
|
| 474 |
+
"cell_type": "markdown",
|
| 475 |
+
"metadata": {},
|
| 476 |
+
"source": [
|
| 477 |
+
"## Cell 12 Generate Plots"
|
| 478 |
+
]
|
| 479 |
+
},
|
| 480 |
+
{
|
| 481 |
+
"cell_type": "code",
|
| 482 |
+
"execution_count": null,
|
| 483 |
+
"metadata": {},
|
| 484 |
+
"outputs": [],
|
| 485 |
+
"source": [
|
| 486 |
+
"import matplotlib.pyplot as plt\n",
|
| 487 |
+
"from collections import Counter\n",
|
| 488 |
+
"\n",
|
| 489 |
+
"os.makedirs(os.path.join(REPO_DIR, \"plots\"), exist_ok=True)\n",
|
| 490 |
+
"\n",
|
| 491 |
+
"# --- Plot 1: Training reward curve (from trainer logs) ---\n",
|
| 492 |
+
"if hasattr(trainer, 'state') and trainer.state.log_history:\n",
|
| 493 |
+
" steps = [l[\"step\"] for l in trainer.state.log_history if \"loss\" in l]\n",
|
| 494 |
+
" losses = [l[\"loss\"] for l in trainer.state.log_history if \"loss\" in l]\n",
|
| 495 |
+
" \n",
|
| 496 |
+
" fig, ax = plt.subplots(figsize=(10, 5))\n",
|
| 497 |
+
" ax.plot(steps, losses, color=\"#2ecc71\", linewidth=2)\n",
|
| 498 |
+
" ax.set_xlabel(\"Training Step\")\n",
|
| 499 |
+
" ax.set_ylabel(\"Loss\")\n",
|
| 500 |
+
" ax.set_title(\"CommitGuard GRPO Training Loss\")\n",
|
| 501 |
+
" ax.grid(True, linestyle=\"--\", alpha=0.5)\n",
|
| 502 |
+
" fig.savefig(os.path.join(REPO_DIR, \"plots\", \"reward_curve.png\"), dpi=150)\n",
|
| 503 |
+
" plt.show()\n",
|
| 504 |
+
" print(\"Saved plots/reward_curve.png\")\n",
|
| 505 |
+
"\n",
|
| 506 |
+
" # --- Plot 2: Accuracy comparison ---\n",
|
| 507 |
+
" with open(os.path.join(REPO_DIR, \"eval_baseline.json\")) as f:\n",
|
| 508 |
+
" b_data = json.load(f)\n",
|
| 509 |
+
" baseline_acc = 100 * sum(1 for x in b_data if x['pred'] == x['truth']) / len(b_data)\n",
|
| 510 |
+
" trained_acc = accuracy\n",
|
| 511 |
+
"\n",
|
| 512 |
+
" fig, ax = plt.subplots(figsize=(8, 5))\n",
|
| 513 |
+
" bars = ax.bar([\"Baseline (Untrained)\", \"CommitGuard (Trained)\"],\n",
|
| 514 |
+
" [baseline_acc, trained_acc],\n",
|
| 515 |
+
" color=[\"#95a5a6\", \"#3498db\"])\n",
|
| 516 |
+
" ax.set_ylabel(\"Detection Accuracy (%)\")\n",
|
| 517 |
+
" ax.set_title(\"Vulnerability Detection: Baseline vs. Trained\")\n",
|
| 518 |
+
" ax.set_ylim(0, 100)\n",
|
| 519 |
+
" for bar in bars:\n",
|
| 520 |
+
" h = bar.get_height()\n",
|
| 521 |
+
" ax.text(bar.get_x() + bar.get_width()/2., h + 1, f\"{h:.1f}%\",\n",
|
| 522 |
+
" ha=\"center\", fontweight=\"bold\")\n",
|
| 523 |
+
" fig.savefig(os.path.join(REPO_DIR, \"plots\", \"baseline_vs_trained.png\"), dpi=150)\n",
|
| 524 |
+
" plt.show()\n",
|
| 525 |
+
" print(\"Saved plots/baseline_vs_trained.png\")\n",
|
| 526 |
+
"\n",
|
| 527 |
+
" # --- Plot 3: Per-CWE breakdown ---\n",
|
| 528 |
+
" cwe_correct = Counter()\n",
|
| 529 |
+
" cwe_total = Counter()\n",
|
| 530 |
+
" for r in results:\n",
|
| 531 |
+
" if r[\"cwe\"]:\n",
|
| 532 |
+
" cwe_total[r[\"cwe\"]] += 1\n",
|
| 533 |
+
" if r[\"pred\"] == r[\"truth\"]:\n",
|
| 534 |
+
" cwe_correct[r[\"cwe\"]] += 1\n",
|
| 535 |
+
"\n",
|
| 536 |
+
" cwes = sorted(cwe_total.keys())\n",
|
| 537 |
+
" accs = [100 * cwe_correct[c] / cwe_total[c] if cwe_total[c] > 0 else 0 for c in cwes]\n",
|
| 538 |
+
"\n",
|
| 539 |
+
" if cwes:\n",
|
| 540 |
+
" fig, ax = plt.subplots(figsize=(10, 5))\n",
|
| 541 |
+
" ax.bar(cwes, accs, color=\"#e67e22\")\n",
|
| 542 |
+
" ax.set_ylabel(\"Accuracy (%)\")\n",
|
| 543 |
+
" ax.set_title(\"Trained Model Accuracy by CWE Type\")\n",
|
| 544 |
+
" ax.set_ylim(0, 100)\n",
|
| 545 |
+
" plt.xticks(rotation=45)\n",
|
| 546 |
+
" plt.tight_layout()\n",
|
| 547 |
+
" fig.savefig(os.path.join(REPO_DIR, \"plots\", \"per_cwe.png\"), dpi=150)\n",
|
| 548 |
+
" plt.show()\n",
|
| 549 |
+
" print(\"Saved plots/per_cwe.png\")"
|
| 550 |
+
]
|
| 551 |
+
},
|
| 552 |
+
{
|
| 553 |
+
"cell_type": "markdown",
|
| 554 |
+
"metadata": {},
|
| 555 |
+
"source": [
|
| 556 |
+
"## Cell 13 Cleanup\n",
|
| 557 |
+
"\n",
|
| 558 |
+
"Stop the env server and print final summary."
|
| 559 |
+
]
|
| 560 |
+
},
|
| 561 |
+
{
|
| 562 |
+
"cell_type": "code",
|
| 563 |
+
"execution_count": null,
|
| 564 |
+
"metadata": {},
|
| 565 |
+
"outputs": [],
|
| 566 |
+
"source": [
|
| 567 |
+
"server_proc.terminate()\n",
|
| 568 |
+
"print(\"Env server stopped.\")\n",
|
| 569 |
+
"\n",
|
| 570 |
+
"print(\"\\n\" + \"=\"*50)\n",
|
| 571 |
+
"print(\" TRAINING COMPLETE\")\n",
|
| 572 |
+
"print(\"=\"*50)\n",
|
| 573 |
+
"print(f\" Model: {MODEL_NAME}\")\n",
|
| 574 |
+
"print(f\" Steps: {training_args.max_steps}\")\n",
|
| 575 |
+
"print(f\" Accuracy: {baseline_acc:.1f}% {trained_acc:.1f}% (+{trained_acc - baseline_acc:.1f}pp)\")\n",
|
| 576 |
+
"print(f\" Adapter: {FINAL_DIR}\")\n",
|
| 577 |
+
"print(f\" Plots: plots/reward_curve.png, baseline_vs_trained.png, per_cwe.png\")\n",
|
| 578 |
+
"\n",
|
| 579 |
+
"print(\"\\nNext: copy outputs/ and plots/ back to your local machine.\")"
|
| 580 |
+
]
|
| 581 |
+
}
|
| 582 |
+
],
|
| 583 |
+
"metadata": {
|
| 584 |
+
"kernelspec": {
|
| 585 |
+
"display_name": "Python 3 (ipykernel)",
|
| 586 |
+
"language": "python",
|
| 587 |
+
"name": "python3"
|
| 588 |
+
},
|
| 589 |
+
"language_info": {
|
| 590 |
+
"codemirror_mode": {
|
| 591 |
+
"name": "ipython",
|
| 592 |
+
"version": 3
|
| 593 |
+
},
|
| 594 |
+
"file_extension": ".py",
|
| 595 |
+
"mimetype": "text/x-python",
|
| 596 |
+
"name": "python",
|
| 597 |
+
"nbconvert_exporter": "python",
|
| 598 |
+
"pygments_lexer": "ipython3",
|
| 599 |
+
"version": "3.13.13"
|
| 600 |
+
}
|
| 601 |
+
},
|
| 602 |
+
"nbformat": 4,
|
| 603 |
+
"nbformat_minor": 4
|
| 604 |
+
}
|
pyproject.toml
ADDED
|
@@ -0,0 +1,48 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[project]
|
| 2 |
+
name = "commitguard"
|
| 3 |
+
version = "0.1.0"
|
| 4 |
+
description = "CommitGuard OpenEnv RL environment for commit-time vuln detection"
|
| 5 |
+
readme = "README.md"
|
| 6 |
+
requires-python = ">=3.10"
|
| 7 |
+
dependencies = [
|
| 8 |
+
"fastapi>=0.110",
|
| 9 |
+
"uvicorn[standard]>=0.27",
|
| 10 |
+
"pydantic>=2.6",
|
| 11 |
+
]
|
| 12 |
+
|
| 13 |
+
[project.optional-dependencies]
|
| 14 |
+
dev = [
|
| 15 |
+
"pytest>=8.0",
|
| 16 |
+
"requests>=2.31",
|
| 17 |
+
]
|
| 18 |
+
scan = [
|
| 19 |
+
"torch>=2.4",
|
| 20 |
+
"transformers>=4.46",
|
| 21 |
+
"accelerate>=1.0",
|
| 22 |
+
]
|
| 23 |
+
train = [
|
| 24 |
+
"requests",
|
| 25 |
+
"torch>=2.4",
|
| 26 |
+
"transformers>=4.46",
|
| 27 |
+
"trl>=0.12",
|
| 28 |
+
"accelerate>=1.0",
|
| 29 |
+
"peft>=0.13",
|
| 30 |
+
"datasets>=3.0",
|
| 31 |
+
"wandb",
|
| 32 |
+
"matplotlib",
|
| 33 |
+
"unsloth",
|
| 34 |
+
"bitsandbytes>=0.44",
|
| 35 |
+
"jupyter",
|
| 36 |
+
"ipywidgets",
|
| 37 |
+
]
|
| 38 |
+
|
| 39 |
+
[project.scripts]
|
| 40 |
+
commitguard = "commitguard_env.cli:main"
|
| 41 |
+
server = "commitguard_env.server:main"
|
| 42 |
+
|
| 43 |
+
[tool.setuptools]
|
| 44 |
+
packages = ["commitguard_env"]
|
| 45 |
+
|
| 46 |
+
[build-system]
|
| 47 |
+
requires = ["setuptools>=68"]
|
| 48 |
+
build-backend = "setuptools.build_meta"
|
pyrightconfig.json
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"venvPath": ".",
|
| 3 |
+
"venv": ".venv",
|
| 4 |
+
"include": [
|
| 5 |
+
"scripts",
|
| 6 |
+
"commitguard_env",
|
| 7 |
+
"server",
|
| 8 |
+
"."
|
| 9 |
+
],
|
| 10 |
+
"extraPaths": [
|
| 11 |
+
"${workspaceFolder}",
|
| 12 |
+
"${workspaceFolder}/scripts"
|
| 13 |
+
],
|
| 14 |
+
"reportMissingImports": true,
|
| 15 |
+
"typeCheckingMode": "basic"
|
| 16 |
+
}
|