
CommitGuard Product Requirements Document

Project: CommitGuard
Owner: Niti (Inmodel Labs)
Team: Niti, Deepak, Divyank
Submission deadline: Sunday 5:00 PM IST
Hackathon: Meta OpenEnv Hackathon (PyTorch + Hugging Face + Scaler)
Document status: Locked. Scope freeze at midnight Saturday.


1. Executive Summary

CommitGuard is a Reinforcement Learning environment built on Meta OpenEnv that trains LLM agents to detect exploitable vulnerabilities in code commits. The submission demonstrates that AI-paced security review is feasible: an agent trained on commit-level reasoning can match the velocity at which AI coding agents are now shipping production code.

The deliverable is a runnable HF Space hosting the env, a training notebook that produces a measurable learning curve on Llama-3.2-3B-Instruct, a demo video showing the qualitative shift from untrained to trained behavior, and a README that tells the story.


2. Problem Statement

2.1 The shift in software development

Until recently, code was written by humans at human velocity. Security review processes were designed around this assumption: periodic pentests every 3 to 6 months, with manual code review at PR time. The cycle worked because the codebase changed slowly enough that periodic deep review caught most issues before they reached production.

This assumption has broken. Code is now written and shipped by AI coding agents (Claude Code, Cursor, and other autonomous coding agents) at 10 to 100 times human velocity. Companies push to production daily, sometimes hourly. A pentest report from six months ago describes a codebase that no longer exists.

2.2 The asymmetry

The same class of LLM that writes the code can be weaponized to attack it. An adversary equipped with autonomous coding tooling, given repository access or even just leaked commits, can pentest at the same velocity defenders ship. Defense runs on human time. Offense runs on AI time. This asymmetry is unsustainable for any organization shipping AI-generated code at scale.

2.3 Why this is a frontier problem

AI red-teaming today is overwhelmingly a manual, human-bottlenecked discipline. Researchers at Anthropic, OpenAI, and Meta craft attacks one at a time. There is no automated equivalent of Metasploit for AI-generated code. Closing that gap is an open research problem that frontier labs are actively investing in.


3. Goals and Non-Goals

3.1 Goals (in scope for this submission)

  • Deliver a working OpenEnv environment that takes a code commit as input and rewards an agent for correctly identifying vulnerabilities, the CWE class, and a plausible exploit
  • Train a small Llama variant (Llama-3.2-3B-Instruct) on the env using GRPO via TRL + Unsloth
  • Demonstrate measurable learning: baseline vs. trained accuracy, with reward curves
  • Ship a complete submission package: HF Space, training notebook, README, demo video, optional HF blog post
  • Frame the work in language a Meta researcher recognizes: RLVR (Reinforcement Learning from Verifiable Rewards), commit-time security, AI-paced defense

3.2 Non-goals (explicitly out of scope)

  • Production-ready security tool: this is a research environment, not a CI plugin
  • Real-time exploit execution against arbitrary code: the v1 reward uses pattern matching, not sandboxed execution
  • Multi-file / repo-level reasoning: v1 operates on single-file commits up to 80 lines
  • Multi-agent self-play: listed in Future Work
  • Pentesting beyond static code analysis: no network attacks, social engineering, or runtime probing
  • Coverage of all CWEs: v1 focuses on the top 10 CWEs in Devign

3.3 Non-goals from the rubric perspective

The rubric rewards ambition and storytelling more heavily than engineering polish. Therefore: no exhaustive test coverage, no inference-latency optimization, no fancy frontend. The HF Space's default web UI is sufficient.


4. Target Users and Stakeholders

| Stakeholder | Role | What they care about |
|---|---|---|
| Hackathon judges (Meta partner engineers) | Primary audience | Innovation, story, training evidence, reward design |
| Meta Superintelligence Labs researchers | Aspirational audience | Frontier framing, RLVR alignment, paper-worthiness |
| HF community | Discovery audience | Reproducibility, runnable Space, clean README |
| Future contributors | Builder audience | Code clarity, extensibility hooks for v2 |

5. Solution Overview

5.1 The environment

CommitGuard is an OpenEnv environment where an agent investigates code commits and decides whether they introduce exploitable vulnerabilities. The agent has limited investigation budget (5 steps maximum per episode), forcing it to reason efficiently rather than brute-forcing context.

5.2 The agent loop

  1. reset(): the env loads a commit (a code_before/code_after pair plus metadata) from a preprocessed Devign-derived dataset and returns the diff and the list of available files in the repo
  2. step(action): the agent emits one of three action types:
    • request_context(file_path): pull surrounding code (small reward penalty, encourages efficiency)
    • analyze(reasoning): write chain-of-thought; no reward effect, logged for traces
    • verdict(is_vulnerable, vuln_type, exploit_sketch): terminate the episode with a judgment
  3. Reward fires on the verdict, computed server-side against ground truth the agent never sees (see the sketch after this list)
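
To make the loop concrete, here is a minimal client-side sketch. `CommitGuardClient` is the auto-generated HTTPEnvClient subclass (client.py, §6.2); the result fields (observation, reward, done), the Space URL pattern, and `model_generate` (a stand-in for the LLM call) are assumptions, not the locked API.

```python
# Minimal episode-loop sketch against the hosted env.
# CommitGuardClient / result fields / model_generate are assumptions.
from client import CommitGuardClient

env = CommitGuardClient(base_url="https://<user>-commitguard.hf.space")

obs = env.reset()                     # diff + available files; never the label
done, total_reward = False, 0.0
while not done:
    action = model_generate(obs)      # one XML-tagged action from the policy
    result = env.step(action)
    obs, done = result.observation, result.done
    total_reward += result.reward     # fires on verdict, minus step penalties
```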

5.3 Reward design (RLVR philosophy)

The reward is tiered and grounded in dataset truth, not in another LLM's opinion. This is deliberate: it follows the RLVR tradition (verifiable rewards from ground truth or executable checks) and prevents the reward hacking that plagues LLM-as-judge setups.

| Signal | Reward |
|---|---|
| Correct binary verdict (vulnerable vs. safe) | +1.0 |
| Correct CWE classification (when vulnerable) | +0.5 |
| Plausible exploit sketch (CWE-keyword match) | +0.5 |
| False positive (safe flagged as vulnerable) | -1.0 |
| False negative (real vuln missed) | -0.5 |
| Per-step context request | -0.05 |
| Episode step cap | 5 steps |

The shape is hard to game: flagging everything is punished by false positives, and never investigating forfeits the exploit-sketch bonus.
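
As a sketch of how reward.py's compute_reward could implement the tier table above: the field names on `action` and `ground_truth` are assumptions, and the -0.05 per-step context penalty is applied in step(), not here.

```python
# Sketch of compute_reward under the tier table; field names assumed.
def compute_reward(action, ground_truth, cwe_keywords) -> float:
    if action.is_vulnerable and not ground_truth["is_vulnerable"]:
        return -1.0                        # false positive
    if not action.is_vulnerable and ground_truth["is_vulnerable"]:
        return -0.5                        # false negative
    reward = 1.0                           # correct binary verdict
    if ground_truth["is_vulnerable"]:
        if action.vuln_type == ground_truth["cwe_type"]:
            reward += 0.5                  # correct CWE classification
        keywords = cwe_keywords.get(ground_truth["cwe_type"], [])
        sketch = (action.exploit_sketch or "").lower()
        if any(k.lower() in sketch for k in keywords):
            reward += 0.5                  # plausible exploit sketch
    return reward
```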


6. Technical Architecture

6.1 System diagram

```
  ┌──────────────────────┐      HTTP/JSON       ┌────────────────────────┐
  │  TRL + Unsloth       │  reset/step/state    │  HF Space              │
  │  Llama-3.2-3B        │ ◄──────────────────► │  FastAPI server        │
  │  GRPO trainer        │                      │  (Docker)              │
  │  (HF Jobs A10G)      │                      │    ├─ Devign JSONL     │
  └──────────────────────┘                      │    └─ Reward function  │
                                                └────────────────────────┘
```

6.2 Component breakdown

Env server (Python, FastAPI, Docker, OpenEnv 0.2.3+)

  • models.py: Action, Observation, State dataclasses (extending the OpenEnv base classes)
  • environment.py: reset(), step(), state() methods on the CommitGuardEnvironment class
  • reward.py: pure function compute_reward(action, ground_truth, cwe_keywords) -> float
  • parse_action.py: XML-tag parser, robust to malformed model output (sketched after this list)
  • data/devign_filtered.jsonl: preprocessed dataset, shipped in the image
  • data/cwe_keywords.json: top-10 CWE exploit-pattern keyword map
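
For illustration, a sketch of the parse_action.py approach: regex extraction over loosely XML-tagged text rather than strict XML parsing, so small-model formatting slips don't crash the env (F-7). The exact tag names and attributes are assumptions, not the locked format.

```python
# Sketch of parse_action.py; tag names/attributes are assumptions.
# Assumed action format the agent emits:
#   <request_context>src/db/query.c</request_context>
#   <analyze>the diff removes a length check before memcpy</analyze>
#   <verdict vulnerable="true" cwe="CWE-89">user input concatenated into SQL</verdict>
import re

def parse_action(text: str):
    m = re.search(r"<request_context>\s*(.*?)\s*</request_context>", text, re.S)
    if m:
        return {"type": "request_context", "file_path": m.group(1)}
    m = re.search(r"<analyze>\s*(.*?)\s*</analyze>", text, re.S)
    if m:
        return {"type": "analyze", "reasoning": m.group(1)}
    m = re.search(
        r'<verdict\s+vulnerable="(true|false)"(?:\s+cwe="([^"]*)")?\s*>(.*?)</verdict>',
        text, re.S,
    )
    if m:
        return {
            "type": "verdict",
            "is_vulnerable": m.group(1) == "true",
            "vuln_type": m.group(2),
            "exploit_sketch": m.group(3).strip(),
        }
    return None  # malformed; step() maps this to a -0.5 reward (F-7)
```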

Env client (auto-generated by OpenEnv CLI)

  • client.py: HTTPEnvClient subclass, used by the training notebook
  • Installable via pip install git+https://huggingface.co/spaces/<user>/commitguard

Training pipeline (Python, TRL, Unsloth, PEFT, Wandb)

  • train_grpo.py: GRPOTrainer config + main loop (sketched after this list)
  • agent_prompt.py: system prompt template with the XML-tag action format
  • evaluate.py: runs N samples through a model, returns accuracy stats
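
A condensed sketch of what train_grpo.py might look like, assuming the standard TRL GRPOTrainer and Unsloth APIs; `run_episode` (the env rollout helper) and `train_ds` (the prompt dataset built from devign_filtered.jsonl) are hypothetical stand-ins.

```python
# Condensed train_grpo.py sketch; run_episode and train_ds are hypothetical.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,            # Unsloth 4-bit (§7.1)
)
model = FastLanguageModel.get_peft_model(
    model, r=8, lora_alpha=16,    # LoRA r=8 (§7.1)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def episode_reward(prompts, completions, **kwargs):
    # Roll each completion through the env as a full episode and return
    # the scalar episode reward (TRL's reward_funcs callback contract).
    return [run_episode(completion) for completion in completions]

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=episode_reward,
    args=GRPOConfig(
        output_dir="checkpoints",
        num_generations=4,        # 4 completions per prompt (§6.3)
        save_steps=50,            # checkpoint cadence (T-3)
        report_to="wandb",
    ),
    train_dataset=train_ds,
)
trainer.train()
```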

Storytelling artifacts

  • README.md: pitch + results + links
  • demo_video.mp4: 60-90 second before/after, hosted unlisted on YouTube
  • commitguard_hf_blog.md: optional HF Hub blog post (page 26 bonus)
  • plots/: reward_curve.png, baseline_vs_trained.png, per_cwe.png

6.3 Data flow

  1. Preprocess Devign once at build time into data/devign_filtered.jsonl (~5000 samples, balanced, filtered to <80 LOC); see the sketch after this list
  2. Build the Docker image with the JSONL embedded
  3. openenv push deploys to the HF Space
  4. The training notebook connects to the HF Space URL via the OpenEnv HTTP client
  5. Each training step: GRPO generates 4 completions per prompt; each runs a full episode in the env; rewards are collected and the policy is updated via LoRA
  6. Wandb logs reward curves and training loss; checkpoints are saved every 50 steps
  7. The final LoRA adapter is saved to the HF Hub for evaluation and demo
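
A sketch of the step-1 preprocessing pass. The record field names ("func", "target") are assumptions about the Devign-derived schema on disk; adjust to the actual columns.

```python
# One-time preprocessing sketch; "func"/"target" field names assumed.
import json

def preprocess(records, max_loc=80, per_class=2500):
    kept = []
    counts = {True: 0, False: 0}
    for r in records:
        if r["func"].count("\n") + 1 >= max_loc:   # filter to <80 LOC
            continue
        label = bool(r["target"])
        if counts[label] >= per_class:             # keep classes balanced (~5000 total)
            continue
        counts[label] += 1
        kept.append(r)
    with open("data/devign_filtered.jsonl", "w") as f:
        for r in kept:
            f.write(json.dumps(r) + "\n")
    return kept
```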

6.4 Cheating prevention

The agent must never see ground truth. Enforced by architecture:

  • Ground truth lives only on the server, in the JSONL file the env loads from
  • The Observation dataclass schema explicitly excludes is_vulnerable, cwe_type, and target_file_with_label
  • A unit test (test_no_leak.py) asserts no observation contains forbidden fields
  • The server returns only reward (a scalar) on each step, never the label that produced it
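
A minimal sketch of what test_no_leak.py could assert; constructing the env in-process with a no-argument constructor is an assumption.

```python
# Sketch of test_no_leak.py: no observation may carry ground truth.
from environment import CommitGuardEnvironment

FORBIDDEN = {"is_vulnerable", "cwe_type", "target_file_with_label"}

def test_no_leak():
    env = CommitGuardEnvironment()       # assumed constructor
    for _ in range(100):                 # 100 random episodes
        obs = env.reset()
        leaked = FORBIDDEN & set(vars(obs))
        assert not leaked, f"ground-truth leak in observation: {leaked}"
```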

7. Stack and Dependencies

7.1 Locked technical decisions

| Decision | Choice | Rationale |
|---|---|---|
| Env framework | Meta OpenEnv 0.2.3+ | Mandatory per submission rules |
| Server runtime | FastAPI in Docker | OpenEnv default, lowest friction |
| Hosting | HF Space | Mandatory per submission rules; three-in-one (server + repo + registry) |
| Data source | Devign (DetectBERT subset) | Already on disk, real CWE labels, manageable size |
| Model | Llama-3.2-3B-Instruct | Meta-branded for the Meta hackathon, fits A10G with GRPO |
| Training framework | TRL with GRPO | Native OpenEnv integration via reward_funcs callback |
| Training optimization | Unsloth 4-bit + LoRA r=8 | 70% memory reduction, 2x speed (page 75 of opening deck) |
| Training infra | HF Jobs A10G | $0.40-1.50/hr, runs unattended, integrates with HF ecosystem |
| Dev infra | GCP VM with T4 | Stable, no Colab disconnects, leverages the 24,000 GCP credit |
| Action serialization | XML-tag free-text | Robust to small-model output variance, easier than JSON mode |
| Logging | Wandb | TRL native, judges can view runs |

7.2 Fallback decisions (pre-approved, no debate when triggered)

| If this fails | Fall back to | Trigger |
|---|---|---|
| Llama-3.2-3B OOM on A10G | Qwen2.5-1.5B-Instruct | First test step crashes |
| HF Jobs queue full | GCP A10G on-demand | Job queued for >30 min |
| 3-action env doesn't ship by midnight | 2-action env (analyze + verdict) | Niti's checkpoint red |
| Tiered reward buggy | Binary correct/incorrect reward | Deepak's checkpoint red |
| Training curve flat | Ship with qualitative comparison only | Curve still flat at 10 AM Sunday |
| Demo video can't be cleanly recorded | Side-by-side text trace in README | Recording fails twice |

8. Functional Requirements

8.1 Environment functional requirements

| ID | Requirement | Priority |
|---|---|---|
| F-1 | Env exposes /health, /reset, /step, /state, /docs endpoints | P0 |
| F-2 | reset() returns a random commit observation, never the same one twice in a single episode | P0 |
| F-3 | step() accepts XML-tagged action strings and parses them robustly | P0 |
| F-4 | step() returns reward, observation, and done flag | P0 |
| F-5 | Episode terminates on verdict action OR after 5 steps | P0 |
| F-6 | Observation never contains ground-truth labels | P0 |
| F-7 | Env handles malformed actions gracefully (returns -0.5 reward, doesn't crash) | P1 |
| F-8 | Env supports concurrent episodes (multiple training generations in parallel) | P1 |
| F-9 | Web UI on HF Space allows manual interaction for demo recording | P2 |
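
As a sanity check against F-1, an endpoint smoke test might look like the following; the Space URL pattern and the POST semantics/response shape of /reset are assumptions.

```python
# Quick endpoint smoke check for F-1; URL pattern and /reset shape assumed.
import requests

BASE = "https://<user>-commitguard.hf.space"

assert requests.get(f"{BASE}/health").status_code == 200
obs = requests.post(f"{BASE}/reset").json()
print(obs)  # should show a diff and available files, never ground-truth labels
```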

8.2 Training functional requirements

| ID | Requirement | Priority |
|---|---|---|
| T-1 | Training notebook runs end-to-end on a single A10G | P0 |
| T-2 | Reward curve, training loss, and completions logged to Wandb | P0 |
| T-3 | LoRA adapter saved every 50 steps for resumability | P0 |
| T-4 | Baseline (untrained) evaluation on 100 held-out samples completes in <10 min | P0 |
| T-5 | Trained model evaluation produces per-CWE accuracy breakdown | P1 |
| T-6 | Notebook runnable from Colab via "Open in Colab" badge in README | P1 |

8.3 Storytelling functional requirements

| ID | Requirement | Priority |
|---|---|---|
| S-1 | README explains problem, env, results, and motivation in a <5 min read | P0 |
| S-2 | All plot PNGs committed to the repo (not Wandb-only) | P0 |
| S-3 | Demo video 60-90 sec, before/after on a single SQL injection example | P0 |
| S-4 | Wandb run URL linked in README | P1 |
| S-5 | HF Hub blog post published and linked | P2 |

9. Non-Functional Requirements

| Aspect | Requirement |
|---|---|
| Performance | A single step() call returns in <2 seconds on the HF Space free tier |
| Reliability | Env survives 100 random episodes without a crash |
| Reproducibility | Training notebook produces a measurable learning curve when re-run with the same seed |
| Discoverability | HF Space tagged with openenv, rl, security, code |
| Documentation | README is self-contained; a judge can understand it without reading the source |
| Licensing | Code MIT-licensed; dataset attribution to the Devign authors |

10. Success Metrics

10.1 Submission completeness (binary, must-pass)

  • HF Space deployed and /health returns 200 OK
  • Training notebook runs without crashes on a fresh Colab/VM
  • README has all required links (HF Space, notebook, video, GitHub)
  • At least one reward curve plot committed
  • Demo video accessible via public URL

10.2 Quality metrics (graded by rubric)

| Metric | Target | Stretch |
|---|---|---|
| Innovation framing recognized by mentor | "this is an interesting angle" feedback | "this is paper-worthy" feedback |
| Baseline accuracy (untrained Llama-3.2-3B) | Establishes a floor (likely 30-45%) | |
| Trained accuracy (after 300 GRPO steps) | Beats baseline by 10 pp absolute | Beats baseline by 20 pp |
| Reward curve | Bends upward visibly | Smooth monotonic increase |
| Per-CWE breakdown | At least 3 CWEs show improvement | All top-5 CWEs show improvement |
| Storytelling | Mentor at Round 3 can repeat the pitch back | Mentor offers to share with Meta team |

10.3 Anti-metrics (things we explicitly don't optimize for)

  • Number of features
  • Number of CWEs covered (more is not better; depth beats breadth here)
  • Lines of code
  • Model size (going larger doesn't make a stronger submission, just slower training)

11. Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Training run produces flat curve | Medium | High | Pre-approved pivot to qualitative-comparison narrative; baseline already establishes a contrast |
| HF Space deployment fails at 4 AM | Low | High | Fall back to a Docker image with docker run instructions in the README |
| Llama-3.2 license approval delayed | Low | Medium | Submit license request immediately at GCP setup; Qwen-1.5B fallback ready |
| Devign data has bad CWE labels | Medium | Medium | Filter aggressively; if too noisy, drop to the top-5 cleanest CWEs only |
| One teammate falls behind their phase | Medium | High | Sync points at midnight, 9 AM, and 3 PM allow scope cuts; the mock-env pattern means training isn't blocked |
| Niti exhausted at Mentor Round 3 | High if no sleep | High | Mandatory sleep schedule, 12:30 AM to 5:00 AM, non-negotiable |
| Demo video can't be cleanly recorded | Medium | Medium | Cherry-pick the best example; fall back to a text trace if recording fails twice |
| HF Space rate limits during training | Low | Medium | Run training against local Docker if the HF Space hits limits |

12. Timeline and Milestones

| Time (IST) | Milestone | Owner |
|---|---|---|
| Sat 8:00 PM | Mentor Round 2: pitch validation | Niti |
| Sat 9:30 PM | Phase 1 starts: env scaffolding, data prep, training scaffolding in parallel | All |
| Sat 11:59 PM | Phase 1 checkpoint: env runs, data ready, mock training works | All |
| Sun 12:00 AM | Scope freeze: no new features after this point | All |
| Sun 12:30 AM | Niti sleep starts | Niti |
| Sun 3:00 AM | HF Space live; Deepak sleep starts | Deepak |
| Sun 5:00 AM | Niti wakes, watches training | Niti |
| Sun 5:30 AM | Real training run launched on HF Jobs; Divyank sleep starts | Divyank |
| Sun 9:00 AM | Team sync: training results, plot status | All |
| Sun 10:00 AM | Mentor Round 3: final sharpening | Niti |
| Sun 11:30 AM | Demo video recorded and uploaded | Divyank |
| Sun 1:00 PM | README finalized | Niti |
| Sun 3:00 PM | Feature freeze: 2-hour reminder, no more changes | All |
| Sun 4:30 PM | Submission packaged | Niti |
| Sun 5:00 PM | Submission deadline | |

13. Open Questions and Assumptions

13.1 Assumptions

  • Devign dataset is on disk locally (or downloadable in <30 min); to be verified by Deepak at Phase 1 start
  • HF Space free tier is sufficient for env hosting during the hackathon; backup plan: $9/mo upgrade if rate-limited
  • Llama-3.2-3B-Instruct license approval lands within 1 hour of request; Qwen fallback ready if not
  • HF Jobs A10G availability at 5 AM Sunday; GCP A10G fallback if queued

13.2 Open questions (to resolve during execution)

  • Exact number of training steps to maximize curve visibility within budget; answered empirically by 9 AM Sunday based on observed loss
  • Whether to ship a Colab-runnable notebook AND an HF Jobs notebook, or just one; defer to Divyank's call at Phase 2
  • Whether to include a comparison against a non-RL baseline (pure SFT or zero-shot); stretch only

14. Future Work (Post-Hackathon)

This section becomes part of the README's "What's Next" pitch; it explicitly signals to judges that we understand the limitations and have a roadmap.

  • Sandboxed exploit execution: replace the pattern-match reward with actual exploit runs against compiled code in a Docker sandbox
  • Multi-file commit reasoning: extend the env to support diffs spanning multiple files, with a context budget
  • Self-play loop: pair CommitGuard with a code-generation agent; defender and attacker train against each other (the AlphaGo pattern for security)
  • Agentic harness integration: wire into real CI pipelines via the OpenEnv MCP layer, enabling commit-time security review at PR open
  • Real CVE corpus: extend beyond Devign to recent CVE-tagged commits from major open-source repos
  • Multi-language support: the current env is C-focused via Devign; extend to Python, JavaScript, Go
  • Reward shape ablations: formal study of how reward composition affects which vulnerability types the model learns fastest

15. Appendix

15.1 Key reference URLs (for the team to bookmark)

15.2 Document version

  • v1.0: Saturday evening, Bangalore venue. Locked at midnight Saturday.
  • Changes after lock require explicit team-wide sign-off and a documented rationale.

16. The 30-Second Pitch (For Mentor Rounds, Memorize This)

"AI is now writing production code at AI speed. Security review still runs on a 6-month human cycle. The same LLMs that write the code can attack it defense is on human time, offense is on AI time, and that asymmetry breaks the security model.

CommitGuard is an OpenEnv where an agent learns to flag exploitable diffs at commit time. We trained Llama-3.2-3B on it via GRPO and the detection rate climbs measurably. It's RLVR verifiable rewards from ground truth, not LLM judges. The thesis: continuous AI red-teaming at the velocity code is being shipped. This is the environment to train it."