kanishcr7 committed
Commit 5d79ddf · 1 Parent(s): 5e1ee57

Added submission scripts
Files changed (7)
  1. .gitignore +1 -1
  2. Dockerfile +8 -3
  3. README.md +111 -61
  4. inference.py +14 -10
  5. openenv.yaml +5 -0
  6. pyproject.toml +1 -1
  7. start.sh +8 -0
.gitignore CHANGED
@@ -3,7 +3,7 @@ __pycache__/
  *.py[codz]
  *$py.class
  wandb/
-
+ patch_hawk/
  # C extensions
  *.so
  docs/
Dockerfile CHANGED
@@ -19,8 +19,13 @@ COPY pyproject.toml .
  COPY inference.py .
  COPY config.yaml .
 
- # Expose the OpenEnv server port
+ # Copy and configure the startup script
+ COPY start.sh .
+ RUN chmod +x start.sh
+
+ # Expose both the OpenEnv API port and Streamlit port
  EXPOSE 7860
+ EXPOSE 8501
 
- # Launch the OpenEnv HTTP server
- CMD ["openenv", "serve", "--env", "patchhawk.agent.environment:PatchHawkEnv", "--port", "7860"]
+ # Launch both servers
+ CMD ["./start.sh"]
README.md CHANGED
@@ -4,105 +4,155 @@
  [![HuggingFace](https://img.shields.io/badge/🤗_Model-patchhawk-yellow)](https://huggingface.co/ramprasathk07/patchhawk)
  [![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://python.org)
  [![OpenEnv](https://img.shields.io/badge/OpenEnv-Hackathon_Finalist-orange)](https://github.com/pytorch/openenv)
 
- > **PatchHawk is an autonomous DevSecOps agent powered by Group Relative Policy Optimization (GRPO). It doesn't just detect vulnerabilities; it validates them in isolated containers and generates verified patches.**
 
  ---
 
- ## 🚀 The Approach: Cyber-Physical RL Loop
-
- Most security LLMs suffer from "hallucinated security": they claim a bug is fixed without ever running the code. PatchHawk solves this by implementing a **Cyber-Physical Reinforcement Learning Loop**:
-
- 1. **Detection**: The agent analyzes code snippets for supply-chain attacks (typosquatting, backdoors, exfiltration).
- 2. **Simulation**: The agent can choose to "Detonate" suspicious code in a hardened **Docker Sandbox** to observe real syscalls and network behavior.
- 3. **Correction**: If malicious, the agent generates a Python patch.
- 4. **Verification**: The environment automatically runs the patch through a 3-stage validation (Syntax -> Unit Tests -> Re-Attack Detonation) inside Docker.
- 5. **Reward**: The model is rewarded only if the patch **natively passes** all stages.
 
  ---
 
- ## 🧠 Training Style: GRPO (Group Relative Policy Optimization)
-
- PatchHawk uses **GRPO**, the same technique used in DeepSeek-R1, to train our security agent via trial and error.
 
- - **Trial & Error**: The model is tasked with fixing complex vulnerabilities. It generates multiple attempts (Groups) for the same problem.
- - **XML Reasoning**: The model is trained to use absolute XML structure:
- ```xml
- <thought>Analyze the base64 encoded string... it is a reverse shell.</thought>
- <risk_score>0.98</risk_score>
- <action>3</action>
- <patch>import os...</patch>
- ```
- - **Relative Scoring**: Instead of using a static "Teacher" model, PatchHawk compares the scores of the 4 attempts against each other. It learns that the attempt that passed the **Docker Syntax Check** is superior to the one that didn't.
 
  ---
 
- ## 🛠 Action Space & Scoring Rubric (0.0 to 1.0 Evaluator)
-
- The environment manages a complex reward system to move beyond sparse "win/loss" signals.
-
- | Action ID | Action Name | Reward (Base) | Logic |
- | :--- | :--- | :--- | :--- |
- | **0** | **ANALYZE** | `0.0` | "Do nothing/Observe". Optimal for benign code. |
- | **1** | **EXECUTE_SANDBOX** | `+0.1` | Safely detonate payload in Docker and extract telemetry. |
- | **2** | **BLOCK_PR** | `+2.0 / -1.0` | Reject PR. Heavily rewarded for malware, penalized for False Positives. |
- | **3** | **SUBMIT_PATCH** | **+3.0 / -1.5** | **The Goal.** Reward requires a clean run in the Docker Sandbox. |
- | **4** | **REQUEST_REVIEW** | `0.0` | Escalate to a human expert. |
-
- ### 💎 Dynamic Bonuses
- * **Risk Accuracy Bonus (+2.0)**: The agent earns a reward of `(1.0 - abs(actual - predicted)) * 2.0`. This ensures it learns to accurately classify risk even if it doesn't take the aggressive patch action.
- * **Safety Penalty (-1.0)**: Any patch that fails a Docker syntax check or unit tests results in a heavy penalty to discourage "lazy packaging".
 
  ---
 
- ## 🐳 Docker Usage & Security
 
- PatchHawk requires a local Docker daemon. The sandbox is strictly isolated:
- - **No Network**: Containers run with `--network none`.
- - **Resource Caps**: Limited to `256MB RAM` and `0.5 CPU` cores.
- - **Non-Root**: Tasks execute as a limited-privilege user.
- - **Validation**: The 3-stage pipeline checks:
-   1. `py_compile`: Does the patch even run?
-   2. `pytest`: Does it break existing functionality?
-   3. `Re-Attack`: If we run the original exploit, does the new patch stop it?
 
- ---
 
- ## 📝 Installation
 
  ```bash
- # 1. Clone & Install
  git clone https://github.com/ramprasathk07/PatchHawk.git
  cd PatchHawk
- pip install -r requirements.txt
 
- # 2. Setup Environment
  cp .env.example .env
- # Fill in HF_TOKEN for local LLM fallback
 
- # 3. Build the Validator Box
  docker build -t patchhawk-sandbox:latest -f docker/Dockerfile.sandbox .
 
- # 4. Generate the Training Dataset (1,500 samples)
- python -m patchhawk.data.generate_scenarios --num-samples 1500
  ```
 
  ---
 
  ## 📈 Dashboard & UI
 
- Launch the **Security Operations Center (SOC)** to watch the agent work in real-time:
 
  ```bash
  streamlit run patchhawk/app/dashboard.py
  ```
 
- Features:
- - **Terminal Trace**: See the raw thought process (XML/JSON) of the agent.
- - **Docker Telemetry**: View real-time output from the sandbox validation.
- - **Reward Signal**: Audit why the agent earned (+/-) rewards for its specific decision.
 
  ---
 
  ## 📝 License
- MIT © Ramprasath K & The PatchHawk Team
  [![HuggingFace](https://img.shields.io/badge/🤗_Model-patchhawk-yellow)](https://huggingface.co/ramprasathk07/patchhawk)
  [![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://python.org)
  [![OpenEnv](https://img.shields.io/badge/OpenEnv-Hackathon_Finalist-orange)](https://github.com/pytorch/openenv)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
 
+ > **PatchHawk is a state-of-the-art autonomous DevSecOps agent powered by Group Relative Policy Optimization (GRPO). It goes beyond detection by validating vulnerabilities in isolated Docker sandboxes and generating verified, syntax-correct patches.**
 
  ---
 
+ ## 📽️ The Vision: Cyber-Physical RL Loop
+
+ Traditional security scanners often suffer from a low signal-to-noise ratio and "hallucinated" vulnerabilities. PatchHawk bridges this gap by implementing a **Cyber-Physical Reinforcement Learning Loop**, where the model's reward is tied to the actual execution success of its patches in a real environment.
+
+ ```mermaid
+ graph TD
+     A[Source Code / PR] --> B{PatchHawk Agent}
+     B -->|Analyze| C[Static Analysis]
+     B -->|Test| D[Docker Sandbox]
+     D -->|Detonate| E[Behavioral Telemetry]
+     E --> F[Reward Signal]
+     B -->|Patch| G[Verification Pipeline]
+     G -->|Syntax Check| H{Success?}
+     G -->|Unit Tests| I{Pass?}
+     G -->|Re-Attack| J{Defeated?}
+     H & I & J -->|All Pass| K[Positive Reward +3.0]
+     H -->|Failure| L[Negative Penalty -1.5]
+     I -->|Failure| L
+     J -->|Failure| L
+     K --> M[Model Update/Optimization]
+ ```
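The reward-to-update edge in the loop above is where GRPO comes in. As a rough illustration only (not the project's actual training code), group-relative advantages can be computed by normalizing each attempt's reward against its group's mean and standard deviation:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each attempt's reward against its group (GRPO-style sketch).

    Attempts that beat the group average get positive advantages and are
    reinforced; below-average attempts get negative advantages.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # Identical rewards carry no relative learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four patch attempts; only the first passed full Docker validation (+3.0).
advantages = group_relative_advantages([3.0, -1.5, -1.5, 0.1])
```

This mirrors the "Relative Scoring" idea: no static teacher model, just a comparison of sibling attempts within one group.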
 
  ---
 
+ ## ✨ Key Features
 
+ - 🛡️ **Autonomous Detection**: Sophisticated analysis of supply-chain vectors (typosquatting, backdoors, exfiltration).
+ - 🐳 **Hardened Sandboxing**: High-fidelity Docker isolation with zero-network access and strict resource caps.
+ - 🧠 **GRPO-Driven Learning**: Uses Group Relative Policy Optimization (DeepSeek-R1 style) for reasoning and trial-and-error mastery.
+ - 🧩 **XML Reasoning**: Enforces a structured `<thought>...</thought>` chain for transparent decision-making.
+ - 📊 **SOC Dashboard**: Real-time Streamlit interface for auditing agent behavior and reward telemetry.
+ - ✅ **OpenEnv Compliant**: Fully integrated with the [PyTorch OpenEnv](https://github.com/pytorch/openenv) framework.
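The XML reasoning contract (`<thought>`, `<risk_score>`, `<action>`, `<patch>`, as documented in the previous README revision) can be consumed with a few lines of standard-library parsing. This is an illustrative sketch, not the project's actual parser:

```python
import re
from typing import Optional

def parse_agent_response(raw: str) -> dict:
    """Extract the structured fields from a model response (sketch only).

    Field names mirror the documented XML format; the real parser in the
    repository may differ.
    """
    def grab(tag: str) -> Optional[str]:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", raw, re.DOTALL)
        return m.group(1).strip() if m else None

    return {
        "thought": grab("thought"),
        "risk_score": float(grab("risk_score") or 0.0),
        "action": int(grab("action") or 0),
        "patch": grab("patch"),
    }

response = """<thought>Base64 string decodes to a reverse shell.</thought>
<risk_score>0.98</risk_score>
<action>3</action>
<patch>import os</patch>"""
parsed = parse_agent_response(response)
```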
 
 
 
 
  ---
 
+ ## 🛠 Project Structure
+
+ The codebase is organized into modular components for training, inference, and environment simulation.
+
+ ```text
+ PatchHawk/
+ ├── src/envs/patchhawk/   # 📦 Core OpenEnv Submission Package
+ │   ├── server/           # FastAPI environment server
+ │   ├── models.py         # Type-safe contract definitions
+ │   ├── client.py         # Environment interaction client
+ │   └── inference.py      # Main agent execution loop
+ ├── patchhawk/            # 🧠 Logic & Training
+ │   ├── data/             # Scenario generation & datasets
+ │   ├── training/         # GRPO/Unsloth training scripts
+ │   └── app/              # Streamlit SOC Dashboard
+ ├── docker/               # 🐳 Container configurations
+ ├── config.yaml           # Environment & Agent configuration
+ └── openenv.yaml          # OpenEnv metadata
+ ```
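`models.py` is described above as "type-safe contract definitions", and the diff to `inference.py` shows `PatchHawkAction`, `PatchHawkObservation`, and `PatchHawkReward` in use. A minimal sketch of what such a contract might look like; only `action_type`, `done`, and `value` are visible in the diff, and the remaining field names are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PatchHawkAction:
    """Agent decision; action_type indexes into the 0-4 action space."""
    action_type: int             # 0=ANALYZE ... 4=ESCALATE (per the rubric)
    risk_score: float = 0.0      # assumed field: agent's risk estimate
    patch: Optional[str] = None  # assumed field: only set for SUBMIT_PATCH

@dataclass
class PatchHawkObservation:
    """Environment feedback after a step."""
    code: str = ""               # assumed field: code under review
    telemetry: str = ""          # assumed field: sandbox output
    done: bool = False           # seen in the diff (obs.done)

@dataclass
class PatchHawkReward:
    value: float = 0.0           # seen in the diff (reward.value)

action = PatchHawkAction(action_type=3, risk_score=0.98, patch="import os")
```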
 
  ---
 
+ ## 🚀 Getting Started
 
+ ### Prerequisites
 
+ - **Python 3.12+**
+ - **Docker Engine** (running locally)
+ - **NVIDIA GPU** (8GB+ VRAM recommended for local training/inference)
 
+ ### 1. Installation
 
  ```bash
+ # Clone the repository
  git clone https://github.com/ramprasathk07/PatchHawk.git
  cd PatchHawk
 
+ # Create a virtual environment and install core dependencies
+ python -m venv .venv
+ source .venv/bin/activate  # Windows: .venv\Scripts\activate
+ pip install -e .
+ ```
+
+ ### 2. Environment Setup
+
+ ```bash
+ # Set up environment variables
  cp .env.example .env
+ # Edit .env to include your HF_TOKEN and OpenAI/Anthropic keys
 
+ # Build the validation sandbox
  docker build -t patchhawk-sandbox:latest -f docker/Dockerfile.sandbox .
+ ```
+
+ ### 3. Running the Agent (Dry Run)
 
+ ```bash
+ # Start the environment server
+ python -m server.app --port 8000
+
+ # Execute the inference loop
+ python src/envs/patchhawk/inference.py --env-url http://localhost:8000
  ```
 
  ---
 
+ ## 💎 Reward Rubric (Action Space)
+
+ PatchHawk implements a granular scoring system to guide the agent toward safe and effective decisions.
+
+ | Action ID | Action Name | Base Reward | Success Criteria |
+ | :--- | :--- | :--- | :--- |
+ | **0** | `ANALYZE` | `0.0` | Observation step; used for data gathering. |
+ | **1** | `DETONATE` | `+0.1` | Successfully extract telemetry from Docker. |
+ | **2** | `BLOCK_PR` | `+2.0 / -1.0` | Rewarded for malware; penalized for False Positives. |
+ | **3** | `SUBMIT_PATCH` | `+3.0 / -1.5` | **The Goal.** Requires passing Syntax -> Test -> Re-Attack. |
+ | **4** | `ESCALATE` | `0.0` | Hand off to a human expert if uncertainty is high. |
+
+ ### Dynamic Scaling
+ - **Risk Accuracy**: The agent receives up to a `+2.0` bonus for predicting the exact risk score.
+ - **Safety Multiplier**: Frequent failed syntax checks trigger a decay factor on all rewards.
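A back-of-the-envelope sketch of how this rubric could combine a base reward with the risk-accuracy bonus. The formula `(1.0 - abs(actual - predicted)) * 2.0` appears in the previous README revision shown above; the combination logic here is an assumption, not the environment's actual scorer:

```python
# Base rewards per action from the rubric table, as (success, failure) pairs.
BASE_REWARDS = {
    0: (0.0, 0.0),    # ANALYZE
    1: (0.1, 0.1),    # DETONATE
    2: (2.0, -1.0),   # BLOCK_PR
    3: (3.0, -1.5),   # SUBMIT_PATCH
    4: (0.0, 0.0),    # ESCALATE
}

def score_step(action_id: int, succeeded: bool,
               predicted_risk: float, actual_risk: float) -> float:
    """Combine base reward with the risk-accuracy bonus (illustrative only)."""
    success_r, failure_r = BASE_REWARDS[action_id]
    base = success_r if succeeded else failure_r
    # Up to +2.0 for an accurate risk estimate, per the Dynamic Scaling notes.
    accuracy_bonus = (1.0 - abs(actual_risk - predicted_risk)) * 2.0
    return base + accuracy_bonus

# A verified patch with a near-perfect risk estimate earns close to +5.0.
reward = score_step(3, succeeded=True, predicted_risk=0.98, actual_risk=1.0)
```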
+
+ ---
+
  ## 📈 Dashboard & UI
 
+ Launch the **Security Operations Center (SOC)** to watch the agent reason in real-time.
 
  ```bash
  streamlit run patchhawk/app/dashboard.py
  ```
 
+ - **Terminal Trace**: Live XML reasoning logs.
+ - **Docker Monitor**: Real-time stdout/stderr from the sandbox.
+ - **Reward Audit**: Detailed breakdown of why specific points were awarded.
+
+ ---
+
+ ## 🗺️ Roadmap
+
+ - [ ] **Multi-Agent Coordination**: Deploying "Attacker" vs. "Defender" models for automated red-teaming.
+ - [ ] **CVE Ingestion**: Automated generation of training scenarios from current NVD databases.
+ - [ ] **Cross-Language Support**: Expanding beyond Python to Go, JavaScript, and Rust.
+ - [ ] **Kubernetes Native**: Orchestrating sandboxes at scale using K8s instead of local Docker.
 
  ---
 
  ## 📝 License
+
+ Distributed under the **MIT License**. See `LICENSE` or the project root for more information.
+
+ Developed with ❤️ by **Ramprasath K & The PatchHawk Team**
inference.py CHANGED
@@ -35,14 +35,13 @@ try:
  except ImportError:
      pass
 
- API_BASE_URL = os.getenv(
-     "API_BASE_URL", "https://router.huggingface.co/hf-inference/v1"
- )
- # Prefer explicit MODEL_NAME, fallback to GRPO_POLICY_MODEL from .env, then default to 32B model.
- MODEL_NAME = os.getenv("MODEL_NAME", os.getenv("GRPO_POLICY_MODEL", "Qwen/Qwen2.5-Coder-32B-Instruct"))
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/hf-inference/v1")
+ MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-Coder-32B-Instruct")
  HF_TOKEN = os.getenv("HF_TOKEN", "")
+ LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME", "patch_hawkv1:latest")
  DRY_RUN = os.getenv("DRY_RUN", "0") == "1"
  SINGLE_TASK = os.getenv("TASK", "")
+ BENCHMARK = os.getenv("BENCHMARK", "PatchHawk")
 
  TASK_DEFS = [
      {
@@ -195,7 +194,7 @@ def _call_llm(messages: list[dict]) -> str:
      )
      return response.choices[0].message.content or ""
  except Exception as e:
-     print(f"[LLM ERROR] Remote API failed: {e}. Initiating local Fallback...", flush=True)
+     print(f"[LLM ERROR] Remote API failed: {e}. Initiating local Fallback...", file=sys.stderr, flush=True)
      return _call_llm_local(messages)
 
 
@@ -260,7 +259,7 @@ def run_episode(
      """Run one episode and return summary dict."""
      obs = env.reset(task_id=task_id)
 
-     print(f"[START] task={task_id} env=PatchHawk model={MODEL_NAME}")
+     print(f"[START] task={task_id} env={BENCHMARK} model={MODEL_NAME}", flush=True)
 
      trajectory: List[Tuple[PatchHawkAction, PatchHawkObservation]] = []
      rewards: List[PatchHawkReward] = []
@@ -298,10 +297,12 @@
 
      action_name = PatchHawkEnv.ACTION_NAMES[action.action_type]
      _done = str(obs.done).lower()
-     _err = "null" if error is None else error
+     # Sanitize error and action to ensure single-line stdout compliance
+     _err = "null" if error is None else str(error).replace("\n", " ")
+     _act = str(action_name).replace("\n", " ")
+
      print(
-         f"[STEP] step={step_num} action={action_name} "
-         f"reward={step_reward.value:.2f} done={_done} error={_err}",
+         f"[STEP] step={step_num} action={_act} reward={step_reward.value:.2f} done={_done} error={_err}",
          flush=True,
      )
      error = None  # reset for next step
@@ -309,6 +310,9 @@
      # ── Grade ────────────────────────────────────────────────────
      score = grader_fn(env, trajectory)
 
+     # Ensure score is in [0, 1]
+     score = min(max(float(score), 0.0), 1.0)
+
      rewards_str = ",".join(f"{r.value:.2f}" for r in rewards)
      success = score >= 1.0
      print(
openenv.yaml CHANGED
@@ -1,5 +1,10 @@
  name: PatchHawk
  version: 1.0.0
+ spec_version: 1
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 7860
  description: Detect and patch supply-chain vulnerabilities in Python code.
  tags: [security, supply-chain, code-review, llm-agent]
  tasks:
pyproject.toml CHANGED
@@ -44,4 +44,4 @@ dashboard = [
  server = "server.app:main"
 
  [tool.setuptools.packages.find]
- include = ["patchhawk*", "server*"]
+ include = ["patchhawk*", "server*", "src*"]
start.sh ADDED
@@ -0,0 +1,8 @@
+ #!/bin/bash
+ # Start the OpenEnv API server (Hackathon Compliance)
+ echo "Starting OpenEnv API server on port 7860..."
+ uvicorn server.app:app --host 0.0.0.0 --port 7860 &
+
+ # Start the Streamlit Dashboard (User UI)
+ echo "Starting Streamlit Dashboard on port 8501..."
+ streamlit run patchhawk/app/dashboard.py --server.port 8501 --server.address 0.0.0.0