kanishcr7 committed
Commit 5d79ddf · 1 Parent(s): 5e1ee57

Added submission scripts
Files changed (7)
  1. .gitignore +1 -1
  2. Dockerfile +8 -3
  3. README.md +111 -61
  4. inference.py +14 -10
  5. openenv.yaml +5 -0
  6. pyproject.toml +1 -1
  7. start.sh +8 -0
.gitignore CHANGED
@@ -3,7 +3,7 @@ __pycache__/
  *.py[codz]
  *$py.class
  wandb/
-
+ patch_hawk/
  # C extensions
  *.so
  docs/
Dockerfile CHANGED
@@ -19,8 +19,13 @@ COPY pyproject.toml .
  COPY inference.py .
  COPY config.yaml .
 
- # Expose the OpenEnv server port
+ # Copy and configure the startup script
+ COPY start.sh .
+ RUN chmod +x start.sh
+
+ # Expose both the OpenEnv API port and Streamlit port
  EXPOSE 7860
+ EXPOSE 8501
 
- # Launch the OpenEnv HTTP server
- CMD ["openenv", "serve", "--env", "patchhawk.agent.environment:PatchHawkEnv", "--port", "7860"]
+ # Launch both servers
+ CMD ["./start.sh"]
README.md CHANGED
@@ -4,105 +4,155 @@
  [![HuggingFace](https://img.shields.io/badge/🤗_Model-patchhawk-yellow)](https://huggingface.co/ramprasathk07/patchhawk)
  [![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://python.org)
  [![OpenEnv](https://img.shields.io/badge/OpenEnv-Hackathon_Finalist-orange)](https://github.com/pytorch/openenv)
 
- > **PatchHawk is an autonomous DevSecOps agent powered by Group Relative Policy Optimization (GRPO). It doesn't just detect vulnerabilities; it validates them in isolated containers and generates verified patches.**
 
  ---
 
- ## 🚀 The Approach: Cyber-Physical RL Loop
-
- Most security LLMs suffer from "hallucinated security": they claim a bug is fixed without ever running the code. PatchHawk solves this by implementing a **Cyber-Physical Reinforcement Learning Loop**:
-
- 1. **Detection**: The agent analyzes code snippets for supply-chain attacks (typosquatting, backdoors, exfiltration).
- 2. **Simulation**: The agent can choose to "Detonate" suspicious code in a hardened **Docker Sandbox** to observe real syscalls and network behavior.
- 3. **Correction**: If malicious, the agent generates a Python patch.
- 4. **Verification**: The environment automatically runs the patch through a 3-stage validation (Syntax -> Unit Tests -> Re-Attack Detonation) inside Docker.
- 5. **Reward**: The model is rewarded only if the patch **natively passes** all stages.
 
  ---
 
- ## 🧠 Training Style: GRPO (Group Relative Policy Optimization)
-
- PatchHawk uses **GRPO**, the same technique used in DeepSeek-R1, to train our security agent via trial and error.
 
- - **Trial & Error**: The model is tasked with fixing complex vulnerabilities. It generates multiple attempts (Groups) for the same problem.
- - **XML Reasoning**: The model is trained to use absolute XML structure:
- ```xml
- <thought>Analyze the base64 encoded string... it is a reverse shell.</thought>
- <risk_score>0.98</risk_score>
- <action>3</action>
- <patch>import os...</patch>
- ```
- - **Relative Scoring**: Instead of using a static "Teacher" model, PatchHawk compares the scores of the 4 attempts against each other. It learns that the attempt that passed the **Docker Syntax Check** is superior to the one that didn't.
 
  ---
 
- ## 🛠 Action Space & Scoring Rubric (0.0 to 1.0 Evaluator)
-
- The environment manages a complex reward system to move beyond sparse "win/loss" signals.
-
- | Action ID | Action Name | Reward (Base) | Logic |
- | :--- | :--- | :--- | :--- |
- | **0** | **ANALYZE** | `0.0` | "Do nothing/Observe". Optimal for benign code. |
- | **1** | **EXECUTE_SANDBOX** | `+0.1` | Safely detonate payload in Docker and extract telemetry. |
- | **2** | **BLOCK_PR** | `+2.0 / -1.0` | Reject PR. Heavily rewarded for malware, penalized for False Positives. |
- | **3** | **SUBMIT_PATCH** | **+3.0 / -1.5** | **The Goal.** Reward requires a clean run in the Docker Sandbox. |
- | **4** | **REQUEST_REVIEW** | `0.0` | Escalate to a human expert. |
-
- ### 💎 Dynamic Bonuses
- * **Risk Accuracy Bonus (+2.0)**: The agent earns a reward of `(1.0 - abs(actual - predicted)) * 2.0`. This ensures it learns to accurately classify risk even if it doesn't take the aggressive patch action.
- * **Safety Penalty (-1.0)**: Any patch that fails a Docker syntax check or unit tests results in a heavy penalty to discourage "lazy packaging".
 
  ---
 
- ## 🐳 Docker Usage & Security
 
- PatchHawk requires a local Docker daemon. The sandbox is strictly isolated:
- - **No Network**: Containers run with `--network none`.
- - **Resource Caps**: Limited to `256MB RAM` and `0.5 CPU` cores.
- - **Non-Root**: Tasks execute as a limited-privilege user.
- - **Validation**: The 3-stage pipeline checks:
-   1. `py_compile`: Does the patch even run?
-   2. `pytest`: Does it break existing functionality?
-   3. `Re-Attack`: If we run the original exploit, does the new patch stop it?
 
- ---
 
- ## 📝 Installation
 
  ```bash
- # 1. Clone & Install
  git clone https://github.com/ramprasathk07/PatchHawk.git
  cd PatchHawk
- pip install -r requirements.txt
 
- # 2. Setup Environment
  cp .env.example .env
- # Fill in HF_TOKEN for local LLM fallback
 
- # 3. Build the Validator Box
  docker build -t patchhawk-sandbox:latest -f docker/Dockerfile.sandbox .
 
- # 4. Generate the Training Dataset (1,500 samples)
- python -m patchhawk.data.generate_scenarios --num-samples 1500
  ```
 
  ---
 
  ## 📈 Dashboard & UI
 
- Launch the **Security Operations Center (SOC)** to watch the agent work in real-time:
 
  ```bash
  streamlit run patchhawk/app/dashboard.py
  ```
 
- Features:
- - **Terminal Trace**: See the raw thought process (XML/JSON) of the agent.
- - **Docker Telemetry**: View real-time output from the sandbox validation.
- - **Reward Signal**: Audit why the agent earned (+/-) rewards for its specific decision.
 
  ---
 
  ## 📝 License
- MIT © Ramprasath K & The PatchHawk Team
  [![HuggingFace](https://img.shields.io/badge/🤗_Model-patchhawk-yellow)](https://huggingface.co/ramprasathk07/patchhawk)
  [![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://python.org)
  [![OpenEnv](https://img.shields.io/badge/OpenEnv-Hackathon_Finalist-orange)](https://github.com/pytorch/openenv)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
 
+ > **PatchHawk is a state-of-the-art autonomous DevSecOps agent powered by Group Relative Policy Optimization (GRPO). It goes beyond detection by validating vulnerabilities in isolated Docker sandboxes and generating verified, syntax-correct patches.**
 
  ---
 
+ ## 📽️ The Vision: Cyber-Physical RL Loop
+
+ Traditional security scanners often suffer from a low signal-to-noise ratio and "hallucinated" vulnerabilities. PatchHawk bridges this gap by implementing a **Cyber-Physical Reinforcement Learning Loop**, where the model's reward is tied to the actual execution success of its patches in a real environment.
+
+ ```mermaid
+ graph TD
+     A[Source Code / PR] --> B{PatchHawk Agent}
+     B -->|Analyze| C[Static Analysis]
+     B -->|Test| D[Docker Sandbox]
+     D -->|Detonate| E[Behavioral Telemetry]
+     E --> F[Reward Signal]
+     B -->|Patch| G[Verification Pipeline]
+     G -->|Syntax Check| H{Success?}
+     G -->|Unit Tests| I{Pass?}
+     G -->|Re-Attack| J{Defeated?}
+     H & I & J -->|All Pass| K[Positive Reward +3.0]
+     H -->|Failure| L[Negative Penalty -1.5]
+     I -->|Failure| L
+     J -->|Failure| L
+     K --> M[Model Update/Optimization]
+ ```
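The reward-to-update edge in the loop above is where GRPO comes in. As a rough illustration only (not the project's actual training code), group-relative advantages can be computed by normalizing each attempt's reward against its group's mean and standard deviation:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each attempt's reward against its group (GRPO-style sketch).

    Attempts that beat the group average get positive advantages and are
    reinforced; below-average attempts get negative advantages.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # Identical rewards carry no relative learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four patch attempts; only the first passed full Docker validation (+3.0).
advantages = group_relative_advantages([3.0, -1.5, -1.5, 0.1])
```

This mirrors the "Relative Scoring" idea: no static teacher model, just a comparison of sibling attempts within one group.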
 
  ---
 
+ ## ✨ Key Features
 
+ - 🛡️ **Autonomous Detection**: Sophisticated analysis of supply-chain vectors (typosquatting, backdoors, exfiltration).
+ - 🐳 **Hardened Sandboxing**: High-fidelity Docker isolation with zero-network access and strict resource caps.
+ - 🧠 **GRPO-Driven Learning**: Uses Group Relative Policy Optimization (DeepSeek-R1 style) for reasoning and trial-and-error mastery.
+ - 🧩 **XML Reasoning**: Enforces a structured `<thought>...</thought>` chain for transparent decision-making.
+ - 📊 **SOC Dashboard**: Real-time Streamlit interface for auditing agent behavior and reward telemetry.
+ - ✅ **OpenEnv Compliant**: Fully integrated with the [PyTorch OpenEnv](https://github.com/pytorch/openenv) framework.
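The XML reasoning contract (`<thought>`, `<risk_score>`, `<action>`, `<patch>`, as documented in the previous README revision) can be consumed with a few lines of standard-library parsing. This is an illustrative sketch, not the project's actual parser:

```python
import re
from typing import Optional

def parse_agent_response(raw: str) -> dict:
    """Extract the structured fields from a model response (sketch only).

    Field names mirror the documented XML format; the real parser in the
    repository may differ.
    """
    def grab(tag: str) -> Optional[str]:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", raw, re.DOTALL)
        return m.group(1).strip() if m else None

    return {
        "thought": grab("thought"),
        "risk_score": float(grab("risk_score") or 0.0),
        "action": int(grab("action") or 0),
        "patch": grab("patch"),
    }

response = """<thought>Base64 string decodes to a reverse shell.</thought>
<risk_score>0.98</risk_score>
<action>3</action>
<patch>import os</patch>"""
parsed = parse_agent_response(response)
```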
 
 
 
 
  ---
 
+ ## 🛠 Project Structure
+
+ The codebase is organized into modular components for training, inference, and environment simulation.
+
+ ```text
+ PatchHawk/
+ ├── src/envs/patchhawk/   # 📦 Core OpenEnv Submission Package
+ │   ├── server/           # FastAPI environment server
+ │   ├── models.py         # Type-safe contract definitions
+ │   ├── client.py         # Environment interaction client
+ │   └── inference.py      # Main agent execution loop
+ ├── patchhawk/            # 🧠 Logic & Training
+ │   ├── data/             # Scenario generation & datasets
+ │   ├── training/         # GRPO/Unsloth training scripts
+ │   └── app/              # Streamlit SOC Dashboard
+ ├── docker/               # 🐳 Container configurations
+ ├── config.yaml           # Environment & Agent configuration
+ └── openenv.yaml          # OpenEnv metadata
+ ```
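`models.py` is described above as "type-safe contract definitions", and the diff to `inference.py` shows `PatchHawkAction`, `PatchHawkObservation`, and `PatchHawkReward` in use. A minimal sketch of what such a contract might look like; only `action_type`, `done`, and `value` are visible in the diff, and the remaining field names are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PatchHawkAction:
    """Agent decision; action_type indexes into the 0-4 action space."""
    action_type: int             # 0=ANALYZE ... 4=ESCALATE (per the rubric)
    risk_score: float = 0.0      # assumed field: agent's risk estimate
    patch: Optional[str] = None  # assumed field: only set for SUBMIT_PATCH

@dataclass
class PatchHawkObservation:
    """Environment feedback after a step."""
    code: str = ""               # assumed field: code under review
    telemetry: str = ""          # assumed field: sandbox output
    done: bool = False           # seen in the diff (obs.done)

@dataclass
class PatchHawkReward:
    value: float = 0.0           # seen in the diff (reward.value)

action = PatchHawkAction(action_type=3, risk_score=0.98, patch="import os")
```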
 
  ---
 
+ ## 🚀 Getting Started
 
+ ### Prerequisites
 
+ - **Python 3.12+**
+ - **Docker Engine** (running locally)
+ - **NVIDIA GPU** (8GB+ VRAM recommended for local training/inference)
 
+ ### 1. Installation
 
  ```bash
+ # Clone the repository
  git clone https://github.com/ramprasathk07/PatchHawk.git
  cd PatchHawk
 
+ # Create a virtual environment and install core dependencies
+ python -m venv .venv
+ source .venv/bin/activate  # Windows: .venv\Scripts\activate
+ pip install -e .
+ ```
+
+ ### 2. Environment Setup
+
+ ```bash
+ # Set up environment variables
  cp .env.example .env
+ # Edit .env to include your HF_TOKEN and OpenAI/Anthropic keys
 
+ # Build the validation sandbox
  docker build -t patchhawk-sandbox:latest -f docker/Dockerfile.sandbox .
+ ```
+
+ ### 3. Running the Agent (Dry Run)
 
+ ```bash
+ # Start the environment server
+ python -m server.app --port 8000
+
+ # Execute the inference loop
+ python src/envs/patchhawk/inference.py --env-url http://localhost:8000
  ```
 
  ---
 
+ ## 💎 Reward Rubric (Action Space)
+
+ PatchHawk implements a granular scoring system to guide the agent toward safe and effective decisions.
+
+ | Action ID | Action Name | Base Reward | Success Criteria |
+ | :--- | :--- | :--- | :--- |
+ | **0** | `ANALYZE` | `0.0` | Observation step; used for data gathering. |
+ | **1** | `DETONATE` | `+0.1` | Successfully extract telemetry from Docker. |
+ | **2** | `BLOCK_PR` | `+2.0 / -1.0` | Rewarded for malware; penalized for False Positives. |
+ | **3** | `SUBMIT_PATCH` | `+3.0 / -1.5` | **The Goal.** Requires passing Syntax -> Test -> Re-Attack. |
+ | **4** | `ESCALATE` | `0.0` | Hand off to a human expert if uncertainty is high. |
+
+ ### Dynamic Scaling
+ - **Risk Accuracy**: The agent receives up to a `+2.0` bonus for predicting the exact risk score.
+ - **Safety Multiplier**: Frequent failed syntax checks trigger a decay factor on all rewards.
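A back-of-the-envelope sketch of how this rubric could combine a base reward with the risk-accuracy bonus. The formula `(1.0 - abs(actual - predicted)) * 2.0` appears in the previous README revision shown above; the combination logic here is an assumption, not the environment's actual scorer:

```python
# Base rewards per action from the rubric table, as (success, failure) pairs.
BASE_REWARDS = {
    0: (0.0, 0.0),    # ANALYZE
    1: (0.1, 0.1),    # DETONATE
    2: (2.0, -1.0),   # BLOCK_PR
    3: (3.0, -1.5),   # SUBMIT_PATCH
    4: (0.0, 0.0),    # ESCALATE
}

def score_step(action_id: int, succeeded: bool,
               predicted_risk: float, actual_risk: float) -> float:
    """Combine base reward with the risk-accuracy bonus (illustrative only)."""
    success_r, failure_r = BASE_REWARDS[action_id]
    base = success_r if succeeded else failure_r
    # Up to +2.0 for an accurate risk estimate, per the Dynamic Scaling notes.
    accuracy_bonus = (1.0 - abs(actual_risk - predicted_risk)) * 2.0
    return base + accuracy_bonus

# A verified patch with a near-perfect risk estimate earns close to +5.0.
reward = score_step(3, succeeded=True, predicted_risk=0.98, actual_risk=1.0)
```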
+
+ ---
+
  ## 📈 Dashboard & UI
 
+ Launch the **Security Operations Center (SOC)** to watch the agent reason in real-time.
 
  ```bash
  streamlit run patchhawk/app/dashboard.py
  ```
 
+ - **Terminal Trace**: Live XML reasoning logs.
+ - **Docker Monitor**: Real-time stdout/stderr from the sandbox.
+ - **Reward Audit**: Detailed breakdown of why specific points were awarded.
+
+ ---
+
+ ## 🗺️ Roadmap
+
+ - [ ] **Multi-Agent Coordination**: Deploying "Attacker" vs. "Defender" models for automated red-teaming.
+ - [ ] **CVE Ingestion**: Automated generation of training scenarios from current NVD databases.
+ - [ ] **Cross-Language Support**: Expanding beyond Python to Go, JavaScript, and Rust.
+ - [ ] **Kubernetes Native**: Orchestrating sandboxes at scale using K8s instead of local Docker.
 
  ---
 
  ## 📝 License
+
+ Distributed under the **MIT License**. See `LICENSE` or the project root for more information.
+
+ Developed with ❤️ by **Ramprasath K & The PatchHawk Team**
inference.py CHANGED
@@ -35,14 +35,13 @@ try:
  except ImportError:
      pass
 
- API_BASE_URL = os.getenv(
-     "API_BASE_URL", "https://router.huggingface.co/hf-inference/v1"
- )
- # Prefer explicit MODEL_NAME, fallback to GRPO_POLICY_MODEL from .env, then default to 32B model.
- MODEL_NAME = os.getenv("MODEL_NAME", os.getenv("GRPO_POLICY_MODEL", "Qwen/Qwen2.5-Coder-32B-Instruct"))
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/hf-inference/v1")
+ MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-Coder-32B-Instruct")
  HF_TOKEN = os.getenv("HF_TOKEN", "")
+ LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME", "patch_hawkv1:latest")
  DRY_RUN = os.getenv("DRY_RUN", "0") == "1"
  SINGLE_TASK = os.getenv("TASK", "")
+ BENCHMARK = os.getenv("BENCHMARK", "PatchHawk")
 
  TASK_DEFS = [
      {
@@ -195,7 +194,7 @@ def _call_llm(messages: list[dict]) -> str:
      )
      return response.choices[0].message.content or ""
  except Exception as e:
-     print(f"[LLM ERROR] Remote API failed: {e}. Initiating local Fallback...", flush=True)
+     print(f"[LLM ERROR] Remote API failed: {e}. Initiating local Fallback...", file=sys.stderr, flush=True)
      return _call_llm_local(messages)
 
 
@@ -260,7 +259,7 @@ def run_episode(
      """Run one episode and return summary dict."""
      obs = env.reset(task_id=task_id)
 
-     print(f"[START] task={task_id} env=PatchHawk model={MODEL_NAME}")
+     print(f"[START] task={task_id} env={BENCHMARK} model={MODEL_NAME}", flush=True)
 
      trajectory: List[Tuple[PatchHawkAction, PatchHawkObservation]] = []
      rewards: List[PatchHawkReward] = []
@@ -298,10 +297,12 @@
 
      action_name = PatchHawkEnv.ACTION_NAMES[action.action_type]
      _done = str(obs.done).lower()
-     _err = "null" if error is None else error
+     # Sanitize error and action to ensure single-line stdout compliance
+     _err = "null" if error is None else str(error).replace("\n", " ")
+     _act = str(action_name).replace("\n", " ")
+
      print(
-         f"[STEP] step={step_num} action={action_name} "
-         f"reward={step_reward.value:.2f} done={_done} error={_err}",
+         f"[STEP] step={step_num} action={_act} reward={step_reward.value:.2f} done={_done} error={_err}",
          flush=True,
      )
      error = None  # reset for next step
@@ -309,6 +310,9 @@
      # ── Grade ────────────────────────────────────────────────────
      score = grader_fn(env, trajectory)
 
+     # Ensure score is in [0, 1]
+     score = min(max(float(score), 0.0), 1.0)
+
      rewards_str = ",".join(f"{r.value:.2f}" for r in rewards)
      success = score >= 1.0
      print(
openenv.yaml CHANGED
@@ -1,5 +1,10 @@
  name: PatchHawk
  version: 1.0.0
+ spec_version: 1
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 7860
  description: Detect and patch supply-chain vulnerabilities in Python code.
  tags: [security, supply-chain, code-review, llm-agent]
  tasks:
pyproject.toml CHANGED
@@ -44,4 +44,4 @@ dashboard = [
  server = "server.app:main"
 
  [tool.setuptools.packages.find]
- include = ["patchhawk*", "server*"]
+ include = ["patchhawk*", "server*", "src*"]
start.sh ADDED
@@ -0,0 +1,8 @@
+ #!/bin/bash
+ # Start the OpenEnv API server (Hackathon Compliance)
+ echo "Starting OpenEnv API server on port 7860..."
+ uvicorn server.app:app --host 0.0.0.0 --port 7860 &
+
+ # Start the Streamlit Dashboard (User UI)
+ echo "Starting Streamlit Dashboard on port 8501..."
+ streamlit run patchhawk/app/dashboard.py --server.port 8501 --server.address 0.0.0.0