# CommitGuard: Comprehensive GCP Deployment & Training Guide (A10G)
This document is a deep-dive, step-by-step manual for deploying the CommitGuard environment and training pipeline to a Google Cloud Platform (GCP) instance. We are targeting an NVIDIA A10G GPU to execute GRPO (Group Relative Policy Optimization) on the Llama-3.2-3B model.
## 1. Prerequisites: Setting Up Your Toolbox
Before you touch the cloud, you must ensure your local environment and external accounts are configured. These are the building blocks of the entire run.
### A. GCP Account & Project Setup
- Active Project: You must have a GCP project created. Note your `PROJECT_ID`.
- GPU Quota: By default, GCP projects have a GPU quota of 0. Navigate to `IAM & Admin > Quotas` and request a limit increase for `NVIDIA_A10G_GPUS` in your desired region (e.g., `us-central1`). Do this at least 24 hours in advance, since quota requests can take a day to approve.
### B. Weights & Biases (WandB) for Visualization
- Why? RL training can be unstable. WandB allows you to monitor the "Reward" and "KL Divergence" curves in real-time from your browser.
- Action: Create a free account at wandb.ai, navigate to your settings, and copy your API Key.
### C. Hugging Face Account & Llama Access
- Model Gating: Llama-3.2-3B is a gated model. You must visit the model page and apply for access. Approval usually takes 30-60 minutes.
- Access Token: Generate a "Write" token in your Hugging Face settings to allow the VM to download the model and upload your finished adapters.
### D. Local gcloud CLI Initialization
- Installation: Install the Google Cloud SDK on your laptop.
- Authentication: Run `gcloud auth login` and `gcloud config set project [YOUR_PROJECT_ID]`. This lets your local terminal talk to GCP.
## Step 1: Provisioning the High-Performance VM
We are using the `g2-standard-4` machine type, which is specifically designed for AI workloads.
### Detailed Breakdown of the Creation Command
- `--machine-type=g2-standard-4`: Provides 4 vCPUs and 16 GB of system RAM, ensuring the CPU doesn't bottleneck the GPU.
- `--accelerator=count=1,type=nvidia-a10g`: Attaches the A10G GPU. Its 24 GB of VRAM is the "Goldilocks" zone for 3B-parameter models: enough to hold the model plus the multiple "generations" required by the GRPO algorithm.
- `--image-family=common-cu121`: Uses a specialized Google image that comes with CUDA 12.1 and the NVIDIA drivers pre-installed. This saves you ~30 minutes of manual driver installation.
- `--provisioning-model=SPOT`: CRITICAL FOR BUDGET. Spot instances use excess capacity and are roughly 70% cheaper than standard instances. If the instance is reclaimed by Google, your 50-step checkpoints ensure you don't lose much progress.
```bash
gcloud compute instances create commitguard-trainer \
  --project=[PROJECT_ID] \
  --zone=us-central1-a \
  --machine-type=g2-standard-4 \
  --accelerator=count=1,type=nvidia-a10g \
  --image-project=ml-images \
  --image-family=common-cu121 \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-balanced \
  --maintenance-policy=TERMINATE \
  --provisioning-model=SPOT
```
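To sanity-check the "Goldilocks" VRAM claim above, here is back-of-the-envelope arithmetic for the memory budget. Every figure below is an illustrative assumption, not a measured value:

```python
# Rough VRAM budget for 4-bit GRPO training of a ~3B model on a 24 GB A10G.
# All numbers are illustrative assumptions, not measurements.
params = 3.2e9                       # approx. parameter count of Llama-3.2-3B
weights_gb = params * 0.5 / 1e9      # 4-bit weights ~= 0.5 bytes per parameter
lora_gb = 5.5e6 * (4 + 8) / 1e9      # ~5.5M LoRA params: fp32 weights + Adam moments
activations_gb = 8.0                 # guess: KV cache + activations for 4 generations
total_gb = weights_gb + lora_gb + activations_gb
print(f"~{total_gb:.1f} GB of 24 GB")
```

Even with a generous activation estimate, the 4-bit model leaves comfortable headroom on 24 GB; that headroom shrinks quickly for larger models.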
## Step 2: Environment Preparation
Once the VM is "Running," we need to turn it into a specialized CommitGuard lab.
### A. Secure Connection (SSH)
Connect to the machine's terminal:
```bash
gcloud compute ssh commitguard-trainer --zone=us-central1-a
```
### B. Repository & Virtual Environment
We isolate our dependencies to prevent conflicts with system-level Python packages.
```bash
# Clone the project
git clone https://github.com/[YOUR_USER]/commitguard.git
cd commitguard

# Create a 'venv' (Virtual Environment)
python3 -m venv .venv
source .venv/bin/activate

# Authenticate with Hugging Face (required for gated Llama models)
huggingface-cli login
```
### C. Installing the "Train" Stack
The `pip install -e ".[train]"` command installs the `commitguard` package in "editable" mode along with all optional training libraries such as `torch`, `peft`, and `trl`.
```bash
pip install -U pip
pip install -e ".[train]"

# Flash Attention 2: a specialized kernel that makes Llama training
# significantly faster and more memory-efficient on A10G hardware.
pip install flash-attn --no-build-isolation
```
## Step 3: Launching the Verifiable Reward Server
CommitGuard uses RLVR (Reinforcement Learning with Verifiable Rewards). In this setup, the model doesn't just "guess" whether it is right; it submits an action to a server that calculates a reward based on hard evidence.
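As an illustration of what a verifiable reward can look like, it might combine a format check with a correctness check against a known label. This is a hypothetical sketch, not CommitGuard's actual server code; the `<analysis>`/`<verdict>` tags and the -1.0/0.0/1.0 reward scale are assumptions:

```python
import re

# Hypothetical verifiable-reward shape: format gate first, then correctness.
VERDICT_RE = re.compile(
    r"<analysis>.+?</analysis>\s*<verdict>(safe|vulnerable)</verdict>",
    re.DOTALL,
)

def reward(completion: str, ground_truth: str) -> float:
    """Return -1.0 for malformed output, 1.0 for a correct verdict, else 0.0."""
    match = VERDICT_RE.search(completion)
    if match is None:
        return -1.0          # output did not follow the required XML format
    return 1.0 if match.group(1) == ground_truth else 0.0

good = "<analysis>eval() runs attacker-controlled input</analysis><verdict>vulnerable</verdict>"
```

Because the verdict is checked against evidence rather than model self-assessment, the reward signal cannot be gamed by confident-sounding prose alone.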
### Running in the Background
Since training takes hours, we run the server in the background with the `&` operator (for extra safety against SSH disconnects, consider `nohup` or a `tmux` session).
```bash
# Start the server
python -m commitguard_env.server &

# Verify health: this ensures the database and API are ready.
# If this fails, the trainer will hang indefinitely.
curl http://localhost:8000/health
# You should see: {"status":"healthy"}
```
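Because a dead server makes the trainer hang, it can be worth polling `/health` with a timeout before launching the run. A minimal stdlib sketch (the URL and retry budget are assumptions):

```python
import json
import time
import urllib.request

def is_healthy(body: str) -> bool:
    """Return True only for the expected {"status": "healthy"} payload."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (ValueError, AttributeError):
        return False         # not JSON, or JSON that isn't an object

def wait_for_server(url: str = "http://localhost:8000/health",
                    retries: int = 12, delay: float = 5.0) -> bool:
    """Poll the reward server instead of letting the trainer hang forever."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if is_healthy(resp.read().decode()):
                    return True
        except OSError:
            pass             # server not up yet; retry after a short sleep
        time.sleep(delay)
    return False
```

Calling `wait_for_server()` right before the training script starts turns a silent hang into an explicit, debuggable failure.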
## Step 4: Executing the GRPO Training Run
GRPO is a reinforcement learning algorithm. It asks the model to generate 4 different answers for the same code diff, compares them to each other, and rewards the ones that follow the XML format and correctly identify the vulnerability.
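The "compares them to each other" step can be sketched numerically. In GRPO, each completion's advantage is its reward relative to the group's mean, normalized by the group's standard deviation; this toy version assumes that standard normalization:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each completion against its own group, GRPO-style."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0   # all-equal rewards would give sigma = 0
    return [(r - mu) / sigma for r in rewards]

# Four generations for the same diff: two well-formatted, two malformed.
advantages = group_relative_advantages([1.0, 1.0, 0.0, 0.0])
# Above-average completions get positive advantage (reinforced);
# below-average completions get negative advantage (penalized).
```

Note that if all four generations earn identical rewards, every advantage is zero and the batch teaches the model nothing, which is why reward shaping matters.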
### Hyperparameter Explanation
- `--steps 500`: The model will see roughly 2,000 examples (4 generations x 500 steps).
- 4-bit Quantization: Handled automatically by the script. It compresses the model weights so they fit into the GPU's memory with minimal accuracy loss.
- LoRA r=8: "Low-Rank Adaptation." Instead of training 3 billion parameters, we train only about 5 million. This makes training stable and fast.
- `--live`: Tells the script to fetch rewards from the server we started in Step 3.
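The "about 5 million" figure is easy to reproduce. This sketch assumes the published Llama-3.2-3B shape (hidden size 3072, 28 layers) and, for simplicity, square q/k/v/o attention projections with LoRA applied to all four:

```python
# Back-of-the-envelope LoRA parameter count (illustrative assumptions).
hidden, layers, r = 3072, 28, 8
# LoRA adds two low-rank factors per adapted projection:
# A (r x d_in) and B (d_out x r).
per_projection = 2 * r * hidden            # square projection assumed
lora_params = layers * 4 * per_projection  # q, k, v, o projections per layer
print(f"{lora_params / 1e6:.1f}M trainable vs ~3,200M frozen")
```

The real count is somewhat lower because the 3B model's k/v projections are smaller than its q/o projections, but the order of magnitude, roughly 0.2% of the full model, is the point.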
```bash
# Log in to WandB so your graphs show up online
export WANDB_API_KEY=[YOUR_WANDB_KEY]

python scripts/train_grpo.py \
  --model_name "meta-llama/Llama-3.2-3B-Instruct" \
  --output_dir "./outputs/commitguard-final" \
  --steps 500 \
  --live \
  --wandb "commitguard-rlvr"
```
## Step 5: Post-Run Weight Management & Cleanup
Once the 500 steps are complete, the "brain" of your agent exists as a LoRA adapter in the ./outputs folder.
### A. Permanent Storage (Hugging Face)
The VM's disk is temporary. Move your weights to Hugging Face immediately.
```bash
huggingface-cli login --token [YOUR_HF_TOKEN]
huggingface-cli upload [HF_USERNAME]/commitguard-llama3b-adapter ./outputs/commitguard-final
```
### B. Cost Control: Deleting the VM
DO NOT FORGET THIS STEP. An idle A10G instance costs money every hour.
```bash
# Exit the VM
exit

# Delete from your local terminal
gcloud compute instances delete commitguard-trainer --zone=us-central1-a
```
## Critical Troubleshooting
"CUDA Out of Memory"
- Symptom: Training crashes with a long error ending in `OutOfMemoryError`.
- Fix: The "group" in GRPO is currently set to 4 generations. Open `scripts/train_grpo.py` and change `num_generations=4` to `num_generations=2`. This roughly halves the memory consumed by generation.
"Connection Refused"
- Symptom: The reward function returns -1.0 for everything or throws errors.
- Fix: Your environment server crashed or was never started. Run `ps aux | grep server` to check whether it is still running, and restart it if not.
### The "Midnight Fallback"
If the 3B model is too slow for the submission deadline:
- Switch to the 1.5B Qwen model. It uses the same XML format but trains roughly 2x faster.
- Command:

```bash
python scripts/train_grpo.py --model_name "Qwen/Qwen2.5-1.5B-Instruct" ...
```
## Final Success Checklist
- Health Check: `curl` returns healthy.
- WandB Tracking: You can see the `reward` curve moving on the website.
- Checkpoints: You see folders like `checkpoint-50`, `checkpoint-100` in the output directory.
- Clean Exit: The VM is deleted after the adapter is uploaded to Hugging Face.