oxdev committed 16 days ago

Commit

55ef8ec

verified ·

1 Parent(s): 3c818d7

Add Google Colab training notebook for V2 GRPO training (free T4 path)

Files changed (1)

train_grpo_v2_colab.ipynb +482 -0

train_grpo_v2_colab.ipynb ADDED

	@@ -0,0 +1,482 @@

+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "gpuType": "T4"
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    },
+    "accelerator": "GPU"
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# 🔐 Smart Contract Security Auditor — GRPO V2 Training\n",
+        "\n",
+        "Train a specialized smart contract security auditor using **Group Relative Policy Optimization (GRPO)**\n",
+        "on **50,902 real audit findings** from top security firms.\n",
+        "\n",
+        "**Model:** Qwen2.5-Coder-0.5B-Instruct → oxdev/security-auditor-grpo\n",
+        "\n",
+        "**Dataset:** [oxdev/smart-contract-security-audit-v2](https://huggingface.co/datasets/oxdev/smart-contract-security-audit-v2)\n",
+        "\n",
+        "**Hardware:** Free Colab T4 (16GB VRAM)\n",
+        "\n",
+        "---\n",
+        "\n",
+        "## Setup\n",
+        "1. Go to **Runtime → Change runtime type → T4 GPU**\n",
+        "2. Run all cells in order\n",
+        "3. When prompted, enter your HuggingFace token (needs write access)\n",
+        "4. Training takes ~4-6 hours on a T4 GPU with 2K samples"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 1: Install dependencies\n",
+        "!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121\n",
+        "!pip install -q 'transformers>=4.51.0' 'trl>=1.2.0' datasets accelerate huggingface_hub\n",
+        "print('\\n✅ Dependencies installed!')\n",
+        "\n",
+        "import torch\n",
+        "print(f'PyTorch: {torch.__version__}')\n",
+        "print(f'CUDA available: {torch.cuda.is_available()}')\n",
+        "if torch.cuda.is_available():\n",
+        "    print(f'GPU: {torch.cuda.get_device_name(0)}')\n",
+        "    print(f'VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 2: Login to HuggingFace (needed to push model)\n",
+        "from huggingface_hub import login\n",
+        "login()  # Will prompt for your token"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 3: Configuration\n",
+        "# ╔══════════════════════════════════════════════════════════════╗\n",
+        "# ║  MODIFY THESE SETTINGS AS NEEDED                           ║\n",
+        "# ╚══════════════════════════════════════════════════════════════╝\n",
+        "\n",
+        "MODEL_NAME = \"Qwen/Qwen2.5-Coder-0.5B-Instruct\"  # Base model\n",
+        "DATASET_ID = \"oxdev/smart-contract-security-audit-v2\"  # 50K real findings\n",
+        "HUB_MODEL_ID = \"oxdev/security-auditor-grpo\"  # Where to push\n",
+        "OUTPUT_DIR = \"/content/grpo_v2_output\"  # Local output\n",
+        "\n",
+        "# Training hyperparameters (tuned for T4 16GB)\n",
+        "SUBSET_SIZE = 2000           # Samples to train on (2K fits in ~4hrs on T4)\n",
+        "BATCH_SIZE = 2               # Per-device batch size\n",
+        "GRAD_ACCUM = 4               # Gradient accumulation → effective batch = 8\n",
+        "NUM_GENERATIONS = 2          # GRPO generations per prompt\n",
+        "MAX_COMPLETION_LENGTH = 512  # Max tokens per completion\n",
+        "LEARNING_RATE = 1e-6\n",
+        "BETA = 0.04                  # KL penalty\n",
+        "NUM_EPOCHS = 1\n",
+        "SAVE_STEPS = 100\n",
+        "\n",
+        "print(f'Config ready: {SUBSET_SIZE} samples, batch={BATCH_SIZE}×{GRAD_ACCUM}, lr={LEARNING_RATE}')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 4: Load and inspect dataset\n",
+        "from datasets import load_dataset\n",
+        "from collections import Counter\n",
+        "\n",
+        "print('Loading dataset...')\n",
+        "dataset = load_dataset(DATASET_ID, split='train')\n",
+        "print(f'Total: {len(dataset)} samples')\n",
+        "print(f'Columns: {dataset.column_names}')\n",
+        "print()\n",
+        "\n",
+        "# Show distributions\n",
+        "sev_dist = Counter(dataset['severity'])\n",
+        "cat_dist = Counter(dataset['category'])\n",
+        "src_dist = Counter(dataset['source'])\n",
+        "\n",
+        "print('Severity distribution:')\n",
+        "for sev, count in sorted(sev_dist.items(), key=lambda x: -x[1]):\n",
+        "    print(f'  {sev:15s}: {count:6d} ({count/len(dataset)*100:.1f}%)')\n",
+        "\n",
+        "print(f'\\nCategory distribution (top 10):')\n",
+        "for cat, count in sorted(cat_dist.items(), key=lambda x: -x[1])[:10]:\n",
+        "    print(f'  {cat:20s}: {count:6d}')\n",
+        "\n",
+        "print(f'\\nSource distribution:')\n",
+        "for src, count in sorted(src_dist.items(), key=lambda x: -x[1]):\n",
+        "    print(f'  {src:20s}: {count:6d}')\n",
+        "\n",
+        "# Show a sample\n",
+        "print(f'\\n--- Sample prompt (first 300 chars) ---')\n",
+        "p = dataset[0]['prompt']\n",
+        "user_msg = [m for m in p if m['role'] == 'user'][0]['content']\n",
+        "print(user_msg[:300])"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 5: Curate high-quality training subset\n",
+        "print(f'Selecting top {SUBSET_SIZE} highest-value samples...')\n",
+        "\n",
+        "indices = []\n",
+        "idx_set = set()\n",
+        "\n",
+        "# Priority 1: HIGH+CRITICAL severity with code (most valuable)\n",
+        "for i, row in enumerate(dataset):\n",
+        "    if row['severity'] in ('high', 'critical') and row['has_code']:\n",
+        "        indices.append(i)\n",
+        "        idx_set.add(i)\n",
+        "print(f'  HIGH+CRITICAL with code: {len(indices)}')\n",
+        "\n",
+        "# Priority 2: Any with PoC reference\n",
+        "for i, row in enumerate(dataset):\n",
+        "    if row['has_poc'] and i not in idx_set:\n",
+        "        indices.append(i)\n",
+        "        idx_set.add(i)\n",
+        "print(f'  + Has PoC: {len(indices)}')\n",
+        "\n",
+        "# Priority 3: MEDIUM with code (fill to cap)\n",
+        "for i, row in enumerate(dataset):\n",
+        "    if row['severity'] == 'medium' and row['has_code'] and i not in idx_set:\n",
+        "        indices.append(i)\n",
+        "        idx_set.add(i)\n",
+        "    if len(indices) >= SUBSET_SIZE:\n",
+        "        break\n",
+        "\n",
+        "# If still short, add remaining HIGH+CRITICAL without code\n",
+        "if len(indices) < SUBSET_SIZE:\n",
+        "    for i, row in enumerate(dataset):\n",
+        "        if row['severity'] in ('high', 'critical') and i not in idx_set:\n",
+        "            indices.append(i)\n",
+        "            idx_set.add(i)\n",
+        "        if len(indices) >= SUBSET_SIZE:\n",
+        "            break\n",
+        "\n",
+        "train_dataset = dataset.select(indices[:SUBSET_SIZE])\n",
+        "print(f'\\n✅ Final subset: {len(train_dataset)} samples')\n",
+        "\n",
+        "# Show final distribution\n",
+        "final_sev = Counter(train_dataset['severity'])\n",
+        "for sev, count in sorted(final_sev.items(), key=lambda x: -x[1]):\n",
+        "    print(f'  {sev:15s}: {count:6d}')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 6: Define reward functions\n",
+        "import re\n",
+        "\n",
+        "def format_reward(prompts, completions, completion_ids=None, **kwargs):\n",
+        "    \"\"\"Reward for producing structured FINDING blocks and proper formatting.\"\"\"\n",
+        "    rewards = []\n",
+        "    for completion in completions:\n",
+        "        text = completion[0]['content'] if isinstance(completion, list) else str(completion)\n",
+        "        reward = 0.0\n",
+        "        if re.search(r'FINDING\\s*\\|', text):\n",
+        "            reward += 0.3\n",
+        "            fields = ['contract:', 'function:', 'bug_class:', 'confidence:']\n",
+        "            reward += 0.05 * sum(1 for f in fields if f in text)\n",
+        "        if re.search(r'```solidity', text):\n",
+        "            reward += 0.15\n",
+        "        section_keywords = ['description', 'impact', 'proof', 'fix', 'recommendation', 'mitigation']\n",
+        "        sect_count = sum(1 for kw in section_keywords if re.search(rf'(?i)(###?\\s*{kw}|{kw}:)', text))\n",
+        "        reward += 0.05 * min(sect_count, 3)\n",
+        "        if len(text) < 50: reward -= 0.3\n",
+        "        elif len(text) > 4000: reward -= 0.1\n",
+        "        rewards.append(max(-1.0, min(1.0, reward)))\n",
+        "    return rewards\n",
+        "\n",
+        "\n",
+        "def _sev_rank(sev):\n",
+        "    return {'critical': 5, 'high': 4, 'medium': 3, 'low': 2, 'informational': 1, 'gas': 0}.get(sev, -1)\n",
+        "\n",
+        "def severity_reward(prompts, completions, completion_ids=None, severity=None, **kwargs):\n",
+        "    \"\"\"Reward for correctly identifying the severity level.\"\"\"\n",
+        "    rewards = []\n",
+        "    if severity is None:\n",
+        "        return [0.0] * len(completions)\n",
+        "    sev_list = severity if isinstance(severity, list) else [severity] * len(completions)\n",
+        "    for i, completion in enumerate(completions):\n",
+        "        text = completion[0]['content'] if isinstance(completion, list) else str(completion)\n",
+        "        gt_sev = sev_list[i] if i < len(sev_list) else 'unknown'\n",
+        "        if gt_sev == 'unknown':\n",
+        "            rewards.append(0.0); continue\n",
+        "        sev_match = re.search(r'\\b(critical|high|medium|low|informational|gas)\\b', text.lower())\n",
+        "        if not sev_match:\n",
+        "            rewards.append(-0.3)\n",
+        "        else:\n",
+        "            pred = sev_match.group(1).lower()\n",
+        "            diff = abs(_sev_rank(pred) - _sev_rank(gt_sev))\n",
+        "            rewards.append(1.0 if diff == 0 else 0.3 if diff == 1 else -0.5)\n",
+        "    return rewards\n",
+        "\n",
+        "\n",
+        "CATEGORY_KEYWORDS = {\n",
+        "    'reentrancy': ['reentrancy', 'reentrant', 're-enter', 'callback'],\n",
+        "    'access-control': ['access control', 'unauthorized', 'permission', 'onlyowner', 'role', 'privilege'],\n",
+        "    'oracle': ['oracle', 'price feed', 'chainlink', 'twap', 'price manipulation'],\n",
+        "    'flash-loan': ['flash loan', 'flashloan'],\n",
+        "    'overflow': ['overflow', 'underflow', 'arithmetic'],\n",
+        "    'front-running': ['front-run', 'frontrun', 'sandwich', 'mev'],\n",
+        "    'dos': ['denial of service', 'dos', 'gas limit', 'unbounded', 'out of gas'],\n",
+        "    'token': ['erc20', 'erc721', 'token', 'fee-on-transfer', 'rebasing'],\n",
+        "    'storage': ['storage collision', 'delegatecall', 'proxy', 'slot'],\n",
+        "    'cross-chain': ['bridge', 'cross-chain', 'relay', 'message passing'],\n",
+        "    'liquidation': ['liquidation', 'collateral', 'health factor'],\n",
+        "    'signature': ['signature', 'ecrecover', 'replay', 'nonce', 'eip712'],\n",
+        "    'initialization': ['initialize', 'constructor', 'uninitialized'],\n",
+        "    'rounding': ['rounding', 'precision', 'truncation', 'decimal'],\n",
+        "    'logic': ['logic error', 'incorrect calculation', 'business logic'],\n",
+        "}\n",
+        "\n",
+        "def category_reward(prompts, completions, completion_ids=None, category=None, **kwargs):\n",
+        "    \"\"\"Reward for identifying the correct vulnerability category.\"\"\"\n",
+        "    rewards = []\n",
+        "    if category is None:\n",
+        "        return [0.0] * len(completions)\n",
+        "    cat_list = category if isinstance(category, list) else [category] * len(completions)\n",
+        "    for i, completion in enumerate(completions):\n",
+        "        text = completion[0]['content'] if isinstance(completion, list) else str(completion)\n",
+        "        gt_cat = cat_list[i] if i < len(cat_list) else 'other'\n",
+        "        if gt_cat in ('other', 'unknown'):\n",
+        "            rewards.append(0.0); continue\n",
+        "        gt_keywords = CATEGORY_KEYWORDS.get(gt_cat, [])\n",
+        "        if not gt_keywords:\n",
+        "            rewards.append(0.0); continue\n",
+        "        hits = sum(1 for kw in gt_keywords if kw in text.lower())\n",
+        "        if hits >= 2: rewards.append(1.0)\n",
+        "        elif hits == 1: rewards.append(0.5)\n",
+        "        else:\n",
+        "            any_hit = any(kw in text.lower() for kws in CATEGORY_KEYWORDS.values() for kw in kws)\n",
+        "            rewards.append(-0.2 if any_hit else -0.5)\n",
+        "    return rewards\n",
+        "\n",
+        "\n",
+        "def quality_reward(prompts, completions, completion_ids=None, **kwargs):\n",
+        "    \"\"\"Reward for overall response quality: technical depth, actionability.\"\"\"\n",
+        "    rewards = []\n",
+        "    for completion in completions:\n",
+        "        text = completion[0]['content'] if isinstance(completion, list) else str(completion)\n",
+        "        reward = 0.0\n",
+        "        technical_terms = [\n",
+        "            'msg.sender', 'tx.origin', 'delegatecall', 'selfdestruct',\n",
+        "            'transfer', 'call.value', 'abi.encode', 'keccak256',\n",
+        "            'require(', 'assert(', 'revert', 'mapping', 'storage',\n",
+        "            'memory', 'calldata', 'modifier', 'interface', 'pragma',\n",
+        "            'assembly', 'unchecked', 'payable', 'receive()', 'fallback()',\n",
+        "        ]\n",
+        "        reward += min(0.3, 0.03 * sum(1 for t in technical_terms if t in text))\n",
+        "        reasoning = ['because', 'therefore', 'this means', 'as a result',\n",
+        "                     'the attacker can', 'this allows', 'leading to',\n",
+        "                     'step 1', 'step 2', 'first,', 'then,', 'finally,']\n",
+        "        reward += min(0.3, 0.06 * sum(1 for r in reasoning if r.lower() in text.lower()))\n",
+        "        fix_ind = ['fix:', 'recommendation:', 'mitigation:', 'should', 'consider', 'instead']\n",
+        "        reward += min(0.2, 0.05 * sum(1 for f in fix_ind if f.lower() in text.lower()))\n",
+        "        if re.search(r'line\\s+\\d+|L\\d+|#L\\d+', text): reward += 0.1\n",
+        "        if re.search(r'function\\s+\\w+\\s*\\(', text): reward += 0.1\n",
+        "        generic = ['i cannot', \"i don't\", 'no vulnerabilities found', 'the code looks safe']\n",
+        "        if any(p in text.lower() for p in generic): reward -= 0.5\n",
+        "        rewards.append(max(-1.0, min(1.0, reward)))\n",
+        "    return rewards\n",
+        "\n",
+        "print('✅ 4 reward functions defined: format, severity, category, quality')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 7: Initialize GRPO Trainer\n",
+        "from trl import GRPOTrainer, GRPOConfig\n",
+        "\n",
+        "config = GRPOConfig(\n",
+        "    output_dir=OUTPUT_DIR,\n",
+        "    num_train_epochs=NUM_EPOCHS,\n",
+        "    per_device_train_batch_size=BATCH_SIZE,\n",
+        "    gradient_accumulation_steps=GRAD_ACCUM,\n",
+        "    num_generations=NUM_GENERATIONS,\n",
+        "    max_completion_length=MAX_COMPLETION_LENGTH,\n",
+        "    learning_rate=LEARNING_RATE,\n",
+        "    beta=BETA,\n",
+        "    scale_rewards=True,\n",
+        "    reward_weights=[0.25, 0.25, 0.25, 0.25],\n",
+        "    gradient_checkpointing=True,\n",
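+        "    # T4 is pre-Ampere and lacks native bf16, so fall back to fp16 if PyTorch reports no bf16 support\n",
+        "    bf16=torch.cuda.is_bf16_supported(),\n",
+        "    fp16=not torch.cuda.is_bf16_supported(),\n",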
+        "    logging_steps=10,\n",
+        "    logging_first_step=True,\n",
+        "    logging_strategy='steps',\n",
+        "    disable_tqdm=False,  # Show progress bar in Colab\n",
+        "    save_strategy='steps',\n",
+        "    save_steps=SAVE_STEPS,\n",
+        "    save_total_limit=2,\n",
+        "    push_to_hub=False,  # We push manually at the end\n",
+        "    log_completions=False,\n",
+        "    report_to='none',\n",
+        "    seed=42,\n",
+        ")\n",
+        "\n",
+        "print('Initializing GRPOTrainer...')\n",
+        "trainer = GRPOTrainer(\n",
+        "    model=MODEL_NAME,\n",
+        "    args=config,\n",
+        "    reward_funcs=[format_reward, severity_reward, category_reward, quality_reward],\n",
+        "    train_dataset=train_dataset,\n",
+        ")\n",
+        "print(f'✅ GRPOTrainer ready! {len(train_dataset)} samples, ~{len(train_dataset) // (BATCH_SIZE * GRAD_ACCUM)} steps')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 8: TRAIN! 🚀\n",
+        "# This takes 4-6 hours on T4. Colab will keep running if you stay connected.\n",
+        "# Tip: Keep the tab open and active to prevent disconnection.\n",
+        "\n",
+        "import time\n",
+        "start = time.time()\n",
+        "print('🚀 Starting GRPO V2 training...')\n",
+        "print(f'Estimated time: ~{len(train_dataset) / (BATCH_SIZE * GRAD_ACCUM) * 45 / 3600:.1f} hours')\n",
+        "print()\n",
+        "\n",
+        "trainer.train()\n",
+        "\n",
+        "elapsed = time.time() - start\n",
+        "print(f'\\n✅ Training complete in {elapsed/3600:.1f} hours!')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 9: Save and push to Hub\n",
+        "import os\n",
+        "from huggingface_hub import HfApi\n",
+        "\n",
+        "print(f'Saving model to {OUTPUT_DIR}...')\n",
+        "trainer.save_model(OUTPUT_DIR)\n",
+        "\n",
+        "print(f'Pushing to Hub: {HUB_MODEL_ID}...')\n",
+        "api = HfApi()\n",
+        "api.create_repo(repo_id=HUB_MODEL_ID, exist_ok=True)\n",
+        "\n",
+        "# Upload model files (skip checkpoints and optimizer states to save time)\n",
+        "api.upload_folder(\n",
+        "    folder_path=OUTPUT_DIR,\n",
+        "    repo_id=HUB_MODEL_ID,\n",
+        "    commit_message='GRPO V2 — trained on real audit findings, 4 reward functions',\n",
+        "    ignore_patterns=['checkpoint-*', '*.pt'],  # Skip checkpoints\n",
+        ")\n",
+        "\n",
+        "print(f'\\n🎉 Model pushed to https://huggingface.co/{HUB_MODEL_ID}')"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Cell 10: Quick inference test\n",
+        "from transformers import pipeline as hf_pipeline\n",
+        "\n",
+        "print('Loading trained model for inference...')\n",
+        "pipe = hf_pipeline('text-generation', model=OUTPUT_DIR, device=0, torch_dtype=torch.bfloat16)\n",
+        "\n",
+        "test_contract = \"\"\"\n",
+        "pragma solidity ^0.8.0;\n",
+        "\n",
+        "contract SimpleBank {\n",
+        "    mapping(address => uint256) public balances;\n",
+        "\n",
+        "    function deposit() public payable {\n",
+        "        balances[msg.sender] += msg.value;\n",
+        "    }\n",
+        "\n",
+        "    function withdraw(uint256 amount) public {\n",
+        "        require(balances[msg.sender] >= amount);\n",
+        "        (bool success, ) = msg.sender.call{value: amount}(\\\"\\\");\n",
+        "        require(success);\n",
+        "        balances[msg.sender] -= amount;\n",
+        "    }\n",
+        "}\n",
+        "\"\"\"\n",
+        "\n",
+        "messages = [\n",
+        "    {'role': 'system', 'content': 'You are an expert smart contract security auditor. Analyze the provided Solidity code for vulnerabilities.'},\n",
+        "    {'role': 'user', 'content': f'Audit this contract:\\n```solidity\\n{test_contract}\\n```'},\n",
+        "]\n",
+        "\n",
+        "result = pipe(messages, max_new_tokens=512, do_sample=False, return_full_text=False)\n",
+        "output = result[0]['generated_text']\n",
+        "if isinstance(output, list):\n",
+        "    output = output[-1]['content']\n",
+        "\n",
+        "print('\\n=== Audit Result ===')\n",
+        "print(output)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "---\n",
+        "\n",
+        "## 🎉 Done!\n",
+        "\n",
+        "Your V2 model is now pushed to the Hub. Test it interactively at:\n",
+        "\n",
+        "**Demo Space:** [oxdev/security-auditor-demo](https://huggingface.co/spaces/oxdev/security-auditor-demo)\n",
+        "\n",
+        "**Model:** [oxdev/security-auditor-grpo](https://huggingface.co/oxdev/security-auditor-grpo)\n",
+        "\n",
+        "### Next Steps\n",
+        "- Train on more data: increase `SUBSET_SIZE` to 5000 or 10000\n",
+        "- Use a bigger model: try `Qwen/Qwen2.5-Coder-1.5B-Instruct` (needs A100)\n",
+        "- Fine-tune rewards: adjust weights in `reward_weights`\n",
+        "- Try different hyperparameters: learning rate, beta, num_generations"
+      ]
+    }
+  ]
+}
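
As a rough illustration of what the notebook's `reward_weights` and `scale_rewards=True` settings imply, the sketch below combines four per-function rewards into one score and normalizes it within a group of completions sampled for the same prompt. It is a plain-Python approximation, not TRL's implementation: the helper names `combine` and `group_advantages`, the epsilon value, the use of the sample standard deviation, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, stdev

REWARD_WEIGHTS = [0.25, 0.25, 0.25, 0.25]  # same values as reward_weights in Cell 7
EPS = 1e-4  # assumed numerical stabilizer, not taken from the notebook


def combine(per_function_rewards, weights=REWARD_WEIGHTS):
    """Weighted sum of one completion's rewards (format, severity, category, quality)."""
    return sum(w * r for w, r in zip(weights, per_function_rewards))


def group_advantages(group_rewards, eps=EPS):
    """Center and scale the combined rewards of completions that share one prompt."""
    mu = mean(group_rewards)
    # Sample standard deviation is an assumption; a single-item group has no spread.
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Hypothetical scores for the two completions sampled for one prompt
# (NUM_GENERATIONS = 2 in the notebook's configuration cell).
completion_a = [0.6, 1.0, 0.5, 0.4]
completion_b = [0.3, -0.5, 0.5, 0.2]

combined = [combine(completion_a), combine(completion_b)]
advantages = group_advantages(combined)

print("combined rewards:", combined)
print("group-relative advantages:", advantages)
```

With only two generations per prompt, the normalized advantages come out at roughly +0.71 and -0.71 whenever the two scores differ, so under these assumptions only which completion scored higher matters, not by how much. Raising `NUM_GENERATIONS` would give a finer-grained signal, at the cost of more memory and time per step.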