Training LLMs to Hack and Defend: Introducing Cannon & Wall

Large Language Models possess extensive security knowledge, yet they often fail at adversarial reasoning. They can recite the definition of SQL injection but struggle to hunt for it proactively or to patch it reliably in dynamic scenarios. The reason is simple: current benchmarks test static memorization, not performance in an adversarial loop of attack and defense.

To fix this, we built Cannon & Wall—a self-improving reinforcement learning environment where LLMs learn cybersecurity through continuous self-play.

How It Works

Inside a sandboxed Flask application, two LLM agents face off over seeded vulnerabilities (like SQLi, XSS, and Broken Authentication):

  • 🔴 Cannon (Attacker): Analyzes the source code to find vulnerabilities and writes proof-of-concept exploits as text.
  • 🔵 Wall (Defender): Reads Cannon’s reports and rewrites the source code to harden the application.
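To make the two roles concrete, here is a minimal sketch of what each agent might hand to the other. It assumes a Cannon report carries a vulnerability class, a target Flask route, and a textual payload, and a Wall patch carries the rewritten source for one file. `VulnClass`, `ExploitReport`, and `Patch` are hypothetical names, not the project's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class VulnClass(Enum):
    # Seeded vulnerability families named in the post.
    SQLI = "sql_injection"
    XSS = "cross_site_scripting"
    BROKEN_AUTH = "broken_authentication"


@dataclass(frozen=True)
class ExploitReport:
    """Hypothetical Cannon output: a proof of concept described as text, never executed."""
    vuln_class: VulnClass
    route: str    # Flask route the attacker claims is vulnerable
    payload: str  # proof-of-concept input, kept as plain text


@dataclass(frozen=True)
class Patch:
    """Hypothetical Wall output: the full rewritten source for one file."""
    filename: str
    new_source: str
    addresses: VulnClass  # which reported class the patch claims to fix


report = ExploitReport(VulnClass.SQLI, "/login", "' OR '1'='1' --")
patch = Patch(
    filename="app.py",
    new_source='cur.execute("SELECT * FROM users WHERE name = ?", (name,))',
    addresses=report.vuln_class,
)
print(report.vuln_class.value, patch.addresses is report.vuln_class)  # sql_injection True
```

Keeping both messages as frozen, plain data means the Judge can compare them deterministically without running anything.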

A deterministic Judge oversees the match. If Wall patches the code, Cannon attempts a bypass. Every success and failure translates directly into a structured reward signal. Because the environment is restricted to textual reasoning over source code, rather than executing live and unpredictable exploits, the arena stays safe and scalable, and it is well suited to RL algorithms such as GRPO (Group Relative Policy Optimization).
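The post does not describe the Judge's internals, so the following is only a sketch of one way a deterministic, non-executing Judge could turn attacks and patches into rewards, followed by GRPO-style group-relative advantages. It assumes each seeded vulnerability is recorded with its route and a static regex that matches the vulnerable code; if the regex no longer matches, the flaw counts as patched. `SeededVuln`, `judge_attack`, `judge_defense`, and the +1/0/-1 reward scale are hypothetical.

```python
import re
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class SeededVuln:
    """Hypothetical ground-truth record for one vulnerability the environment planted."""
    vuln_id: str
    route: str
    pattern: str  # regex matching the vulnerable code; no match => treated as patched

    def present_in(self, source: str) -> bool:
        return re.search(self.pattern, source) is not None


def judge_attack(seeded: list[SeededVuln], source: str, claimed_route: str) -> float:
    """Cannon's reward: +1 if the claimed route still hosts a seeded vuln, else -1."""
    hit = any(v.route == claimed_route and v.present_in(source) for v in seeded)
    return 1.0 if hit else -1.0


def judge_defense(vuln: SeededVuln, old_source: str, new_source: str) -> float:
    """Wall's reward: +1 if the patch removes a vuln that was present before, else 0."""
    fixed = vuln.present_in(old_source) and not vuln.present_in(new_source)
    return 1.0 if fixed else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each sampled completion is scored against its own group.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


sqli = SeededVuln(
    vuln_id="sqli-login",
    route="/login",
    pattern=r'execute\(f"SELECT',  # string-formatted SQL is the seeded flaw
)
vulnerable = 'cur.execute(f"SELECT * FROM users WHERE name = \'{name}\'")'
patched = 'cur.execute("SELECT * FROM users WHERE name = ?", (name,))'

print(judge_attack([sqli], vulnerable, "/login"))  # 1.0: initial attack lands
print(judge_defense(sqli, vulnerable, patched))    # 1.0: parameterized query removes the flaw
print(judge_attack([sqli], patched, "/login"))     # -1.0: bypass attempt fails
print(group_relative_advantages([1.0, -1.0, 1.0, -1.0]))
```

The group-relative step is what lets GRPO skip a learned value model: a completion is rewarded only for doing better than its siblings sampled from the same prompt.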

The Logic of Self-Play

Instead of training on static datasets, the agents co-evolve. Cannon sharpens its attack vectors when it fails to find bugs; Wall builds more resilient logic when it gets exploited. Strict anti-reward-hacking measures, including deterministic verification and functional-preservation checks, are there to ensure the models learn genuine security fixes rather than gaming the reward (for example, with a "fix" that simply deletes the vulnerable feature).
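A functional-preservation check can be sketched as a gate on Wall's reward. This assumes the check is static and only compares declared Flask routes: a patch may add routes, but it may not remove any route the original app declared, so deleting `/login` does not count as fixing a SQL injection in `/login`. `declared_routes`, `preserves_functionality`, `gated_defense_reward`, and the -1 penalty are hypothetical; the post does not say how its checks are implemented.

```python
import re

# Matches the path argument of decorators like @app.route("/login", methods=["POST"]).
ROUTE_RE = re.compile(r'@app\.route\(\s*["\']([^"\']+)["\']')


def declared_routes(source: str) -> set[str]:
    """Collect the path of every @app.route(...) decorator in a Flask source file."""
    return set(ROUTE_RE.findall(source))


def preserves_functionality(old_source: str, new_source: str) -> bool:
    """A patch may add routes, but every route the original app declared must survive."""
    return declared_routes(old_source) <= declared_routes(new_source)


def gated_defense_reward(raw_reward: float, old_source: str, new_source: str) -> float:
    """Penalize 'fixes' that delete the vulnerable endpoint instead of hardening it."""
    if not preserves_functionality(old_source, new_source):
        return -1.0  # hypothetical penalty value
    return raw_reward


original = '''
@app.route("/login", methods=["POST"])
def login(): ...

@app.route("/search")
def search(): ...
'''
# A reward-hacking "patch": the SQLi is gone only because /login is gone.
deleted_login = '''
@app.route("/search")
def search(): ...
'''

print(sorted(declared_routes(original)))                  # ['/login', '/search']
print(gated_defense_reward(1.0, original, deleted_login))  # -1.0
print(gated_defense_reward(1.0, original, original))       # 1.0: gate passes, raw reward kept
```

A route-level comparison is deliberately coarse; a real environment would likely pair it with behavioral tests, but even this static gate closes the most obvious shortcut.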

The Objective

Security verification is a largely objective domain: whether a vulnerability is present, or whether a patch closes it, can be checked deterministically. That makes it a strong candidate for adversarial RL. Developed at the Meta PyTorch × OpenEnv Hackathon in Bangalore this weekend, Cannon & Wall provides the structured training ground needed to turn LLMs from passive encyclopedias into active security reasoning engines.

Stop relying on static benchmarks to measure dynamic reasoning.

Test the live Cannon & Wall environment on Hugging Face Spaces right now.
