Instructions to use CystronCode/cannon-and-wall-grpo with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Local Apps Settings
- Unsloth Studio
How to use CystronCode/cannon-and-wall-grpo with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for CystronCode/cannon-and-wall-grpo to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for CystronCode/cannon-and-wall-grpo to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for CystronCode/cannon-and-wall-grpo to start chatting
Load model with FastModel
pip install unsloth from unsloth import FastModel model, tokenizer = FastModel.from_pretrained( model_name="CystronCode/cannon-and-wall-grpo", max_seq_length=2048, )
Training LLMs to Hack and Defend: Introducing Cannon & Wall
Large Language Models possess extensive security knowledge, yet they consistently fail at adversarial reasoning. They can recite definitions of SQL injection, but struggle to proactively hunt for it or reliably patch it in dynamic scenarios. The reason is simple: current benchmarks test static memorization, not adversarial loops.
To fix this, we built Cannon & Wall—a self-improving reinforcement learning environment where LLMs learn cybersecurity through continuous self-play.
How It Works
Inside a sandboxed Flask application, two LLM agents face off over seeded vulnerabilities (like SQLi, XSS, and Broken Authentication):
- 🔴 Cannon (Attacker): Analyzes source code to find vulnerabilities and generate proof-of-concept textual exploits.
- 🔵 Wall (Defender): Reads Cannon’s reports and rewrites the source code to harden the application.
A deterministic Judge oversees the match. If Wall patches the code, Cannon attempts a bypass. Every success and failure translates directly into structured reward signals. By constraining the environment to textual reasoning over source code—rather than executing live, unpredictable exploits—the arena remains safe, scalable, and highly effective for RL algorithms like GRPO.
The Logic of Self-Play
Instead of training on static datasets, these agents co-evolve. Cannon sharpens its attack vectors when it fails to find bugs; Wall builds more resilient logic when it gets exploited. Strict anti-reward-hacking measures, including deterministic verification and functional preservation checks, ensure the models learn actual security protocols rather than gaming the system.
The Objective
Security verification is an entirely objective domain, making it the perfect candidate for adversarial RL. Developed at the Meta PyTorch × OpenEnv Hackathon in Bangalore this weekend, Cannon & Wall provides the structured training ground necessary to turn LLMs from passive encyclopedias into active security reasoning engines.
Stop relying on static benchmarks to measure dynamic reasoning.
Test the live Cannon & Wall environment on HuggingFace Spaces right now.