Instructions to use cbrooklyn/Talon-Preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use cbrooklyn/Talon-Preview with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="cbrooklyn/Talon-Preview", filename="gguf/talon-preview-Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use cbrooklyn/Talon-Preview with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M # Run inference directly in the terminal: llama-cli -hf cbrooklyn/Talon-Preview:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M # Run inference directly in the terminal: llama-cli -hf cbrooklyn/Talon-Preview:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf cbrooklyn/Talon-Preview:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf cbrooklyn/Talon-Preview:Q4_K_M
Use Docker
docker model run hf.co/cbrooklyn/Talon-Preview:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use cbrooklyn/Talon-Preview with Ollama:
ollama run hf.co/cbrooklyn/Talon-Preview:Q4_K_M
- Unsloth Studio
How to use cbrooklyn/Talon-Preview with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for cbrooklyn/Talon-Preview to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for cbrooklyn/Talon-Preview to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for cbrooklyn/Talon-Preview to start chatting
- Pi
How to use cbrooklyn/Talon-Preview with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "cbrooklyn/Talon-Preview:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use cbrooklyn/Talon-Preview with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default cbrooklyn/Talon-Preview:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use cbrooklyn/Talon-Preview with Docker Model Runner:
docker model run hf.co/cbrooklyn/Talon-Preview:Q4_K_M
- Lemonade
How to use cbrooklyn/Talon-Preview with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull cbrooklyn/Talon-Preview:Q4_K_M
Run and chat with the model
lemonade run user.Talon-Preview-Q4_K_M
List all available models
lemonade list
llm.create_chat_completion(
messages = "No input example has been defined for this model task."
)Read This First
This is not a hacking assistant. It will not one-shot OSCP machines or automatically pop shells on VulnLab boxes. If that is what you are looking for, this is not it , and anything in this weight class that claims otherwise is lying to you.
Talon-Preview is a demo release. It exists to demonstrate a training pipeline and give a first look at the direction the Talon project is heading. It is not an alpha. It is not a beta. It is a proof of concept , a deliberate early release so that the development process is visible and the project is accountable to a real audience from the start.
Expect rough edges. Set expectations accordingly.
What Is Talon-Preview?
Talon-Preview is an early-access security reasoning assistant fine-tuned for authorized penetration testing methodology, red team planning, and offensive security education. It is built on a 5.1B parameter base model with 2B parameters active during inference.
At this stage, Talon-Preview is strongest at:
- Structuring penetration testing methodology and engagement plans
- Explaining how attacks work at a conceptual and mechanical level
- Walking through enumeration phases in a structured way
- Explaining the legal and authorization framework for security work
- Reasoning about attack paths and helping operators think through options
- Security report writing and executive summary drafting
It is weakest at:
- Generating precise CLI syntax for tools , hallucinations on specific flags, arguments, and tool interfaces are a known and significant issue at this model size
- Complex multi-step reasoning chains that require holding a lot of technical state simultaneously
- Sophisticated low-level prompts involving binary exploitation, shellcode mechanics, or kernel internals , these will frequently produce plausible-sounding/looking but incorrect output
The rule of thumb: use Talon-Preview to think and plan. Do not use it to generate commands you intend to run without verifying them manually first.
Known Limitations — Do Not Skip This Section
This section is not a disclaimer. It is operational guidance.
Hallucinations on tool syntax are frequent and confident.
When asked about specific CLI tools , their flags, subcommands, and
behavior , Talon-Preview will sometimes output commands that look correct
but are not (Hallucinations). This is a known consequence of the model size (Talon-Preview only has a total of 5.1 Billion Parameters) and training
dataset at this stage. Always verify commands against official documentation
or man pages before running them in an engagement.
Avoid highly sophisticated or extremely low-level prompts. Prompts that require precise kernel internals, exact exploit mechanics, or complex multi-stage reasoning chains will push the model past its reliable capability boundary. The output may be coherent but technically wrong in ways that are not immediately obvious.
This model cannot replace human judgment in a live engagement. Talon-Preview is a reasoning aid, not an autonomous operator. Treat its output the way you would treat a junior analyst's first draft — useful starting point, requires review.
2B active parameters is a real constraint. Long or complex conversations may see quality degrade as the context grows. If reasoning quality drops, start a fresh session with a focused prompt.
Intended Use
Talon-Preview is built for:
- Authorized penetration testing with documented scope and rules of engagement
- CTF competitions and training lab environments
- Security research and education in authorized contexts
- Engagement planning : structuring methodology, scoping, and approach
- Security report writing and finding documentation
Using this model against systems you do not have explicit written authorization to test is outside its intended use and is your legal responsibility, not this project's.
Quickstart — Ollama
# Run directly from HuggingFace via Ollama
ollama run hf.co/cbrooklyn/talon-preview
# Recommended quantization for most hardware
# Q4_K_M is the default , good balance of quality and speed
ollama run hf.co/cbrooklyn/talon-preview:Q4_K_M
# Higher quality if your machine can handle it
ollama run hf.co/cbrooklyn/talon-preview:Q8_0
Example interaction that plays to the model's actual strengths:
User:
I have a target with SMB open, a readable public share,
and two usernames recovered from null session enumeration.
How should I structure the next phase of the assessment?
Talon-Preview:
[Plans the enumeration methodology, explains what to look for
in the share, structures the credential attack approach, and
explains the detection surface]
Model Details
| Attribute | Value |
|---|---|
| Release type | Demo / Proof of Concept |
| Total parameters | 5.1B |
| Active parameters during inference | 2B |
| Inference compatibility | Ollama |
| Recommended quantization | Q4_K_M |
| Available quantizations | Q4_K_M · Q6_K · Q8_0 |
| Language | English |
| Domain | Cybersecurity / Offensive Security |
| Context window | 128,000 tokens |
How Talon Handles Uncertainty
When Talon-Preview hits the edge of its knowledge on tool syntax or CVE specifics, it is trained to invoke its web search tool rather than guess. In practice at this model size this behavior is not perfectly reliable — another known limitation of the preview stage. When you see it search before answering, that is the intended behavior. When you see it output commands confidently without searching, verify those commands before trusting them.
What Is Coming Next
Talon-Preview validates that the training pipeline works end-to-end. The next phase involves a significantly larger base model, a rebuilt training dataset that prioritizes verified tool usage over conceptual descriptions, and an RL environment where Talon can execute commands in an isolated sandbox and receive corrective signal when syntax is wrong or a tool does not exist. That feedback loop is how the hallucination problem gets solved at its root — not by telling the model to be careful, but by giving it an environment where being wrong has a cost.
The preview is the first step. It is not the destination.
- Downloads last month
- 208
4-bit
6-bit
8-bit
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="cbrooklyn/Talon-Preview", filename="", )