Instructions to use cbrooklyn/Talon-Preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use cbrooklyn/Talon-Preview with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="cbrooklyn/Talon-Preview",
	filename="gguf/talon-preview-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use cbrooklyn/Talon-Preview with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf cbrooklyn/Talon-Preview:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf cbrooklyn/Talon-Preview:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf cbrooklyn/Talon-Preview:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf cbrooklyn/Talon-Preview:Q4_K_M

Use Docker

docker model run hf.co/cbrooklyn/Talon-Preview:Q4_K_M

LM Studio
Jan
Ollama
How to use cbrooklyn/Talon-Preview with Ollama:
```
ollama run hf.co/cbrooklyn/Talon-Preview:Q4_K_M
```

Unsloth Studio

How to use cbrooklyn/Talon-Preview with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for cbrooklyn/Talon-Preview to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for cbrooklyn/Talon-Preview to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for cbrooklyn/Talon-Preview to start chatting

How to use cbrooklyn/Talon-Preview with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "cbrooklyn/Talon-Preview:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use cbrooklyn/Talon-Preview with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf cbrooklyn/Talon-Preview:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default cbrooklyn/Talon-Preview:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use cbrooklyn/Talon-Preview with Docker Model Runner:
```
docker model run hf.co/cbrooklyn/Talon-Preview:Q4_K_M
```

Lemonade

How to use cbrooklyn/Talon-Preview with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull cbrooklyn/Talon-Preview:Q4_K_M

Run and chat with the model

lemonade run user.Talon-Preview-Q4_K_M

List all available models

lemonade list

Talon-Preview

A sneak peek at what's being built — not what it will become.

Read This First

This is not a hacking assistant. It will not one-shot OSCP machines or automatically pop shells on VulnLab boxes. If that is what you are looking for, this is not it , and anything in this weight class that claims otherwise is lying to you.

Talon-Preview is a demo release. It exists to demonstrate a training pipeline and give a first look at the direction the Talon project is heading. It is not an alpha. It is not a beta. It is a proof of concept , a deliberate early release so that the development process is visible and the project is accountable to a real audience from the start.

Expect rough edges. Set expectations accordingly.

What Is Talon-Preview?

Talon-Preview is an early-access security reasoning assistant fine-tuned for authorized penetration testing methodology, red team planning, and offensive security education. It is built on a 5.1B parameter base model with 2B parameters active during inference.

At this stage, Talon-Preview is strongest at:

Structuring penetration testing methodology and engagement plans
Explaining how attacks work at a conceptual and mechanical level
Walking through enumeration phases in a structured way
Explaining the legal and authorization framework for security work
Reasoning about attack paths and helping operators think through options
Security report writing and executive summary drafting

It is weakest at:

Generating precise CLI syntax for tools , hallucinations on specific flags, arguments, and tool interfaces are a known and significant issue at this model size
Complex multi-step reasoning chains that require holding a lot of technical state simultaneously
Sophisticated low-level prompts involving binary exploitation, shellcode mechanics, or kernel internals , these will frequently produce plausible-sounding/looking but incorrect output

The rule of thumb: use Talon-Preview to think and plan. Do not use it to generate commands you intend to run without verifying them manually first.

Known Limitations — Do Not Skip This Section

This section is not a disclaimer. It is operational guidance.

Hallucinations on tool syntax are frequent and confident. When asked about specific CLI tools , their flags, subcommands, and behavior , Talon-Preview will sometimes output commands that look correct but are not (Hallucinations). This is a known consequence of the model size (Talon-Preview only has a total of 5.1 Billion Parameters) and training dataset at this stage. Always verify commands against official documentation or man pages before running them in an engagement.

Avoid highly sophisticated or extremely low-level prompts. Prompts that require precise kernel internals, exact exploit mechanics, or complex multi-stage reasoning chains will push the model past its reliable capability boundary. The output may be coherent but technically wrong in ways that are not immediately obvious.

This model cannot replace human judgment in a live engagement. Talon-Preview is a reasoning aid, not an autonomous operator. Treat its output the way you would treat a junior analyst's first draft — useful starting point, requires review.

2B active parameters is a real constraint. Long or complex conversations may see quality degrade as the context grows. If reasoning quality drops, start a fresh session with a focused prompt.

Intended Use

Talon-Preview is built for:

Authorized penetration testing with documented scope and rules of engagement
CTF competitions and training lab environments
Security research and education in authorized contexts
Engagement planning : structuring methodology, scoping, and approach
Security report writing and finding documentation

Using this model against systems you do not have explicit written authorization to test is outside its intended use and is your legal responsibility, not this project's.

Quickstart — Ollama

# Run directly from HuggingFace via Ollama
ollama run hf.co/cbrooklyn/talon-preview

# Recommended quantization for most hardware
# Q4_K_M is the default , good balance of quality and speed
ollama run hf.co/cbrooklyn/talon-preview:Q4_K_M

# Higher quality if your machine can handle it
ollama run hf.co/cbrooklyn/talon-preview:Q8_0

Example interaction that plays to the model's actual strengths:

User:
I have a target with SMB open, a readable public share,
and two usernames recovered from null session enumeration.
How should I structure the next phase of the assessment?

Talon-Preview:
[Plans the enumeration methodology, explains what to look for
in the share, structures the credential attack approach, and
explains the detection surface]

Model Details

Attribute	Value
Release type	Demo / Proof of Concept
Total parameters	5.1B
Active parameters during inference	2B
Inference compatibility	Ollama
Recommended quantization	Q4_K_M
Available quantizations	Q4_K_M · Q6_K · Q8_0
Language	English
Domain	Cybersecurity / Offensive Security
Context window	128,000 tokens

How Talon Handles Uncertainty

When Talon-Preview hits the edge of its knowledge on tool syntax or CVE specifics, it is trained to invoke its web search tool rather than guess. In practice at this model size this behavior is not perfectly reliable — another known limitation of the preview stage. When you see it search before answering, that is the intended behavior. When you see it output commands confidently without searching, verify those commands before trusting them.

What Is Coming Next

Talon-Preview validates that the training pipeline works end-to-end. The next phase involves a significantly larger base model, a rebuilt training dataset that prioritizes verified tool usage over conceptual descriptions, and an RL environment where Talon can execute commands in an isolated sandbox and receive corrective signal when syntax is wrong or a tool does not exist. That feedback loop is how the hallucination problem gets solved at its root — not by telling the model to be careful, but by giving it an environment where being wrong has a cost.

The preview is the first step. It is not the destination.

_{Talon is an independent project building specialized AI security
tooling designed for practitioners who need a thinking partner, not a
chatbot with a security skin.}

Downloads last month: 208

GGUF

Model size

5B params

Architecture

gemma4

Hardware compatibility

4-bit

6-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support