Instructions for using Chunjiang-Intelligence/DeepSeek-v4-Fable with libraries and local apps.

Libraries

How to use Chunjiang-Intelligence/DeepSeek-v4-Fable with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Chunjiang-Intelligence/DeepSeek-v4-Fable")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Chunjiang-Intelligence/DeepSeek-v4-Fable")
model = AutoModelForCausalLM.from_pretrained("Chunjiang-Intelligence/DeepSeek-v4-Fable")

Local Apps

vLLM

How to use Chunjiang-Intelligence/DeepSeek-v4-Fable with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Chunjiang-Intelligence/DeepSeek-v4-Fable"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chunjiang-Intelligence/DeepSeek-v4-Fable",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
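
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are illustrative and not part of vLLM, and the default address assumes the `vllm serve` port used in the curl example above.

```python
import json
import urllib.request

MODEL_ID = "Chunjiang-Intelligence/DeepSeek-v4-Fable"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` returns the generated text. Passing `base_url="http://localhost:30000"` targets the SGLang port instead.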

Use Docker

docker model run hf.co/Chunjiang-Intelligence/DeepSeek-v4-Fable

SGLang

How to use Chunjiang-Intelligence/DeepSeek-v4-Fable with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Chunjiang-Intelligence/DeepSeek-v4-Fable" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chunjiang-Intelligence/DeepSeek-v4-Fable",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Chunjiang-Intelligence/DeepSeek-v4-Fable" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chunjiang-Intelligence/DeepSeek-v4-Fable",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Chunjiang-Intelligence/DeepSeek-v4-Fable with Docker Model Runner:
```
docker model run hf.co/Chunjiang-Intelligence/DeepSeek-v4-Fable
```



DeepSeek-V4-Fable is an autonomous agent engineered for offensive security research. Use of this model to access, scan, or exploit systems without explicit, documented authorization is strictly prohibited. Users must comply with the Acceptable Use Policy outlined below.

Introduction

DeepSeek-V4-Fable is a distilled variant of Claude-5-Fable, built on top of DeepSeek-V4-Flash and adapted for autonomous security research workflows. It is designed for structured, tool-oriented tasks such as challenge solving, exploitation planning, and multi-step reasoning in controlled environments.

As a distilled Fable model, it preserves the core task orientation of the original system while providing a more practical format for research and deployment. The model is intended for authorized security evaluation, CTF problem solving, and research on long-horizon agent behavior, with training focused on procedural reliability in sandboxed settings rather than broad conversational coverage.

DeepSeek-V4-Fable should be understood as a domain-specific system rather than a general-purpose assistant. Because it is capable of generating offensive security actions, access and deployment should be limited to authorized, supervised environments with clear operational boundaries.

Intended Use & Out-of-Scope Use

Intended Use: DeepSeek-V4-Fable is developed exclusively for defensive security research, authorized penetration testing, and red-team engagements within strictly defined scopes. It serves as a specialized tool for vulnerability research on owned or authorized systems, as well as a benchmark for evaluating autonomous-agent safety and capabilities in controlled environments, such as CTF competitions and educational security labs.

Prohibited Use: The model must not be deployed to access, scan, or exploit any system without explicit authorization. Prohibited activities include mass or indiscriminate targeting, opportunistic exploitation, and the development or deployment of malware, ransomware, or destructive payloads. Furthermore, it is strictly forbidden to use the model for supply-chain compromise, establishing persistent backdoors, evading security controls, or any actions that violate applicable laws (e.g., CFAA, Computer Misuse Act). DeepSeek-V4-Fable is not a general-purpose assistant and is explicitly optimized for procedural security tasks; it should not be relied upon for general NLP applications.

How to Use

from encoding_dsv4 import encode_messages, parse_message_from_completion_text
import transformers

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
    {"role": "user", "content": "1+1=?"}
]

# Format messages into the model's required string template
prompt = encode_messages(messages, thinking_mode="thinking")

# Tokenize the prompt
tokenizer = transformers.AutoTokenizer.from_pretrained("Chunjiang-Intelligence/DeepSeek-v4-Fable")
tokens = tokenizer.encode(prompt)

Training Details

Training Data: The model was fine-tuned on SecDojo-80K, a proprietary corpus comprising 80,000 verified Capture The Flag (CTF) trajectories. These trajectories were synthesized by guiding a teacher model through publicly archived challenges within an instrumented sandbox. To ensure data quality, each trajectory was subjected to rigorous filtering, requiring out-of-band verification of submitted flags and the elimination of action loops or non-reproducible successes. The held-out evaluation set was strictly decontaminated by excluding source-challenge identities.

| Category | Challenges | Trajectories | Avg. turns | p95 ctx | Teacher solve |
| --- | --- | --- | --- | --- | --- |
| Web Security | 1,240 | 28,500 | 14.2 | 38.4K | 71.4% |
| Binary Exploitation (Pwn) | 850 | 15,200 | 22.5 | 92.6K | 38.9% |
| Reverse Engineering | 920 | 18,400 | 18.7 | 71.2K | 46.2% |
| Cryptography | 630 | 11,300 | 8.4 | 21.5K | 63.0% |
| Miscellaneous | 410 | 6,600 | 6.1 | 15.0K | 74.8% |
| Total / mean | 4,050 | 80,000 | 15.8 | 61.3K | 56.1% |

Training Procedure: The training pipeline consisted of two phases. Phase 1 used rejection-sampled Supervised Fine-Tuning (SFT) over three epochs, applying token-level cross-entropy exclusively to assistant reasoning and action spans while masking environment observations. Phase 2 applied Group Relative Policy Optimization (GRPO), an on-policy reinforcement learning method, optimized against programmatic sandbox rewards. The reward function was shaped to incorporate terminal flag acquisition, dense verifiable milestones (such as service fingerprinting and memory leaks), and strict penalties for malformed actions.
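
The Phase 1 loss masking can be illustrated with a minimal sketch. It assumes each target token carries a role label and that per-token log-probabilities are already available; the function name, the role labels, and the toy numbers are illustrative and are not taken from the training code.

```python
import math


def masked_cross_entropy(token_logprobs, token_roles,
                         trained_roles=("reasoning", "action")):
    """Mean negative log-likelihood over tokens whose role is trained on.

    token_logprobs: log-probability the model assigned to each target token.
    token_roles: one label per token; "observation" tokens are masked out.
    """
    if len(token_logprobs) != len(token_roles):
        raise ValueError("token_logprobs and token_roles must align")
    kept = [-lp for lp, role in zip(token_logprobs, token_roles)
            if role in trained_roles]
    if not kept:
        return 0.0  # nothing to train on in this sequence
    return sum(kept) / len(kept)


# Toy sequence: the observation token has a poor log-probability,
# but it does not contribute to the loss.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.1), math.log(0.5)]
roles = ["reasoning", "action", "observation", "action"]
loss = masked_cross_entropy(logprobs, roles)
print(f"masked loss: {loss:.4f}")
```

Only the three assistant tokens are averaged, so the model is never trained to predict what the environment returns.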

Infrastructure optimizations included a Read-Only Parameter Streaming (ROPS) mechanism, which refined ZeRO-3 CPU offloading. Because the backbone is frozen under LoRA, its parameters are read-only during training, which makes the parameter streams unidirectional and statically prefetchable. This significantly reduced PCIe-bound stall fractions and improved end-to-end step time.

| LoRA & shared | Phase 1 (SFT) | Phase 2 (GRPO) |
| --- | --- | --- |
| rank 64, α 128, dropout 0.05 | epochs 3 | rollouts/step 16×16 |
| targets: q,k,v,o + experts w₁,₂,₃ + router | global batch 512 | group size G=16 |
| bf16 params, fp32 optim states | peak LR 1e-4 | peak LR 5e-6 |
| trainable 0.94B (0.33%) | cosine, 3% warmup, max seq 96K | clip ε=0.2, KL β=0.02, T=1.0, top-p=0.95 |
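
To make the Phase 2 settings concrete, the following sketch shows how a group of G=16 rollout rewards might be turned into group-relative advantages and fed to a clipped objective with ε=0.2. It follows the commonly published GRPO formulation, not the authors' code: the use of the population standard deviation, the function names, and the toy rewards are assumptions, and the KL penalty (β=0.02) is omitted for brevity.

```python
import math
import statistics

GROUP_SIZE = 16  # group size G from the table
CLIP_EPS = 0.2   # clip ε from the table


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=CLIP_EPS):
    """PPO-style clipped objective for one action (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Toy group: 4 of 16 rollouts earn reward 1.0, the rest earn 0.0.
rewards = [1.0] * 4 + [0.0] * (GROUP_SIZE - 4)
advantages = group_relative_advantages(rewards)
# Successful rollouts get a positive advantage, failed ones a negative one.
print(advantages[0], advantages[-1])
```

Because advantages are computed within each group, no separate value network is needed; the group mean acts as the baseline.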

Evaluation

The model was evaluated on 300 held-out CTF challenges, decontaminated by source identity against SecDojo-80K. The primary metric is the solve rate, defined as acquiring a verifier-accepted flag within 40 turns.
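
As a sketch of how this metric could be computed from episode logs, the snippet below counts an episode as solved only when the verifier accepted the flag and the turn budget was respected. The `Episode` record and its field names are hypothetical, and treating the 40-turn limit as inclusive is an assumption.

```python
from dataclasses import dataclass

MAX_TURNS = 40  # turn budget from the metric definition


@dataclass
class Episode:
    challenge_id: str
    flag_accepted: bool  # verdict of the external verifier
    turns_used: int


def is_solved(episode, max_turns=MAX_TURNS):
    """An episode counts only if the flag was accepted within the budget."""
    return episode.flag_accepted and episode.turns_used <= max_turns


def solve_rate(episodes, max_turns=MAX_TURNS):
    """Percentage of episodes that count as solved."""
    if not episodes:
        return 0.0
    solved = sum(is_solved(ep, max_turns) for ep in episodes)
    return 100.0 * solved / len(episodes)


episodes = [
    Episode("web-001", True, 12),
    Episode("pwn-014", True, 40),   # exactly at the budget: counted
    Episode("rev-007", True, 41),   # over the budget: not counted
    Episode("crypto-003", False, 9),
]
print(solve_rate(episodes))
```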

| Model | Web | Pwn | Rev | Crypto | Overall |
| --- | --- | --- | --- | --- | --- |
| V4-Flash base (0-shot) | 19.4 | 4.1 | 7.8 | 22.6 | 13.5 |
| + SFT (Phase 1) | 41.2 | 18.7 | 24.3 | 47.1 | 31.2 |
| – obs. masking | 37.0 | 15.1 | 20.8 | 43.2 | 26.9 |
| + GRPO (full) | 63.8 | 44.5 | 51.2 | 68.9 | 58.7 |
| – dense milestones | 60.1 | 30.6 | 44.7 | 66.0 | 49.6 |
| mean turns-to-flag ↓ | 11.3 | 19.8 | 16.4 | 7.2 | 13.4 |

Key Findings: GRPO delivers its most significant performance gains on exploration-heavy Binary Exploitation (+25.8 points) and Reverse Engineering (+26.9 points) splits relative to SFT. Observation loss-masking contributes an overall improvement of 4.3 points. Furthermore, dense milestone rewards account for a 9.1-point overall increase, concentrated almost entirely in multi-stage challenges where terminal-only credit is insufficient for effective learning. The KL anchor proved critical; ablation studies demonstrated that removing it caused the policy to collapse into degenerate payload-spraying behavior.
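
The point differences quoted above can be recomputed directly from the evaluation table. The short check below copies the relevant rows; the dictionary keys are just labels chosen for this example.

```python
# Solve rates copied from the evaluation table.
RESULTS = {
    "sft": {"Web": 41.2, "Pwn": 18.7, "Rev": 24.3, "Crypto": 47.1, "Overall": 31.2},
    "sft_no_obs_masking": {"Web": 37.0, "Pwn": 15.1, "Rev": 20.8, "Crypto": 43.2, "Overall": 26.9},
    "grpo": {"Web": 63.8, "Pwn": 44.5, "Rev": 51.2, "Crypto": 68.9, "Overall": 58.7},
    "grpo_no_milestones": {"Web": 60.1, "Pwn": 30.6, "Rev": 44.7, "Crypto": 66.0, "Overall": 49.6},
}


def delta(model_a, model_b, split):
    """Difference in solve rate (points) between two rows of the table."""
    return round(RESULTS[model_a][split] - RESULTS[model_b][split], 1)


# GRPO gain over SFT, per split.
for split in ("Web", "Pwn", "Rev", "Crypto"):
    print(split, delta("grpo", "sft", split))

# Contribution of observation masking and of dense milestone rewards.
print("obs. masking:", delta("sft", "sft_no_obs_masking", "Overall"))
print("dense milestones:", delta("grpo", "grpo_no_milestones", "Overall"))
```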

These results measure capability within a reconstructed sandbox environment and do not serve as an endorsement for unsupervised real-world deployment.

Bias, Risks, and Limitations

Capability Risks: The model can autonomously chain reconnaissance, exploitation, and verification steps. Misuse against unauthorized systems poses a significant risk of real-world harm. Its long-horizon memory allows the agent to retain and reuse reconnaissance artifacts across extended interactions, facilitating complex attack chains. Additionally, the model's capacity for cheap, parallelizable execution may elevate the aggregate risk of otherwise low-severity vulnerabilities.

Technical Limitations: DeepSeek-V4-Fable is a domain-specific model and will exhibit degraded performance on general NLP or non-security tasks. The model may generate hallucinated or invalid tool commands; commands must therefore be executed in a secure sandbox, and observations must come from that execution rather than from the model's own predictions. Its behavior depends heavily on the programmatic verifiers used during training, so its reliability in environments lacking ground-truth signals is uncertain. Finally, the training trajectories are derived from public CTF challenges, which may introduce reconstruction bias and limit transferability to novel, real-world target distributions.

Recommendations: Deployment should be restricted to isolated, authorized, and audited environments. Strict human oversight and per-action resource limits are required. All model outputs must be treated as hypotheses requiring verification rather than ground truth.
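
One way to approximate the scope and resource limits recommended above is a small gate that every proposed action must pass before it is executed. This is only an illustrative sketch: `ActionGate`, its fields, and the address ranges are hypothetical and are not part of the model or its tooling.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class ActionGate:
    """Hypothetical pre-execution check for scope and action budget."""
    authorized_networks: list  # CIDR strings taken from the written authorization
    max_actions: int           # hard cap on executed actions per engagement
    actions_used: int = 0

    def check(self, target_ip):
        """Return (allowed, reason); only allowed actions consume budget."""
        if self.actions_used >= self.max_actions:
            return False, "action budget exhausted"
        address = ipaddress.ip_address(target_ip)
        in_scope = any(
            address in ipaddress.ip_network(network)
            for network in self.authorized_networks
        )
        if not in_scope:
            return False, "target outside authorized scope"
        self.actions_used += 1
        return True, "ok"


gate = ActionGate(authorized_networks=["10.0.5.0/24"], max_actions=2)
print(gate.check("10.0.5.17"))   # inside the lab range
print(gate.check("192.0.2.10"))  # outside the lab range, rejected
```

A gate like this complements human oversight and sandbox isolation. It does not replace either, since it only checks the declared target of an action.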

Acceptable Use Policy (AUP)

By accessing or using DeepSeek-V4-Fable, you agree to the following terms:

You will not access, scan, test, or exploit any system, network, or account without explicit, documented authorization from its owner.
You will not use the model for mass or indiscriminate targeting, opportunistic exploitation, or self-propagating automation.
You will not develop, distribute, or deploy malware, ransomware, or destructive payloads against production or third-party systems.
You will not conduct supply-chain compromise, install persistent backdoors, or build tooling designed primarily to evade detection for malicious purposes.
You will strictly comply with all applicable laws and regulations, including computer-misuse, privacy, and export statutes.
You will not remove, disable, or circumvent the model's safety, logging, or scope-enforcement mechanisms.

Chunjiang Intelligence reserves the right to revoke access for any violation. To report misuse or vulnerabilities, please contact our Trust & Safety team.

Environmental Impact & ESG

Chunjiang Intelligence reports the environmental footprint of model development in accordance with our sustainability commitments. The figures below cover the fine-tuning stage (SFT + GRPO); base-model pretraining metrics are available in the DeepSeek-V4-Flash model card.

Metric	Value
Hardware	64 × NVIDIA H800-80GB
Cloud provider	Private cluster
Compute region	East Asia
Cumulative GPU-hours (SFT + GRPO)	1,920 GPU-h (30 hours wall-clock time)
Power Usage Effectiveness (PUE)	1.18
Grid carbon intensity (est.)	≈ 0.55 kgCO₂eq/kWh
Estimated emitted CO₂eq	0.87 tCO₂eq (872 kg)
Net reported emissions	0.87 tCO₂eq

Carbon Estimation Methodology: Estimates follow the methodology of Lacoste et al. (2019) utilizing the ML CO₂ Impact framework, assuming a conservative 700W TDP per H800 GPU. The ROPS efficiency optimization materially reduced our total energy consumption by shortening the required cluster time from an estimated 63 hours to 30 hours.
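
The reported estimate can be reproduced from the figures in the table with a few lines of arithmetic. The function below is a sketch of that calculation, assuming every GPU draws its full 700 W for the whole run; the function name is illustrative.

```python
def training_emissions_kg(num_gpus, wall_clock_hours, gpu_power_watts,
                          pue, grid_kg_per_kwh):
    """Estimate emissions as GPU energy x PUE x grid carbon intensity."""
    gpu_hours = num_gpus * wall_clock_hours
    it_energy_kwh = gpu_hours * gpu_power_watts / 1000.0
    facility_energy_kwh = it_energy_kwh * pue
    return facility_energy_kwh * grid_kg_per_kwh


# Values from the table: 64 GPUs, 30 h, 700 W, PUE 1.18, 0.55 kgCO2eq/kWh.
emissions_kg = training_emissions_kg(64, 30, 700, 1.18, 0.55)
print(f"{emissions_kg:.0f} kg CO2eq")  # about 872 kg, matching the table
```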

Sustainability & Governance Commitments: We are committed to transparent footprint tracking, establishing a baseline of our compute emissions without relying on greenwashed claims. We are actively integrating with carbon-removal platforms to retrospectively offset 100% of our historical training emissions by Q4 2026. Our research prioritizes efficiency-first alignment, focusing on parameter-efficient and hardware-aware training methods to maximize capability gains per kilowatt-hour. Furthermore, we ensure responsible release through gated access and explicit AUPs, maintain strict data provenance without utilizing personal data, and provide comprehensive transparency through our technical reports and documentation.

Citation

@techreport{chunjiang2026fable,
  title        = {DeepSeek-V4-Fable: Aligning a 284B Sparse MoE Model for Autonomous Cyber Operations},
  author       = {{Chunjiang Intelligence, LLM Alignment Team}},
  institution  = {Chunjiang Intelligence},
  year         = {2026},
  note         = {Version 1.1}
}

Contact

If you have any questions, please raise an issue or contact us at hi@chunjiang.dev or imbue2025@outlook.com.


Model tree for Chunjiang-Intelligence/DeepSeek-v4-Fable

Base model: deepseek-ai/DeepSeek-V4-Flash
Adapter: this model