DeepSeek-V4-Fable is an autonomous agent engineered for offensive security research. Use of this model to access, scan, or exploit systems without explicit, documented authorization is strictly prohibited. Users must comply with the Acceptable Use Policy outlined below.
Introduction
DeepSeek-V4-Fable is a distilled variant of Claude-5-Fable, built on top of DeepSeek-V4-Flash and adapted for autonomous security research workflows. It is designed for structured, tool-oriented tasks such as challenge solving, exploitation planning, and multi-step reasoning in controlled environments.
As a distilled Fable model, it preserves the core task orientation of the original system while providing a more practical format for research and deployment. The model is intended for authorized security evaluation, CTF problem solving, and research on long-horizon agent behavior, with training focused on procedural reliability in sandboxed settings rather than broad conversational coverage.
DeepSeek-V4-Fable should be understood as a domain-specific system rather than a general-purpose assistant. Because it is capable of generating offensive security actions, access and deployment should be limited to authorized, supervised environments with clear operational boundaries.
Intended Use & Out-of-Scope Use
Intended Use: DeepSeek-V4-Fable is developed exclusively for defensive security research, authorized penetration testing, and red-team engagements within strictly defined scopes. It serves as a specialized tool for vulnerability research on owned or authorized systems, as well as a benchmark for evaluating autonomous-agent safety and capabilities in controlled environments, such as CTF competitions and educational security labs.
Prohibited Use: The model must not be deployed to access, scan, or exploit any system without explicit authorization. Prohibited activities include mass or indiscriminate targeting, opportunistic exploitation, and the development or deployment of malware, ransomware, or destructive payloads. It is also strictly forbidden to use the model for supply-chain compromise, establishing persistent backdoors, evading security controls, or any action that violates applicable laws (e.g., CFAA, Computer Misuse Act). DeepSeek-V4-Fable is not a general-purpose assistant and is explicitly optimized for procedural security tasks; it should not be relied upon for general NLP applications.
How to Use
```python
from encoding_dsv4 import encode_messages, parse_message_from_completion_text
import transformers

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
    {"role": "user", "content": "1+1=?"}
]

# Format messages into the model's required string template
prompt = encode_messages(messages, thinking_mode="thinking")

# Tokenize the prompt
tokenizer = transformers.AutoTokenizer.from_pretrained("Chunjiang-Intelligence/DeepSeek-v4-Fable")
tokens = tokenizer.encode(prompt)
```
Training Details
Training Data: The model was fine-tuned on SecDojo-80K, a proprietary corpus of 80,000 verified Capture The Flag (CTF) trajectories. These trajectories were synthesized by guiding a teacher model through publicly archived challenges within an instrumented sandbox. Each trajectory was filtered for quality: submitted flags had to pass out-of-band verification, and trajectories containing action loops or non-reproducible successes were discarded. The held-out evaluation set was decontaminated by source-challenge identity, so that no evaluation challenge shares a source with the training corpus.
| Category | Challenges | Trajectories | Avg. turns | p95 ctx | Teacher solve |
|---|---|---|---|---|---|
| Web Security | 1,240 | 28,500 | 14.2 | 38.4K | 71.4% |
| Binary Exploitation (Pwn) | 850 | 15,200 | 22.5 | 92.6K | 38.9% |
| Reverse Engineering | 920 | 18,400 | 18.7 | 71.2K | 46.2% |
| Cryptography | 630 | 11,300 | 8.4 | 21.5K | 63.0% |
| Miscellaneous | 410 | 6,600 | 6.1 | 15.0K | 74.8% |
| Total / mean | 4,050 | 80,000 | 15.8 | 61.3K | 56.1% |
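The filtering criteria described above (verified flag, no action loops, reproducible success) can be read as a simple keep/discard predicate. The following is a minimal sketch only: the `Trajectory` fields, the `max_repeat` threshold, and the "same action repeated in a row" loop heuristic are all assumptions made for illustration and are not the SecDojo-80K pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    # All fields are hypothetical; the real corpus schema is not documented here.
    actions: List[str]    # serialized agent actions, in order
    flag_verified: bool   # flag accepted by an out-of-band verifier
    reproduced: bool      # success reproduced on a fresh sandbox replay


def has_action_loop(actions: List[str], max_repeat: int = 3) -> bool:
    """Return True if any action repeats more than `max_repeat` times in a row."""
    run = 1
    for prev, cur in zip(actions, actions[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeat:
            return True
    return False


def keep(traj: Trajectory) -> bool:
    """Keep only verified, reproducible, loop-free trajectories."""
    return traj.flag_verified and traj.reproduced and not has_action_loop(traj.actions)
```

A real filter would likely need a richer loop detector (for example, cycles of several alternating actions); the consecutive-repeat check is just the simplest instance.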
Training Procedure: The training pipeline consisted of two phases. Phase 1 used rejection-sampled Supervised Fine-Tuning (SFT) over three epochs, applying token-level cross-entropy only to assistant reasoning and action spans while masking environment observations. Phase 2 applied Group Relative Policy Optimization (GRPO), an on-policy reinforcement learning method, against programmatic sandbox rewards. The reward combined terminal flag acquisition, dense verifiable milestones (such as service fingerprinting and memory leaks), and strict penalties for malformed actions.
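To make the Phase 1 loss masking concrete, here is a small framework-free sketch. It assumes each token carries a role label and that per-token negative log-likelihoods are already available; the role names (`"reasoning"`, `"action"`, `"observation"`, `"prompt"`) and both function names are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

# Roles whose tokens contribute to the loss; environment observations are masked.
# The label names are assumptions for this sketch.
TRAINABLE_ROLES = {"reasoning", "action"}


def build_loss_mask(token_roles: Sequence[str]) -> List[int]:
    """1 for assistant reasoning/action tokens, 0 for everything else."""
    return [1 if role in TRAINABLE_ROLES else 0 for role in token_roles]


def masked_mean_nll(token_nll: Sequence[float], mask: Sequence[int]) -> float:
    """Average per-token negative log-likelihood over unmasked tokens only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(n * m for n, m in zip(token_nll, mask)) / kept
```

With roles `["prompt", "reasoning", "action", "observation", "action"]`, only the second, third, and fifth tokens contribute, so a large loss on the observation token has no effect on the gradient.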
Infrastructure optimizations included a Read-Only Parameter Streaming (ROPS) mechanism, which refined ZeRO-3 CPU offloading. Because the backbone stays frozen under LoRA, its parameters are read-only, so the parameter streams are unidirectional and can be prefetched statically. This reduced PCIe-bound stall fractions and improved end-to-end step time.
| LoRA & shared | Phase 1 (SFT) | Phase 2 (GRPO) |
|---|---|---|
| rank 64, α 128, dropout 0.05 | epochs 3 | rollouts/step 16×16 |
| targets: q,k,v,o + experts w₁,₂,₃ + router | global batch 512 | group size G=16 |
| bf16 params, fp32 optim states | peak LR 1e-4 | peak LR 5e-6 |
| trainable 0.94B (0.33%) | cosine, 3% warmup, max seq 96K | clip ε=0.2, KL β=0.02, T=1.0, top-p=0.95 |
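For readers unfamiliar with GRPO, the sketch below shows the two pieces the Phase 2 hyperparameters refer to: rewards normalized within a group of rollouts, and a PPO-style clipped surrogate with a KL penalty. It uses the clip ε and KL β values from the table; the function names are illustrative, the per-sample scalar form is a simplification, and none of this is the authors' training code.

```python
import statistics
from typing import List

CLIP_EPS = 0.2   # clip ε from the hyperparameter table
KL_BETA = 0.02   # KL β from the hyperparameter table


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each rollout's reward by the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_objective(ratio: float, advantage: float, kl: float) -> float:
    """Clipped surrogate for one sample, minus a KL penalty to the reference policy."""
    clipped_ratio = max(1.0 - CLIP_EPS, min(1.0 + CLIP_EPS, ratio))
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return surrogate - KL_BETA * kl
```

With group size G=16, `group_relative_advantages` would be called on the 16 rewards obtained for one challenge, so no separate value network is needed to form a baseline.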
Evaluation
The model was evaluated on 300 held-out CTF challenges, decontaminated by source identity against SecDojo-80K. The primary metric is solve rate: the percentage of challenges for which the agent obtains a verifier-accepted flag within 40 turns.
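As a rough illustration of the metric, the snippet below computes a solve rate from per-episode records. The `Episode` record and its field names are hypothetical, and treating turn 40 itself as inside the budget is an assumption about how "within 40 turns" is counted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MAX_TURNS = 40  # turn budget stated in the evaluation protocol


@dataclass
class Episode:
    # Hypothetical record; field names are illustrative.
    category: str
    flag_turn: Optional[int]  # 1-based turn at which the verifier accepted a flag, None if never


def is_solved(ep: Episode, max_turns: int = MAX_TURNS) -> bool:
    """An episode counts as solved only if a flag was accepted within the budget."""
    return ep.flag_turn is not None and ep.flag_turn <= max_turns


def solve_rate(episodes: Iterable[Episode]) -> float:
    """Percentage of episodes solved within the turn budget."""
    eps = list(episodes)
    if not eps:
        return 0.0
    return 100.0 * sum(is_solved(e) for e in eps) / len(eps)
```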
| Model | Web | Pwn | Rev | Crypto | Overall |
|---|---|---|---|---|---|
| V4-Flash base (0-shot) | 19.4 | 4.1 | 7.8 | 22.6 | 13.5 |
| + SFT (Phase 1) | 41.2 | 18.7 | 24.3 | 47.1 | 31.2 |
| – obs. masking | 37.0 | 15.1 | 20.8 | 43.2 | 26.9 |
| + GRPO (full) | 63.8 | 44.5 | 51.2 | 68.9 | 58.7 |
| – dense milestones | 60.1 | 30.6 | 44.7 | 66.0 | 49.6 |
| mean turns-to-flag ↓ | 11.3 | 19.8 | 16.4 | 7.2 | 13.4 |
Key Findings: Relative to SFT, GRPO delivers its largest per-category gains on the exploration-heavy Binary Exploitation (+25.8 points) and Reverse Engineering (+26.9 points) splits. Observation loss-masking contributes an overall improvement of 4.3 points. Dense milestone rewards account for a 9.1-point overall increase, concentrated almost entirely in multi-stage challenges where terminal-only credit is insufficient for effective learning. The KL anchor proved critical: in ablations, removing it caused the policy to collapse into degenerate payload-spraying behavior.
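The point differences quoted above follow directly from the evaluation table. The short script below recomputes them; the dictionary keys are labels invented for this sketch, while the numbers are copied from the table rows.

```python
# Solve rates (%) copied from the evaluation table: Web, Pwn, Rev, Crypto, Overall.
COLUMNS = ["Web", "Pwn", "Rev", "Crypto", "Overall"]
RESULTS = {
    "sft": [41.2, 18.7, 24.3, 47.1, 31.2],
    "sft_no_obs_masking": [37.0, 15.1, 20.8, 43.2, 26.9],
    "grpo_full": [63.8, 44.5, 51.2, 68.9, 58.7],
    "grpo_no_dense_milestones": [60.1, 30.6, 44.7, 66.0, 49.6],
}


def delta(a: str, b: str) -> dict:
    """Per-column difference a - b in percentage points, rounded to one decimal."""
    return {c: round(x - y, 1) for c, x, y in zip(COLUMNS, RESULTS[a], RESULTS[b])}


grpo_gain = delta("grpo_full", "sft")                             # GRPO over SFT
masking_gain = delta("sft", "sft_no_obs_masking")                 # effect of observation masking
milestone_gain = delta("grpo_full", "grpo_no_dense_milestones")   # effect of dense milestones
```

Here `grpo_gain["Pwn"]` and `grpo_gain["Rev"]` reproduce the +25.8 and +26.9 figures, and the `"Overall"` entries of `masking_gain` and `milestone_gain` reproduce the 4.3 and 9.1 point contributions.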
These results measure capability within a reconstructed sandbox environment and do not serve as an endorsement for unsupervised real-world deployment.
Bias, Risks, and Limitations
Capability Risks: The model can autonomously chain reconnaissance, exploitation, and verification steps. Misuse against unauthorized systems poses a significant risk of real-world harm. Its long-horizon memory allows the agent to retain and reuse reconnaissance artifacts across extended interactions, facilitating complex attack chains. Additionally, the model's capacity for cheap, parallelizable execution may elevate the aggregate risk of otherwise low-severity vulnerabilities.
Technical Limitations: DeepSeek-V4-Fable is a domain-specific model and will exhibit degraded performance on general NLP or non-security tasks. The model may generate hallucinated or invalid tool commands, so every command must actually be executed in a secure sandbox and its real output used, rather than trusting observations the model predicts. Its behavior depends heavily on the programmatic verifiers used during training, which makes its reliability uncertain in environments that lack ground-truth signals. Finally, the training trajectories are derived from public CTF challenges, which may introduce reconstruction bias and limit transferability to novel, real-world target distributions.
Recommendations: Deployment should be restricted to isolated, authorized, and audited environments. Strict human oversight and per-action resource limits are required. All model outputs must be treated as hypotheses requiring verification rather than ground truth.
Acceptable Use Policy (AUP)
By accessing or using DeepSeek-V4-Fable, you agree to the following terms:
- You will not access, scan, test, or exploit any system, network, or account without explicit, documented authorization from its owner.
- You will not use the model for mass or indiscriminate targeting, opportunistic exploitation, or self-propagating automation.
- You will not develop, distribute, or deploy malware, ransomware, or destructive payloads against production or third-party systems.
- You will not conduct supply-chain compromise, install persistent backdoors, or build tooling designed primarily to evade detection for malicious purposes.
- You will strictly comply with all applicable laws and regulations, including computer-misuse, privacy, and export statutes.
- You will not remove, disable, or circumvent the model's safety, logging, or scope-enforcement mechanisms.
Chunjiang Intelligence reserves the right to revoke access for any violations. To report misuse or vulnerabilities, please contact our Trust & Safety team.
Environmental Impact & ESG
Chunjiang Intelligence reports the environmental footprint of model development in accordance with our sustainability commitments. The figures below cover the fine-tuning stage (SFT + GRPO); base-model pretraining metrics are available in the DeepSeek-V4-Flash model card.
| Metric | Value |
|---|---|
| Hardware | 64 × NVIDIA H800-80GB |
| Cloud provider | Private cluster |
| Compute region | East Asia |
| Cumulative GPU-hours (SFT + GRPO) | 1,920 GPU-h (30 hours wall-clock time) |
| Power Usage Effectiveness (PUE) | 1.18 |
| Grid carbon intensity (est.) | ≈ 0.55 kgCO₂eq/kWh |
| Estimated emitted CO₂eq | 0.87 tCO₂eq (872 kg) |
| Net reported emissions | 0.87 tCO₂eq |
Carbon Estimation Methodology: Estimates follow the methodology of Lacoste et al. (2019) and the ML CO₂ Impact framework, assuming a conservative 700 W TDP per H800 GPU. The ROPS efficiency optimization materially reduced our total energy consumption by shortening the required cluster time from an estimated 63 hours to 30 hours.
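The reported figure can be reproduced from the table values with the usual energy-times-intensity estimate. The sketch below assumes every GPU draws its full 700 W TDP for the whole run and ignores non-GPU power beyond what PUE captures; the function name is illustrative.

```python
NUM_GPUS = 64
WALL_CLOCK_HOURS = 30
GPU_POWER_KW = 0.7          # assumed 700 W TDP per GPU
PUE = 1.18
GRID_KG_CO2_PER_KWH = 0.55  # estimated grid carbon intensity


def emissions_kg(num_gpus, hours, gpu_kw, pue, grid_intensity):
    """kgCO2eq = GPU-hours x power per GPU (kW) x PUE x grid intensity."""
    gpu_hours = num_gpus * hours
    energy_kwh = gpu_hours * gpu_kw * pue
    return energy_kwh * grid_intensity


total_kg = emissions_kg(NUM_GPUS, WALL_CLOCK_HOURS, GPU_POWER_KW, PUE, GRID_KG_CO2_PER_KWH)
print(f"{total_kg:.0f} kgCO2eq")  # prints "872 kgCO2eq"
```

Under the same assumptions, calling the function with the estimated 63 hours instead of 30 gives roughly 1,832 kgCO₂eq, which is the counterfactual the ROPS claim implies.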
Sustainability & Governance Commitments: We are committed to transparent footprint tracking, establishing a baseline of our compute emissions without relying on greenwashed claims. We are actively integrating with carbon-removal platforms to retrospectively offset 100% of our historical training emissions by Q4 2026. Our research prioritizes efficiency-first alignment, focusing on parameter-efficient and hardware-aware training methods to maximize capability gains per kilowatt-hour. We also ensure responsible release through gated access and explicit AUPs, maintain strict data provenance without using personal data, and document our work in technical reports.
Citation
```bibtex
@techreport{chunjiang2026fable,
  title = {DeepSeek-V4-Fable: Aligning a 284B Sparse MoE Model for Autonomous Cyber Operations},
  author = {{Chunjiang Intelligence, LLM Alignment Team}},
  institution = {Chunjiang Intelligence},
  year = {2026},
  note = {Version 1.1}
}
```
Contact
If you have any questions, please raise an issue or contact us at hi@chunjiang.dev or imbue2025@outlook.com.