Instructions for using Chunjiang-Intelligence/DeepSeek-v4-Fable with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Chunjiang-Intelligence/DeepSeek-v4-Fable with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Chunjiang-Intelligence/DeepSeek-v4-Fable")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Chunjiang-Intelligence/DeepSeek-v4-Fable")
model = AutoModelForCausalLM.from_pretrained("Chunjiang-Intelligence/DeepSeek-v4-Fable")


vLLM

How to use Chunjiang-Intelligence/DeepSeek-v4-Fable with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Chunjiang-Intelligence/DeepSeek-v4-Fable"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chunjiang-Intelligence/DeepSeek-v4-Fable",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
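The same request can be sent from Python using only the standard library. This is a minimal sketch. It assumes the vLLM server started above is listening on localhost:8000 and returns an OpenAI-style completion body, with the generated text in `choices[0]["text"]`. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST to an OpenAI-compatible /v1/completions endpoint (same body as the curl call)."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, timeout=600, **kwargs):
    """Send the request and return the text of the first choice."""
    req = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style completion responses carry generated text in choices[i]["text"]
    return payload["choices"][0]["text"]
```

For example, `complete("http://localhost:8000", "Chunjiang-Intelligence/DeepSeek-v4-Fable", "Once upon a time,")` mirrors the curl call. Building the request separately from sending it keeps the payload easy to inspect. The 600-second timeout is an arbitrary, generous value.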

Use Docker

docker model run hf.co/Chunjiang-Intelligence/DeepSeek-v4-Fable

SGLang

How to use Chunjiang-Intelligence/DeepSeek-v4-Fable with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Chunjiang-Intelligence/DeepSeek-v4-Fable" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chunjiang-Intelligence/DeepSeek-v4-Fable",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
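Loading a large checkpoint can take a long time, so a curl call issued immediately after the launch command may be refused. This is a minimal sketch of a readiness poll. It assumes the server answers `GET /v1/models` with HTTP 200 once it is ready; this is an OpenAI-compatible route that SGLang and vLLM expose. The helper names `wait_for_server` and `http_probe` are hypothetical, and the deadline and polling interval are arbitrary.

```python
import time
import urllib.request


def http_probe(base_url, timeout=5.0):
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError and connection resets are OSError subclasses;
        # treat them all as "not ready yet" while weights are still loading
        return False


def wait_for_server(base_url, deadline_s=1800.0, interval_s=5.0, probe=http_probe):
    """Poll probe(base_url) until it succeeds or deadline_s elapses; return True on success."""
    start = time.monotonic()
    while True:
        if probe(base_url):
            return True
        if time.monotonic() - start >= deadline_s:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_server("http://localhost:30000")` returns `True` once the SGLang server responds, after which the curl request above can be sent. The probe is passed in as a parameter, so the polling loop can be exercised without a running server.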

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Chunjiang-Intelligence/DeepSeek-v4-Fable" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chunjiang-Intelligence/DeepSeek-v4-Fable",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Chunjiang-Intelligence/DeepSeek-v4-Fable with Docker Model Runner:
```
docker model run hf.co/Chunjiang-Intelligence/DeepSeek-v4-Fable
```

Model unusable

by zaddyzaddy - opened 11 days ago

Discussion

zaddyzaddy

11 days ago

Tried serving it with vLLM and SGLang. The SGLang launch command:

sglang serve \
  --trust-remote-code \
  --model-path Chunjiang-Intelligence/DeepSeek-v4-Fable \
  --tp 8 \
  --moe-runner-backend flashinfer_mxfp4 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --chunked-prefill-size 4096 \
  --disable-flashinfer-autotune \
  --swa-full-tokens-ratio 0.1 \
  --reasoning-parser deepseek-v4 \
  --tool-call-parser deepseekv4 \
  --host 0.0.0.0 \
  --port 30000
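The command selects the `flashinfer_mxfp4` MoE runner backend, while the startup log reports `Unexpected routed-expert safetensors dtype=BF16`. That suggests the loader expected a different dtype for the routed-expert weights than the one the checkpoint stores, though the thread does not confirm this. The sketch below checks which dtypes the shards declare. It reads only each `.safetensors` header (an 8-byte little-endian length followed by a JSON header), so no weights are loaded. It assumes a local snapshot directory. The `name_filter` substring, for example `"experts"`, is an assumed tensor-naming pattern that has not been confirmed for this repository, and the helper names are hypothetical.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def safetensors_dtypes(path, name_filter=None):
    """Count tensor dtypes declared in one .safetensors header without loading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    counts = Counter()
    for name, info in header.items():
        if name == "__metadata__":
            continue  # optional free-form metadata entry, not a tensor
        if name_filter is None or name_filter in name:
            counts[info["dtype"]] += 1
    return counts


def checkpoint_dtypes(model_dir, name_filter=None):
    """Aggregate dtype counts over every *.safetensors shard in a directory."""
    total = Counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        total.update(safetensors_dtypes(shard, name_filter))
    return total
```

`checkpoint_dtypes("<local snapshot dir>", name_filter="experts")` returns a `Counter` that maps safetensors dtype strings (such as `BF16`) to tensor counts. This can be compared with what the chosen MoE backend expects.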

It fails with:

[2026-06-24 20:18:39] Unexpected routed-expert safetensors dtype=BF16 for DeepSeek V4
[2026-06-24 20:18:39] Hybrid swa model: self.hf_config.architectures=['DeepseekV4ForCausalLM']
[transformers] Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'attention_factor'}
[2026-06-24 20:18:40] kill_process_tree called: parent_pid=12771, include_parent=False, pid=12771
Traceback (most recent call last):
  File "/usr/local/bin/sglang", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/sgl-workspace/sglang/python/sglang/cli/main.py", line 40, in main
    serve(args, extra_argv)
  File "/sgl-workspace/sglang/python/sglang/cli/serve.py", line 128, in serve
    run_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 50, in run_server
    launch_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 2401, in launch_server
    ) = Engine._launch_subprocesses(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 866, in _launch_subprocesses
    tokenizer_manager, template_manager = init_tokenizer_manager_func(
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 137, in init_tokenizer_manager
    tokenizer_manager = TokenizerManagerClass(server_args, port_args)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 266, in __init__
    self.init_tokenizer_and_processor()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 354, in init_tokenizer_and_processor
    self.tokenizer = get_tokenizer(
                     ^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/utils/hf_transformers/tokenizer.py", line 499, in get_tokenizer
    tokenizer = _auto_tokenizer_from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/utils/hf_transformers/tokenizer.py", line 165, in _auto_tokenizer_from_pretrained
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 837, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1743, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1933, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_tokenizers.py", line 376, in __init__
    raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of: 
(1) a `tokenizers` library serialization file, 
(2) a slow tokenizer instance to convert or 
(3) an equivalent slow tokenizer class to instantiate and convert. 
You need to have sentencepiece or tiktoken installed to convert a slow tokenizer to a fast one.
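The fatal error is raised inside `AutoTokenizer.from_pretrained`. Transformers found no `tokenizers` serialization file and could not build a fast tokenizer from a slow one, and the message itself points to `sentencepiece` or `tiktoken` being missing. The diagnostic sketch below assumes a local copy of the repository files and performs four checks:

- whether `tokenizer.json` exists (the file name Transformers uses for a `tokenizers` serialization);
- whether `tokenizer.model` exists (a conventional SentencePiece file name, which this repository may or may not use);
- the value of `tokenizer_class` in `tokenizer_config.json`, if that file is present;
- whether the two conversion backends are importable.

The helper names are hypothetical.

```python
import importlib.util
import json
from pathlib import Path


def configured_tokenizer_class(model_dir):
    """Return tokenizer_class from tokenizer_config.json, or None if the file is absent."""
    cfg = Path(model_dir) / "tokenizer_config.json"
    if not cfg.is_file():
        return None
    return json.loads(cfg.read_text(encoding="utf-8")).get("tokenizer_class")


def diagnose_tokenizer(model_dir):
    """Check the tokenizer sources named in the ValueError for a local model directory."""
    d = Path(model_dir)
    return {
        # (1) serialization file written by the `tokenizers` library
        "has_tokenizer_json": (d / "tokenizer.json").is_file(),
        # conventional SentencePiece model file (assumed name; varies by model)
        "has_tokenizer_model": (d / "tokenizer.model").is_file(),
        # class AutoTokenizer will try to instantiate, if one is declared
        "tokenizer_class": configured_tokenizer_class(d),
        # backends the error says are needed to convert a slow tokenizer to a fast one
        "sentencepiece_installed": importlib.util.find_spec("sentencepiece") is not None,
        "tiktoken_installed": importlib.util.find_spec("tiktoken") is not None,
    }
```

A report showing no `tokenizer.json` and neither backend installed would be consistent with the error above. Installing `sentencepiece` (or `tiktoken`) in the serving environment is the fix the message itself suggests.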
