Model unusable

#3
by zaddyzaddy - opened

Tried serving the model with both vLLM and SGLang. The SGLang command:

sglang serve \
  --trust-remote-code \
  --model-path Chunjiang-Intelligence/DeepSeek-v4-Fable \
  --tp 8 \
  --moe-runner-backend flashinfer_mxfp4 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --chunked-prefill-size 4096 \
  --disable-flashinfer-autotune \
  --swa-full-tokens-ratio 0.1 \
  --reasoning-parser deepseek-v4 \
  --tool-call-parser deepseekv4 \
  --host 0.0.0.0 \
  --port 30000

It fails with:

[2026-06-24 20:18:39] Unexpected routed-expert safetensors dtype=BF16 for DeepSeek V4
[2026-06-24 20:18:39] Hybrid swa model: self.hf_config.architectures=['DeepseekV4ForCausalLM']
[transformers] Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'attention_factor'}
[2026-06-24 20:18:40] kill_process_tree called: parent_pid=12771, include_parent=False, pid=12771
Traceback (most recent call last):
  File "/usr/local/bin/sglang", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/sgl-workspace/sglang/python/sglang/cli/main.py", line 40, in main
    serve(args, extra_argv)
  File "/sgl-workspace/sglang/python/sglang/cli/serve.py", line 128, in serve
    run_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 50, in run_server
    launch_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 2401, in launch_server
    ) = Engine._launch_subprocesses(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 866, in _launch_subprocesses
    tokenizer_manager, template_manager = init_tokenizer_manager_func(
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 137, in init_tokenizer_manager
    tokenizer_manager = TokenizerManagerClass(server_args, port_args)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 266, in __init__
    self.init_tokenizer_and_processor()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 354, in init_tokenizer_and_processor
    self.tokenizer = get_tokenizer(
                     ^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/utils/hf_transformers/tokenizer.py", line 499, in get_tokenizer
    tokenizer = _auto_tokenizer_from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/utils/hf_transformers/tokenizer.py", line 165, in _auto_tokenizer_from_pretrained
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 837, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1743, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py", line 1933, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_tokenizers.py", line 376, in __init__
    raise ValueError(
ValueError: Couldn't instantiate the backend tokenizer from one of: 
(1) a `tokenizers` library serialization file, 
(2) a slow tokenizer instance to convert or 
(3) an equivalent slow tokenizer class to instantiate and convert. 
You need to have sentencepiece or tiktoken installed to convert a slow tokenizer to a fast one.
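The final error lists the three routes transformers tries when building the fast tokenizer. The following is a minimal sketch for checking locally which routes look available. It assumes the model snapshot is already downloaded to a local directory, and that directory path is a hypothetical argument. The script only inspects file names and installed packages. It does not load the tokenizer, so it cannot confirm that loading will succeed.

```python
import importlib.util
import sys
from pathlib import Path


def backend_available(module_name: str) -> bool:
    """Return True if the named Python module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


def diagnose_tokenizer(files: list[str]) -> dict[str, bool]:
    """Report which tokenizer-loading routes from the error look possible.

    `files` is a list of file names from the (hypothetical) local snapshot dir.
    """
    names = {Path(f).name for f in files}
    return {
        # Route (1): a serialized fast tokenizer written by the `tokenizers` library.
        "tokenizer.json present": "tokenizer.json" in names,
        # Routes (2)/(3): converting a slow tokenizer needs one of these installed.
        "sentencepiece installed": backend_available("sentencepiece"),
        "tiktoken installed": backend_available("tiktoken"),
    }


if __name__ == "__main__":
    # Usage: python check_tokenizer.py /path/to/snapshot  (defaults to the cwd)
    snapshot = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    files = [p.name for p in snapshot.iterdir() if p.is_file()]
    for check, ok in diagnose_tokenizer(files).items():
        print(f"{check}: {'yes' if ok else 'no'}")
```

If `tokenizer.json` is missing and neither converter is installed, all three routes in the error are unavailable. That outcome matches the traceback above.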
