`olmo3` reasoning parser crashes at startup on Domyn-Small-v1.0 tokenizer

#1
by alescire94 - opened

Hi! I'm running into a startup crash when running the model card's command with the olmo3 reasoning parser. Details below.

vLLM version: 0.21.0
Command:

uv run vllm serve domyn/Domyn-Small-v1.0 \
    --tensor-parallel-size 1 \
    --dtype bfloat16 \
    --max-model-len 32768 \
    --max-num-seqs 256 \
    --reasoning-parser olmo3

Error:

File ".../vllm/reasoning/olmo3_reasoning_parser.py", line 242, in __init__
    self.vocab[token] for token in self.think_end_first_split
KeyError: 'Ġ</'

Repro:

  1. uv add vllm==0.21.0
  2. Run the command above.
  3. Server crashes at startup with the traceback shown.

Hi, thanks for the report. We were able to reproduce it on our side.

It's a parser/tokenizer mismatch in vLLM 0.21's Olmo3ReasoningParser. Its __init__ eagerly looks up GPT-2-BPE token strings ('Ġ</' etc.) in the vocab, but Domyn-Small uses a SentencePiece tokenizer where <think>/</think> aren't single vocab tokens, so the server dies at startup before serving any request.

vLLM 0.20 didn't have this eager check, which is why it worked there.
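To make the failure mode concrete, here's a minimal sketch using two hypothetical toy vocabularies (not the real Olmo3 or Domyn-Small vocabs). It assumes the parser's end-of-reasoning pieces are 'Ġ</', 'think', '>', mirroring the 'Ġ</' key in the traceback; the function names are illustrative, not vLLM's. GPT-2-style byte-level BPE encodes a leading space as 'Ġ', while SentencePiece uses '▁' (U+2581), so an eager dict lookup of a 'Ġ'-prefixed piece raises KeyError against a SentencePiece-style vocab.

```python
# Hypothetical toy vocabularies (NOT the real Olmo3 or Domyn-Small vocabs).
# GPT-2-style byte-level BPE marks a leading space with "Ġ";
# SentencePiece marks it with "▁" (U+2581).
BPE_VOCAB = {"Ġ</": 0, "think": 1, ">": 2}
SENTENCEPIECE_VOCAB = {"▁</": 0, "think": 1, ">": 2}

# Assumed pieces for " </think>" under a GPT-2-style tokenizer,
# mirroring the 'Ġ</' key in the traceback.
THINK_END_PIECES = ["Ġ</", "think", ">"]


def eager_lookup(vocab, pieces):
    """Resolve every piece to an id up front; raises KeyError on the first miss."""
    return [vocab[piece] for piece in pieces]


def missing_pieces(vocab, pieces):
    """Non-raising variant: return the pieces the vocab does not contain."""
    return [piece for piece in pieces if piece not in vocab]


if __name__ == "__main__":
    print(eager_lookup(BPE_VOCAB, THINK_END_PIECES))  # [0, 1, 2]
    print(missing_pieces(SENTENCEPIECE_VOCAB, THINK_END_PIECES))  # ['Ġ</']
    try:
        eager_lookup(SENTENCEPIECE_VOCAB, THINK_END_PIECES)
    except KeyError as exc:
        # Same failure mode as the startup crash: KeyError: 'Ġ</'
        print("KeyError:", exc)
```

A non-raising check like missing_pieces would let a parser report a clear error (or fall back) instead of crashing inside __init__. That's one possible design, not a description of what vLLM does.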

Two quick options while we sort it out:

  1. Pin to vLLM 0.20.0: it's known good, and no other change is needed.
  2. Or wait a couple of days: we'll ship a small reasoning-parser plugin (loadable via --reasoning-parser-plugin), along with usage instructions in the model card.

Will follow up here once it's published.
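This thread doesn't describe how the published plugin works, so the following is only a hedged illustration of the general idea behind a tokenizer-agnostic parser, not the Domyn plugin itself. It splits reasoning from the answer by searching the decoded text for the end-of-reasoning marker instead of resolving tokenizer-specific piece strings such as 'Ġ</'. The marker strings and the split_reasoning name are assumptions.

```python
from typing import Optional, Tuple

# Assumed marker strings; not taken from the published plugin.
THINK_START = "<think>"
THINK_END = "</think>"


def split_reasoning(text: str) -> Tuple[Optional[str], str]:
    """Split fully decoded output into (reasoning, content) by string search.

    Because it searches decoded text, it never needs to know how a given
    tokenizer (BPE or SentencePiece) splits the markers into pieces.
    Non-streaming only: a streaming parser must also handle markers that
    arrive split across deltas.
    """
    end = text.find(THINK_END)
    if end == -1:
        # No end marker: treat everything as final content.
        return None, text
    reasoning = text[:end].lstrip()
    if reasoning.startswith(THINK_START):
        # The start marker may or may not appear in the output
        # (some chat templates put it in the prompt instead).
        reasoning = reasoning[len(THINK_START):]
    content = text[end + len(THINK_END):]
    return reasoning.strip(), content.lstrip()


if __name__ == "__main__":
    print(split_reasoning("<think>add 2 and 2</think> The answer is 4."))
    # ('add 2 and 2', 'The answer is 4.')
    print(split_reasoning("No reasoning here."))
    # (None, 'No reasoning here.')
```

Matching on decoded text trades a little efficiency (string search instead of token-id comparison) for independence from the tokenizer's vocabulary format.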

Hi @alescire94, we've just pushed the custom reasoning parser plugin.
You can find instructions on how to use it in the README.

Thank you again for flagging the issue.
