add newlines and thinking tokens to template to avoid having to compute 3 extra tokens per generation in chat completion+reasoning

#35

This updated template prefills the tokens the model would have generated itself to begin the thinking process.

Behavior with current template:

Thinking enabled prefill: <|turn>model\n
Model then generates: <|channel>thought\n
The model generates 3 tokens before the thinking process begins, which is wasted compute on every generation.

Thinking disabled prefill: <|turn>model\n
Template adds this: <|channel>thought\n<channel|>
No extra tokens generated, fine.

Now with the improved template:

Thinking enabled prefill: <|turn>model\n<|channel>thought\n
No extra tokens generated. The model starts the thinking process directly, since the 3 opening tokens are already in the prefill.

Thinking disabled prefill: <|turn>model\n<|channel>thought\n
Template adds this: <channel|>
No extra tokens generated, same as original template.
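
The four cases above can be sketched in Python as plain string building. This is only an illustration of the prefill logic, not the actual Jinja template; the function names and the enable_thinking flag are hypothetical, and only the three marker strings come from the description above.

```python
# Marker strings as quoted in the description above.
TURN_OPEN = "<|turn>model\n"
THINK_OPEN = "<|channel>thought\n"
THINK_CLOSE = "<channel|>"


def current_prefill(enable_thinking: bool) -> str:
    """Generation prompt as the current template builds it (hypothetical helper)."""
    if enable_thinking:
        # The model has to generate THINK_OPEN itself.
        return TURN_OPEN
    # Thinking disabled: the template adds an empty thought channel.
    return TURN_OPEN + THINK_OPEN + THINK_CLOSE


def improved_prefill(enable_thinking: bool) -> str:
    """Generation prompt as the improved template builds it (hypothetical helper)."""
    prefill = TURN_OPEN + THINK_OPEN
    if not enable_thinking:
        # Close the thought channel right away so no thinking is generated.
        prefill += THINK_CLOSE
    return prefill


if __name__ == "__main__":
    for flag in (True, False):
        print(f"enable_thinking={flag}")
        print("  current :", repr(current_prefill(flag)))
        print("  improved:", repr(improved_prefill(flag)))
```

With thinking disabled both versions produce the same string. With thinking enabled, the improved version differs only by already containing the thought-channel opener.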
