Omitted <think> at the start and almost 10k tokens to debug 2 JS functions
Hello,
I recently downloaded this model as a Q4_K_M GGUF in LM Studio and tested it on a small error-finding problem in 2 JS functions.
As mentioned in the title, the model spent almost 10k tokens thinking and omitted the `<think>` tag at the start of its response.
2 questions:
- Is this amount of thinking tokens intended?
- Did the model also omit the `<think>` tag for anyone else?
Thank you for your work!
Same, I just deleted it for this reason.
I said "Hi" and it's been responding for like 5 minutes because I forgot the `<think>`!
Hi everyone, we explicitly prefill the chat template with `<think>` to ensure the model always generates the long chain-of-thought: https://huggingface.co/open-r1/OlympicCoder-7B/blob/a093c195a14f190b8228b12cf6cd180c21bfbeec/tokenizer_config.json#L198

In our experience, sampling with `temperature=0.6` and `top_p=0.95` gives pretty coherent answers that don't get stuck in an infinite loop!
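For anyone confused about why the opening `<think>` never shows up in the output: because it is prefilled into the prompt, the model only ever *emits* the closing `</think>`. A minimal sketch of the idea, assuming ChatML-style turn markers (the authoritative template is the tokenizer_config.json linked above, so treat the exact strings here as illustrative):

```python
# Sketch: the chat template appends "<think>\n" after the assistant header,
# so generation begins *inside* the reasoning block. A frontend that strips
# or does not render the prompt-side tag will show a response with no
# opening <think>, even though the closing </think> is generated.

def build_prompt(user_message: str) -> str:
    # Simplified stand-in for the model's real chat template.
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n"  # prefilled by the template, not generated by the model
    )

prompt = build_prompt("Find the bug in these two JS functions.")
print(prompt.endswith("<think>\n"))
```

So if your client renders only generated tokens, you will see the chain-of-thought followed by a bare `</think>`, which looks like a "forgotten" opening tag.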