High probability for <|start_header_id|> after a <|start_header_id|>

#53
by espadrine - opened

I notice that Llama 3.2 1B Instruct assigns a very high probability to <|start_header_id|> after a <|start_header_id|>, and a near-zero probability to the classic system, user, and assistant tokens that should follow it, per the template. See the probabilities below (for each template token: its probability, then the token the model preferred and that token's probability):

<|begin_of_text|>: 8.225440979003906e-06, preferred: Tags 0.0029659271240234375
<|start_header_id|>: 8.344650268554688e-07, preferred: Tags 0.0029659271240234375
system: 0.0, preferred: <|start_header_id|> 0.9970703125
<|end_header_id|>: 3.2782554626464844e-06, preferred: (line break) 0.53076171875
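
For reference, here is a minimal sketch of how a table like this can be computed. It is not the script that produced the numbers above; `softmax`, `next_token_report`, and the toy data are hypothetical names and values used only for illustration. It assumes the usual causal language model convention that the logits at position i score the token at position i + 1, so the first token has no prediction and every other token is compared against the previous position's logits.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_report(token_ids, logits):
    """Return one row per token after the first.

    Each row is (token_id, probability of that token,
                 preferred token id, probability of the preferred token).
    Assumption: logits[i] is the model's score vector for the token
    at position i + 1, so token_ids[i] is read from logits[i - 1].
    """
    report = []
    for i in range(1, len(token_ids)):
        probs = softmax(logits[i - 1])
        preferred = max(range(len(probs)), key=probs.__getitem__)
        report.append((token_ids[i], probs[token_ids[i]], preferred, probs[preferred]))
    return report


# Toy data (hypothetical): a 4-token vocabulary and a 3-token sequence.
# Each position strongly predicts the token that actually comes next.
toy_ids = [0, 1, 2]
toy_logits = [
    [0.0, 5.0, 0.0, 0.0],  # position 0 scores the token at position 1
    [0.0, 0.0, 5.0, 0.0],  # position 1 scores the token at position 2
    [0.0, 0.0, 0.0, 5.0],  # position 2 scores a token past the end (unused)
]

for row in next_token_report(toy_ids, toy_logits):
    print(row)
```

With this alignment, each actual token in the toy sequence is also the preferred one. Reading token i against logits[i] instead would shift every row by one position, which is worth ruling out when a table shows the expected token one row later than it should appear.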

Is that a mistake? Or is there a good reason for those probabilities? Wasn’t it trained with the system tokens?
