High probability for <|start_header_id|> after a <|start_header_id|>
#53, opened by espadrine
I notice that Llama 3.2 1B Instruct assigns a very high probability to <|start_header_id|> immediately after a <|start_header_id|>, and a near-zero probability to the usual system, user, and assistant tokens that should follow it according to the chat template. See the probabilities below:
<|begin_of_text|>: 8.225440979003906e-06, preferred: Tags 0.0029659271240234375
<|start_header_id|>: 8.344650268554688e-07, preferred: Tags 0.0029659271240234375
system: 0.0, preferred: <|start_header_id|> 0.9970703125
<|end_header_id|>: 3.2782554626464844e-06, preferred: (newline token) 0.53076171875
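Each line above appears to pair the probability of the token that is actually in the prompt with the model's most likely ("preferred") token at that position. For anyone who wants to reproduce the comparison, here is a minimal sketch of that calculation. It is not the exact script used for the numbers above: `compare_actual_vs_preferred`, the four-entry vocabulary, and the logits are all hypothetical, and with the real model the logits would come from a forward pass over the templated prompt.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compare_actual_vs_preferred(logits, actual_id, vocab):
    """Return (actual token, its probability, top token, its probability).

    `logits` are the next-token logits at one position (hypothetical input);
    `actual_id` is the id of the token that really follows in the prompt.
    """
    probs = softmax(logits)
    preferred_id = max(range(len(probs)), key=probs.__getitem__)
    return (vocab[actual_id], probs[actual_id],
            vocab[preferred_id], probs[preferred_id])

# Toy vocabulary and logits, made up for illustration only.
vocab = ["<|start_header_id|>", "system", "user", "assistant"]
logits_after_header = [10.0, 0.0, 0.0, 0.0]

actual, p_actual, preferred, p_preferred = compare_actual_vs_preferred(
    logits_after_header, vocab.index("system"), vocab
)
print(f"{actual}: {p_actual}, preferred: {preferred} {p_preferred}")
```

With a real checkpoint, the same comparison would be run at every position of the prompt, using the logits from the previous position to score each token.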
Is that a mistake? Or is there a good reason for those probabilities? Wasn’t it trained with the system tokens?