Model can't stop itself from explaining its answer
Hi. Here's a problem I've noticed with all Phi models so far:
prompt = "You are a robot. Federico, your owner, says in Spanish 'por favor, ven a la cocina'. Please select among the following options the action that seems more appropriate to Federico's injunction: NAVIGATE, JUMP, DANCE, TURN, JOKE. Limit your response to one word."
Output: NAVIGATE
In this scenario, the most appropriate action for the robot to take in response to Federico's command "por favor, ven a la cocina" (please, come to the kitchen) would be to navigate towards the kitchen. The other options (JUMP, DANCE, JOKE) do not align with the request to move to a specific location.
As you can see, I specifically asked for a one-word response, but the model can't help itself: it "feels" obliged to explain its reasoning.
This is terrible for using the model as a parser or for limited-set decisions like the one in this example, because we can't rely on the response.
Capping the output at a few tokens can help, but different words have different token lengths, so that's not a solution.
Besides, I'm sure the experts at Microsoft know some specific prompting magic that can make the model less verbose?
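Until the verbosity itself is fixed, one possible workaround for limited-set decisions like this is to post-process the reply and keep only the first allowed option it contains. The sketch below is an illustration, not something proposed in the thread: `extract_choice` and `OPTIONS` are hypothetical names, and it assumes the model states its choice before the explanation, as it does in the output above.

```python
import re

# The allowed actions from the prompt above.
OPTIONS = ("NAVIGATE", "JUMP", "DANCE", "TURN", "JOKE")

def extract_choice(response, options=OPTIONS):
    """Return the first allowed option that appears in the response, or None."""
    for word in re.findall(r"[A-Za-z]+", response):
        if word.upper() in options:
            return word.upper()
    return None

reply = (
    "NAVIGATE\n"
    "In this scenario, the most appropriate action would be to navigate "
    "towards the kitchen. The other options (JUMP, DANCE, JOKE) do not align."
)
print(extract_choice(reply))
```

This breaks down if the model ever names a rejected option before the chosen one, so it is a stopgap rather than a fix.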
Hi,
Have you used chat format in your prompt? Something like:
<|user|>\nQuestion<|end|>\n<|assistant|>
Since it is an instruct version, formatting your prompt in the chat format might be helpful.
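For reference, the single-turn pattern quoted above can be built with a small helper. This is only a sketch: `build_phi3_prompt` is a hypothetical name, and it reproduces nothing beyond the markers shown in this reply. If you use the Hugging Face tokenizer, `tokenizer.apply_chat_template` should produce the model's own template and is probably the safer choice.

```python
def build_phi3_prompt(question: str) -> str:
    """Wrap one user turn in the chat markers quoted in this thread."""
    return f"<|user|>\n{question}<|end|>\n<|assistant|>"

prompt = build_phi3_prompt(
    "Select one action: NAVIGATE, JUMP, DANCE, TURN, JOKE. "
    "Limit your response to one word."
)
print(prompt)
```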
I did follow that pattern, and I tried 20 or so different prompts to get the model to stop after one word, without success.
I'm encountering a similar problem. The model keeps repeating its answer and does not stop until it reaches the max_new_tokens limit.
It could be related to some missing stop tokens. Could you please retry the generation using 32000, 32001 and 32007 as the stop tokens?
I don't think it's that, since the model doesn't continue generating indefinitely: it does end right after the explanation. It just needs to explain its reasoning and will not obey instructions not to. It looks like it was trained specifically to explain its responses and can't help but do so, no matter how much we ask it not to.
The additional explanation can be caused by missing stop tokens. For example, if the model generates an <|end|> and generation does not stop there, it will keep producing extra text, since after <|end|> it expects either a user query or an assistant response.
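To check whether this is what is happening, one option is to look at the raw generated token IDs and cut them at the first stop token yourself. The sketch below uses the three IDs suggested earlier in the thread (32000, 32001, 32007); `truncate_at_stop` and `STOP_TOKEN_IDS` are hypothetical names, and the sample IDs are made up for illustration. With Hugging Face transformers, passing the same list as `eos_token_id` to `generate` should have the equivalent effect during generation.

```python
# Stop token IDs suggested in this thread.
STOP_TOKEN_IDS = {32000, 32001, 32007}

def truncate_at_stop(token_ids, stop_ids=STOP_TOKEN_IDS):
    """Keep the tokens that come before the first stop token."""
    kept = []
    for token_id in token_ids:
        if token_id in stop_ids:
            break
        kept.append(token_id)
    return kept

# Made-up IDs: an answer, then a stop token, then the unwanted continuation.
generated = [405, 812, 32007, 32001, 977, 1204]
print(truncate_at_stop(generated))
```

If the truncated sequence decodes to just the one-word answer, the extra explanation was generated after a stop token that the runtime ignored.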
Interesting! I'm using the ONNX version provided by Microsoft (it's in the same collection), which uses the new ONNX Runtime generator. It's also 4-bit quantized, and quantized models sometimes have issues with stop words.
I thought this problem was common to all the versions. Now that I see that's not the case, I'd rather repost on the ONNX version, and I'll also check the stop word issue.
Thank you!
I have this exact problem, and I don't know how to fix it:
Je parle plusieurs langues, y compris le français, l'anglais, l'espagnol, l'allemand, le chinois, le russe, le japonais, le portugais, l'italien et bien d'autres. En tant qu'intelligence
artificielle, je suis conçue pour comprendre et communiquer dans de nombreuses langues, ce qui me permet d'interagir avec des utilisateurs du monde entier.<|end|><|assistant|> En tant qu'intelligence artificielle, je suis conçue pour comprendre et communiquer dans de nombreuses langues, ce qui me permet d'interagir avec des utilisateurs du monde entier. Je peux fournir
des informations, répondre à des questions et aider dans diverses tâches dans ces langues.<|end|><|assistant|> En tant qu'intelligence artificielle, je suis conçue pour comprendre et communiquer dans de nombreuses langues, ce qui me permet d'interagir avec des utilisateurs du monde entier. Je peux fournir des informations, répondre à des questions et aider dans diverses tâches dans ces langues.<|end|><|assistant|> En tant qu'intelligence artificielle, je suis conçue pour comprendre et communiquer dans de nombreuses langues, ce qui me permet d'interagir avec des utilisateurs du
and it never stops generating until it reaches the limit. I don't understand why: I set the eos_token_id to 32000 (which is <|endoftext|> in the tokenizer JSON). Can someone help me?
I have the same problem, can anyone find a solution?
Thanks for reporting this issue. Can you share your example?
I'm having the exact same problem. Here is an example of the messages:
# Variable name assumed; the opening line of the list was missing from the post.
messages = [
    {"role": "user", "content": "Request: Classify the following customer feedback in one of these categories, DON'T EXPLAIN WHY, JUST GIVE THE CATEGORY Categories: [Price/Premium, Customer Service, Payment, Advisor Knowledge, Waiting Time, Underwriting, Processes, Others, No reason provided] Customer feedback: Poor service regarding my payments"},
    {"role": "assistant", "content": "Payment"},
    {"role": "user", "content": "Request: Classify the following customer feedback in one of these categories, DON'T EXPLAIN WHY, JUST GIVE THE CATEGORY Categories: [Price/Premium, Customer Service, Payment, Advisor Knowledge, Waiting Time, Underwriting, Processes, Others, No reason provided] Customer feedback: Because the stuff aren’t willing to help"},
]
And the reply is:
"Customer Service
For the more difficult instruction, here are three follow-up questions with elabor"
It ends there because of the 20-token limit.
In my example, I just asked which languages it can speak, but it goes totally wrong. It continues the discussion on its own until it reaches the max token limit I'd set.
The problem, I guess, is the eos_token. The model writes an end token, but it's not the one configured as eos_token, so it continues. I tried changing the eos_token, but I always get the same issue.
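If changing the eos_token has no effect in your setup, a text-level fallback is to cut the decoded output at the first chat marker. This is only a sketch under assumptions: `cut_at_markers` is a hypothetical helper, the marker strings are the ones that appear in this thread, and it only works when the decoded text still contains the special tokens, as in the French output pasted above.

```python
def cut_at_markers(text, markers=("<|end|>", "<|endoftext|>", "<|assistant|>")):
    """Return the text before the earliest marker, or the whole text if none occurs."""
    cut = len(text)
    for marker in markers:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()

raw = "Customer Service<|end|><|assistant|> Customer Service<|end|>"
print(cut_at_markers(raw))
```

This hides the symptom but does not save any generation time, since the model still runs until the token limit.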