Model can't stop explaining itself

#53
by fedeparra - opened

Hi. Here's a problem I've noticed with all Phi models so far:

prompt = "You are a robot. Federico, your owner, says in Spanish 'por favor, ven a la cocina'. Please select among the following options the action that seems more appropriate to Federico's injunction: NAVIGATE, JUMP, DANCE, TURN, JOKE. Limit your response to one word."

Output: NAVIGATE

In this scenario, the most appropriate action for the robot to take in response to Federico's command "por favor, ven a la cocina" (please, come to the kitchen) would be to navigate towards the kitchen. The other options (JUMP, DANCE, JOKE) do not align with the request to move to a specific location.


As you can see, I specifically asked for a one-word response, but the model can't help itself: it "feels" obliged to explain its reasoning.

This is terrible for using the model as a parser or for limited set decisions like in this example, as we can't rely on the response.

Limiting the output to just a handful of tokens can help, but different words have different token lengths, so that's not a solution.

Besides, I'm sure the experts at Microsoft know some specific prompting magic that can make the model less verbose?

Hi,

Have you used chat format in your prompt? Something like:

<|user|>\nQuestion<|end|>\n<|assistant|>

Since it is an instruct version, formatting your prompt in the chat format might be helpful.
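
For reference, a minimal sketch of that with transformers, assuming the microsoft/Phi-3-mini-4k-instruct checkpoint (swap in whichever variant you are actually running) and letting apply_chat_template insert the special tags instead of writing them by hand:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumption: adjust to your variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Answer with exactly one word among NAVIGATE, JUMP, DANCE, TURN, JOKE. Federico says 'por favor, ven a la cocina'."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=10)
# decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

This only guarantees that the tags match what the instruct tuning expects; whether the model then sticks to one word is a separate question.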

I did follow that pattern, and I tried 20 or so different prompts to get the model to stop after one word, without success.

I'm encountering a similar problem. The model keeps repeating its answer and does not stop until it reaches the max_new_tokens limit.

Microsoft org

It could be related to some missing stop tokens. Could you please retry the generation using 32000, 32001 and 32007 as the stop tokens?
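
Roughly like this, as an untested sketch: recent transformers versions accept a list of ids for eos_token_id, and the token names in the comment are my reading of the Phi-3 tokenizer, so please verify them against the tokenizer you have loaded.

```python
output_ids = model.generate(
    input_ids,
    max_new_tokens=64,
    # 32000 = <|endoftext|>, 32001 = <|assistant|>, 32007 = <|end|> (verify against your tokenizer)
    eos_token_id=[32000, 32001, 32007],
)
```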


I don't think it's that, since the model doesn't continue generating indefinitely - it ends right after the explanation. It just needs to explain its reasoning and will not obey instructions not to do so. It looks like it was trained specifically to explain its responses and can't help but do so, no matter how much we ask it not to.

Microsoft org

That's odd. Even though I tried the 4k variant, it gives me the correct response on the Inference API:

IMG_3590.jpeg

When I removed the last instruction, it produces some additional content:

IMG_3591.jpeg

To show that it is not being cut off due to the token limit, I also used the following:

IMG_3592.jpeg

Microsoft org
edited May 2

Additional explanation can be caused by missing stop tokens. For example, if the model generates a <|end|> and does not stop, it will try to keep generating extra information since it expects a user query or an assistant response.
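
If hard-coding ids feels fragile, one possible variation (a sketch assuming a transformers tokenizer) is to resolve the special tokens by name and pass them all as terminators:

```python
# resolve the Phi-3 chat tokens by name so the ids match whatever tokenizer is loaded
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|end|>"),
    tokenizer.convert_tokens_to_ids("<|assistant|>"),
]
output_ids = model.generate(input_ids, max_new_tokens=64, eos_token_id=terminators)
```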


Interesting! I'm using the ONNX version provided by Microsoft (it's in the same collection) that uses the new ONNX Runtime generator. Also, it's 4-bit quantized, and quantized models sometimes have issues with stop tokens.

I thought this was a problem common to all the versions. Now that I see that's not the case, I'll repost on the ONNX version instead - and I'll also check the stop token issue.

Thank you!

nguyenbh changed discussion status to closed

I have this exact problem, and I don't know how to fix it:

I speak several languages, including French, English, Spanish, German, Chinese, Russian, Japanese, Portuguese, Italian and many others. As an artificial intelligence, I am designed to understand and communicate in many languages, which allows me to interact with users around the world.<|end|><|assistant|> As an artificial intelligence, I am designed to understand and communicate in many languages, which allows me to interact with users around the world. I can provide information, answer questions and help with various tasks in these languages.<|end|><|assistant|> As an artificial intelligence, I am designed to understand and communicate in many languages, which allows me to interact with users around the world. I can provide information, answer questions and help with various tasks in these languages.<|end|><|assistant|> As an artificial intelligence, I am designed to understand and communicate in many languages, which allows me to interact with users around the

and it never stops generating until it reaches the limit. I don't understand why; I set eos_token_id to 32000 (which is <|endoftext|> in the tokenizer JSON). Can someone help me?
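
For what it's worth, a quick check along these lines (a sketch assuming a standard transformers setup) shows what the tokenizer and generation config actually treat as end-of-sequence, which makes a mismatch with <|end|> visible:

```python
# what the loaded tokenizer / generation config consider end-of-sequence
print(tokenizer.eos_token, tokenizer.eos_token_id)
print(model.generation_config.eos_token_id)
# ids of the Phi-3 chat tokens, for comparison with the values above
print(tokenizer.convert_tokens_to_ids(["<|endoftext|>", "<|end|>", "<|assistant|>"]))
```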

I have the same problem. Has anyone found a solution?

Microsoft org

Thanks for reporting this issue. Can you share your example?

I'm having the exact same problem. Here is an example of the responses:

{"role": "user", "content": f"Request: Classify the following customer feedback in one of these categories, DON'T EXPLAIN WHY, JUST GIVE THE CATEGORY " Categories: [Price/Premium, Customer Service, Payment, Advisor Knowledge, Waiting Time, Unde1rwriting, Processes, Others, No reason provided] Customer feedback: Poor service regarding my payments"},
{"role": "assistant", "content": "Payment"},
{"role": "user", "content": f"Request: Classify the following customer feedback in one of these categories, DON'T EXPLAIN WHY, JUST GIVE THE CATEGORY " Categories: [Price/Premium, Customer Service, Payment, Advisor Knowledge, Waiting Time, Unde1rwriting, Processes, Others, No reason provided] Customer feedback: Because the stuff aren’t willing to help"},
]

And the reply is:
"Customer Service

For the more difficult instruction, here are three follow-up questions with elabor"

It ends there because of the 20-token limit.
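
A possible workaround sketch (assuming a transformers setup with the messages list above; the stop-token handling mirrors the earlier suggestions in this thread) that combines the chat template with an explicit <|end|> terminator, so the category is not cut off mid-word:

```python
# build the prompt from the chat messages above
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# stop on the regular eos token as well as Phi-3's <|end|>
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|end|>")]
output_ids = model.generate(input_ids, max_new_tokens=16, eos_token_id=terminators)

# decode only the newly generated tokens and strip whitespace to get the bare category
category = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()
print(category)
```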

In my example, I just asked which languages it can speak, but it goes totally wrong. It continues the conversation on its own until it reaches the max tokens limit I'd set.

The problem, I guess, is the eos_token. The model writes an eos_token, but it's the wrong one, so it continues. I tried to change the eos_token, but I always have the same issue.
