Model can't stop explaining itself

#53

by fedeparra - opened May 2

Discussion

fedeparra

May 2

•

edited May 2

Hi. Here's a problem I've noticed with all Phi models so far:

prompt = "You are a robot. Federico, your owner, says in Spanish 'por favor, ven a la cocina'. Please select among the following options the action that seems more appropriate to Federico's injunction: NAVIGATE, JUMP, DANCE, TURN, JOKE. Limit your response to one word."

Output: NAVIGATE

In this scenario, the most appropriate action for the robot to take in response to Federico's command "por favor, ven a la cocina" (please, come to the kitchen) would be to navigate towards the kitchen. The other options (JUMP, DANCE, JOKE) do not align with the request to move to a specific location.
Prompt length.

As you can see, I specifically asked for a one-word response, but the model can't help itself: it "feels" obliged to explain its reasoning.

This is terrible for using the model as a parser or for limited-set decisions like the one in this example, because we can't rely on the response.

Capping the output at a few tokens can help, but different words have different token lengths, so that's not a real solution.

Besides, I'm sure the experts at Microsoft know some specific prompting magic that can make the model less verbose?
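In the meantime, a possible workaround for parser-style use is to post-process the raw output instead of relying on the model to stop. This is only a minimal sketch: `extract_action` and `ALLOWED_ACTIONS` are hypothetical names, and it assumes the chosen action is the first word of the response, as in the example above.

```python
import re

# The option list from the prompt above.
ALLOWED_ACTIONS = {"NAVIGATE", "JUMP", "DANCE", "TURN", "JOKE"}

def extract_action(raw_output, allowed=ALLOWED_ACTIONS):
    """Return the first word of the output if it is an allowed action, else None."""
    match = re.search(r"[A-Za-z]+", raw_output)
    if match is None:
        return None
    word = match.group(0).upper()
    return word if word in allowed else None

raw = "NAVIGATE\n\nIn this scenario, the most appropriate action ..."
print(extract_action(raw))  # prints NAVIGATE
```

Returning `None` for anything outside the allowed set lets the caller retry or fall back rather than act on an unexpected word.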

bitmman-nch

May 2

•

edited May 2

Hi,

Have you used chat format in your prompt? Something like:

<|user|>\nQuestion<|end|>\n<|assistant|>

Since it is an instruct version, formatting your prompt in the chat format might be helpful.
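For illustration, the pattern above can be wrapped in a small helper. This is a sketch only: `build_prompt` is a hypothetical name, and the template is copied from the pattern shown in this post rather than from the tokenizer's own chat template, which may differ.

```python
def build_prompt(question):
    """Wrap a single user question in the chat pattern shown above."""
    return f"<|user|>\n{question}<|end|>\n<|assistant|>"

prompt = build_prompt("Question")
print(repr(prompt))
```

If you load the tokenizer through Hugging Face transformers, `tokenizer.apply_chat_template` is usually the safer way to build this string, since it uses the template shipped with the model.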

fedeparra

May 2

I did follow that pattern, and I tried 20 or so different prompts to get the model to stop after one word, without success.

bitmman-nch

May 2

I'm encountering a similar problem. The model keeps repeating its answer and does not stop until it reaches the max_new_token limit.

gugarosa

Microsoft org May 2

It could be related to some missing stop tokens. Could you please retry the generation using 32000, 32001 and 32007 as the stop tokens?
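To make the idea concrete, here is a rough sketch of what treating those ids as stop tokens means: generation should end at the first occurrence of any of them. `truncate_at_stop` is a hypothetical helper and the sample ids other than the three above are made up. With Hugging Face transformers, passing the same ids as a list to `eos_token_id` in `generate` should achieve this during generation; other runtimes may expose the setting differently.

```python
# Stop-token ids suggested above.
STOP_TOKEN_IDS = {32000, 32001, 32007}

def truncate_at_stop(token_ids, stop_ids=STOP_TOKEN_IDS):
    """Keep the tokens generated before the first stop token."""
    kept = []
    for token_id in token_ids:
        if token_id in stop_ids:
            break
        kept.append(token_id)
    return kept

# Made-up ids: two answer tokens, a stop token, then unwanted continuation.
generated = [5, 6, 32007, 7, 8]
print(truncate_at_stop(generated))  # prints [5, 6]
```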

fedeparra

May 2

> It could be related to some missing stop tokens. Could you please retry the generation using 32000, 32001 and 32007 as the stop tokens?

I don't think it's that, since the model doesn't continue generating indefinitely: it does end right after the explanation. It just needs to explain its reasoning and will not obey instructions not to. It looks like it was trained specifically to explain its responses and can't help doing so, no matter how much we ask it not to.

gugarosa

Microsoft org May 2

That's odd. I tried the 4k model, and it gives me the correct response on the Inference API:

When I removed the last instruction, it produced some additional content:

To show that the output is not being cut off by the token limit, I also used the following:

gugarosa

Microsoft org May 2

•

edited May 2

Additional explanation can be caused by missing stop tokens. For example, if the model generates an <|end|> and generation does not stop, it will keep generating extra text, since it then expects a user query or an assistant response.

fedeparra

May 2

> Additional explanation can be caused by missing stop tokens. For example, if the model generates an <|end|> and generation does not stop, it will keep generating extra text, since it then expects a user query or an assistant response.

Interesting! I'm using the ONNX version provided by Microsoft (it's in the same collection), which uses the new ONNX Runtime generator. It's also 4-bit quantized, and quantized models sometimes have issues with stop words.

I thought this problem was common to all versions. Now that I see that's not the case, I'd rather repost on the ONNX version, and I'll also check the stop-word issue.

Thank you!

nguyenbh changed discussion status to closed May 22

nelkh

Jun 6

I have this exact problem, and I don't know how to fix it:

Je parle plusieurs langues, y compris le français, l'anglais, l'espagnol, l'allemand, le chinois, le russe, le japonais, le portugais, l'italien et bien d'autres. En tant qu'intelligence
artificielle, je suis conçue pour comprendre et communiquer dans de nombreuses langues, ce qui me permet d'interagir avec des utilisateurs du monde entier.<|end|><|assistant|> En tant qu'intelligence artificielle, je suis conçue pour comprendre et communiquer dans de nombreuses langues, ce qui me permet d'interagir avec des utilisateurs du monde entier. Je peux fournir
des informations, répondre à des questions et aider dans diverses tâches dans ces langues.<|end|><|assistant|> En tant qu'intelligence artificielle, je suis conçue pour comprendre et communiquer dans de nombreuses langues, ce qui me permet d'interagir avec des utilisateurs du monde entier. Je peux fournir des informations, répondre à des questions et aider dans diverses tâches dans ces langues.<|end|><|assistant|> En tant qu'intelligence artificielle, je suis conçue pour comprendre et communiquer dans de nombreuses langues, ce qui me permet d'interagir avec des utilisateurs du

and it never stops generating until it reaches the limit. I don't understand why: I set the eos_token_id to 32000 (which is <|endoftext|> in the tokenizer JSON). Can someone help me?
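Note that the output above contains <|end|> rather than <|endoftext|>, so a single eos_token_id of 32000 would not be expected to stop it. As a stopgap, the decoded text can be cut at the first special marker. This is a rough sketch with a hypothetical helper name, assuming the markers appear literally in the decoded string as they do above.

```python
# Special markers that should end an assistant turn (assumed from this thread).
STOP_MARKERS = ("<|end|>", "<|endoftext|>", "<|assistant|>")

def cut_at_first_marker(text, markers=STOP_MARKERS):
    """Return the text before the earliest stop marker, or the whole text if none is found."""
    positions = [text.find(marker) for marker in markers]
    positions = [pos for pos in positions if pos != -1]
    if not positions:
        return text
    return text[:min(positions)].rstrip()

sample = "Je parle plusieurs langues.<|end|><|assistant|> En tant qu'intelligence artificielle"
print(cut_at_first_marker(sample))  # prints Je parle plusieurs langues.
```

This only cleans up the result. The model still generates up to the token limit, so fixing the stop tokens at generation time remains the better solution.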

NewmrRobot

Jun 13

I have the same problem, can anyone find a solution?

nguyenbh

Microsoft org Jun 13

Thanks for reporting this issue. Can you share your example?

Acarasas

Jun 24

•

edited Jun 24

I'm having the exact same problem. Here is an example of the messages:

[
    {"role": "user", "content": "Request: Classify the following customer feedback in one of these categories, DON'T EXPLAIN WHY, JUST GIVE THE CATEGORY Categories: [Price/Premium, Customer Service, Payment, Advisor Knowledge, Waiting Time, Underwriting, Processes, Others, No reason provided] Customer feedback: Poor service regarding my payments"},
    {"role": "assistant", "content": "Payment"},
    {"role": "user", "content": "Request: Classify the following customer feedback in one of these categories, DON'T EXPLAIN WHY, JUST GIVE THE CATEGORY Categories: [Price/Premium, Customer Service, Payment, Advisor Knowledge, Waiting Time, Underwriting, Processes, Others, No reason provided] Customer feedback: Because the stuff aren’t willing to help"},
]

And the reply is:
"Customer Service

For the more difficult instruction, here are three follow-up questions with elabor"

It ends there because of the 20-token limit.
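Until the stop-token issue is sorted out, one way to still get a usable label is to read only the first non-empty line of the reply and check it against the category list. A minimal sketch, assuming the category always comes first as in the reply above; `match_category` is a hypothetical helper.

```python
CATEGORIES = [
    "Price/Premium", "Customer Service", "Payment", "Advisor Knowledge",
    "Waiting Time", "Underwriting", "Processes", "Others", "No reason provided",
]

def match_category(reply, categories=CATEGORIES):
    """Return the category named on the first non-empty line of the reply, or None."""
    lookup = {category.lower(): category for category in categories}
    for line in reply.splitlines():
        candidate = line.strip().strip('"').strip()
        if candidate:
            # Only the first non-empty line is considered.
            return lookup.get(candidate.lower())
    return None

reply = "Customer Service\n\nFor the more difficult instruction, here are three follow-up questions with elabor"
print(match_category(reply))  # prints Customer Service
```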

nelkh

Jun 24

•

edited Jun 24

In my example, I just asked which languages it can speak, but it goes completely wrong. It continues the discussion on its own until it reaches the max-token limit I had set.

nelkh

Jun 24

The problem, I guess, is the eos_token. The model writes an eos_token, but it's the wrong one, so it continues. I tried changing the eos_token, but I always get the same issue.
