Incomplete Answers

#59
by samparksoftwares - opened

The model is giving incomplete answers, or rather stops generating at an arbitrary point, despite increasing max_new_tokens and decreasing the prompt's context length.

I've had this behaviour as well, until I realised I wasn't following the prompt format properly. Make sure you use the whitespace and the BOS/EOS tokens appropriately; both the EOS tokens and the whitespace are very important.

Prompt:

<s> [INST] What is 1+1? [/INST] The sum of 1 + 1 is 2. Is there anything else you would like to know about mathematics or another topic? I'm here to help with any questions you have to the best of my ability.</s> [INST] What is 1+2? [/INST]

Prompt answered:

<s> [INST] What is 1+1? [/INST] The sum of 1 + 1 is 2. Is there anything else you would like to know about mathematics or another topic? I'm here to help with any questions you have to the best of my ability.</s> [INST] What is 1+2? [/INST] The sum of 1 + 2 is 3. In arithmetic, addition is the operation of combining two numbers to produce a third number. For example, in the expression "2 + 3 = 5," 2 and 3 are the addends, and 5 is the sum. Addition can be represented using symbols, such as the plus sign "+", or by the word "and." It is one of the basic operations in mathematics, along with subtraction, multiplication, and division. Is there anything else you would like to know about mathematics or another topic? I'm here to help with any questions you have.

Very important:

  • After the BOS (<s>), there is a whitespace. There must be only one BOS in your prompt.
  • Before any EOS (</s>), there is no whitespace.
  • All instructions must go between [INST] and [/INST]; note the whitespace after [INST] and before [/INST].
  • At the end of your prompt, there must be no whitespace after the last [/INST]. The completion will add it for you.
  • Check whether the software you use already adds the BOS/EOS tokens to the prompt. If it does, decide who adds them (either you or the software), and make sure they are added exactly once, in the right places, with the whitespace present.

https://github.com/huggingface/blog/blob/main/mixtral.md
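
To make these rules concrete, here is a minimal sketch (my own helper, not an official API) that assembles a multi-turn Mixtral prompt following the whitespace and BOS/EOS rules above:

# build_mixtral_prompt is a hypothetical helper illustrating the format rules above.
# turns is a list of (instruction, answer) pairs; the last turn may have answer=None.
def build_mixtral_prompt(turns):
    prompt = "<s>"  # exactly one BOS at the very start
    for instruction, answer in turns:
        # whitespace after [INST] and before [/INST]
        prompt += f" [INST] {instruction} [/INST]"
        if answer is not None:
            # no whitespace before the EOS that closes a completed answer
            prompt += f" {answer}</s>"
    return prompt  # no trailing whitespace after the final [/INST]

print(build_mixtral_prompt([
    ("What is 1+1?", "The sum of 1 + 1 is 2."),
    ("What is 1+2?", None),
]))
# <s> [INST] What is 1+1? [/INST] The sum of 1 + 1 is 2.</s> [INST] What is 1+2? [/INST]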

I have attributes like system prompt, chat history, context, and question. Can you help me customize the prompt so I get complete answers?

I don't believe this model supports a system prompt, but you can certainly include it inside the [INST] [/INST] block. Chat history is just a succession of [INST] instructions and answers, as I've demonstrated in my previous post.
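
For example, a rough sketch (my own naming, building on the format rules above) that folds the system prompt into the first instruction and appends the chat history:

# Hypothetical helper: system prompt is prepended to the first user turn,
# history is rendered as alternating [INST] blocks and answers.
def build_prompt_with_system(system_prompt, history, question):
    prompt = "<s>"
    first = True
    for user_msg, answer in history:
        content = f"{system_prompt}\n\n{user_msg}" if first else user_msg
        prompt += f" [INST] {content} [/INST] {answer}</s>"
        first = False
    content = f"{system_prompt}\n\n{question}" if first else question
    prompt += f" [INST] {content} [/INST]"
    return prompt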

I have been using this format:
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"  # Llama-2-style system markers
SYSTEM_PROMPT = B_SYS + prompt + E_SYS  # prompt holds the system prompt text
instruction = """
Context:{chat_history} \n {context}
User:{question}"""

prompt_template = B_INST + SYSTEM_PROMPT + instruction + E_INST

This format is not giving complete answers; if you can help me correct it, that would be very helpful.

Same here; even with the correct prompt format, it stops generating in the middle of some code or a command line.

Hi everyone,
After some trial and error I was able to get the whole answer from the Mixtral model using the same prompt I mentioned above, just by calling the LLM through HuggingFaceInferenceAPI from LlamaIndex or LangChain.
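
For reference, a minimal sketch of that call via LlamaIndex (the model name and token are placeholders, and the import path may differ across llama-index versions):

from llama_index.llms.huggingface import HuggingFaceInferenceAPI

# Point the inference API at the hosted Mixtral instruct model
llm = HuggingFaceInferenceAPI(
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
    token="hf_...",  # your Hugging Face API token
)

# Check whether the client adds <s> itself (see the format notes above)
response = llm.complete("[INST] What is 1+2? [/INST]")
print(response.text)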

Proof of concept to construct valid Mixtral prompts using Python:

Set up a Python environment:

virtualenv-3.10 --python=python3.10 ~/test
cd ~/test
source bin/activate
pip install mistral-common transformers jinja2

Create a file named modified_script.py (modified Python script from https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1):

from mistral_common.protocol.instruct.messages import (
    AssistantMessage,
    UserMessage,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.request import ChatCompletionRequest

from transformers import AutoTokenizer

# Official Mistral tokenizer (v3 is the one used by Mixtral-8x22B-Instruct)
tokenizer_v3 = MistralTokenizer.v3()

# A small multi-turn conversation to render as a prompt
mistral_query = ChatCompletionRequest(
    messages=[
        UserMessage(content="How many experts ?"),
        AssistantMessage(content="8"),
        UserMessage(content="How big ?"),
        AssistantMessage(content="22B"),
        UserMessage(content="Noice 🎉 !"),
    ],
    model="test",
)
# Convert the request into the plain list-of-dicts format expected by transformers
hf_messages = mistral_query.model_dump()['messages']

tokenizer_hf = AutoTokenizer.from_pretrained('mistralai/Mixtral-8x22B-Instruct-v0.1')

# Render the conversation with the Hugging Face chat template, without tokenizing
print(tokenizer_hf.apply_chat_template(hf_messages, tokenize=False))

Execute the script:

python modified_script.py

Output of the above command:

<s> [INST] How many experts ? [/INST] 8 </s> [INST] How big ? [/INST] 22B </s> [INST] Noice πŸŽ‰ ! [/INST]
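
As an optional sanity check (my own addition, assuming the mistral-common API), you can also encode the same request with the official tokenizer and compare the token IDs against the Hugging Face chat template:

# Append this to modified_script.py to compare both tokenizations
tokenized = tokenizer_v3.encode_chat_completion(mistral_query)
hf_ids = tokenizer_hf.apply_chat_template(hf_messages)  # tokenize=True by default
print(tokenized.text)              # the prompt as mistral-common renders it
print(hf_ids == tokenized.tokens)  # True if both tokenizations match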

Hope this helps.
