The Inference API endpoint gives a wrongly formatted answer based on the given context, but works well in example Spaces. How can we fix this?

Hi All,

This is my input to Mixtral, and I am using the Inference API endpoint to test before going to production:

{ "inputs":"You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. Do not give the context again in your response\nGenerate a concise and informative answer in less than 100 words for the given question\nSEARCH RESULT 1: The title is Rush, year is 2013, budget is 500000, earning is 300000, genere is action.\nQUESTION: What is the release date of rush?\n" }

The answer I am getting is this, which is completely wrong:

[ { "generated_text": "You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. Do not giev the context again in your response\nGenerate a concise and informative answer in less than 100 words for the given question\nSEARCH RESULT 1: The title is Rush, year is 2013, budget is 500000, earning is 300000, genere is action.\nQUESTION: What is the release date of rush?\nANSWER: The movie Rush was released in 2013." } ]

My expected answer is below. How can I achieve this?

[ { "generated_text": "The movie Rush was released in 2013." } ]

Hi! Well, first of all: you are not using the prompt format. And secondly, for this use case you might want to set the parameter return_full_text: False.
Try this:

"[INST] You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. Do not give the context again in your response\nGenerate a concise and informative answer in less than 100 words for the given question\nSEARCH RESULT 1: The title is Rush, year is 2013, budget is 500000, earning is 300000, genere is action.\nQUESTION: What is the release date of rush?\n[/INST]"

Or, if you want the entire code:

import requests

# Hugging Face Inference API endpoint for Mixtral-8x7B-Instruct
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1"
headers = {"Authorization": "Bearer ####"}  # replace #### with your API token

def query(payload):
    # POST the payload to the Inference API and return the decoded JSON
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "[INST] You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. Do not give the context again in your response\nGenerate a concise and informative answer in less than 100 words for the given question\nSEARCH RESULT 1: The title is Rush, year is 2013, budget is 500000, earning is 300000, genre is action.\nQUESTION: What is the release date of rush?\n[/INST]",
    "parameters": {
        # Return only the newly generated text, without echoing the prompt
        "return_full_text": False
    }
})

print(output)
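
With return_full_text set to False, the output should then contain only the completion, roughly like this (illustrative; the exact wording can vary between generations):

[ { "generated_text": "The movie Rush was released in 2013." } ]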

This should do it!

Hi,
Thanks for the above information, it's useful. But I am applying the prompt before my search result, like below: I am closing the [/INST] before SEARCH RESULT 1.

With this I am getting "ANSWER: ......................"

How can I avoid this?

[INST] You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. Do not give the context again in your response\nGenerate a concise and informative answer in less than 100 words for the given question[/INST]\nSEARCH RESULT 1: The title is Rush, year is 2013, budget is 500000, earning is 300000, genre is action.\nQUESTION: What is the release date of rush?\n

I am getting the response below, and I do not want the "ANSWER:" prefix:

[
  {
    "generated_text": "ANSWER: The movie Rush was released in 2"
  }
]

This is really confusing. Why are you putting the search result after the [/INST]? [/INST] is a signal to the model to generate an answer; it's as if you were doing something like:

User: You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. Do not give the context again in your response\nGenerate a concise and informative answer in less than 100 words for the given question
Bot: SEARCH RESULT 1: The title is Rush, year is 2013, budget is 500000, earning is 300000, genre is action.\nQUESTION: What is the release date of rush?\n

The model is just completing the text, but you are not really supposed to use it like this. Try it like the example I gave you before:

[INST] You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. \nGenerate a concise and informative answer in less than 100 words for the given question.\nCONTEXT: The title is Rush, year is 2013, budget is 500000, earning is 300000, genre is action.\nQUESTION: What is the release date of rush?[/INST]

This should work better, I believe! Tell me what you think.
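
If you are assembling the prompt in code from your vector DB results, here is a minimal sketch of one way to build it in Python (build_prompt is just an illustrative helper, not part of any library):

def build_prompt(context, question):
    # Keep the instruction, context and question inside a single
    # [INST] ... [/INST] block so the model only generates the answer after it
    return (
        "[INST] You are a helpful chatbot assistant which provides answers "
        "based on the context given. Do not give any extra information.\n"
        "Generate a concise and informative answer in less than 100 words "
        "for the given question.\n"
        f"CONTEXT: {context}\n"
        f"QUESTION: {question}[/INST]"
    )

prompt = build_prompt(
    "The title is Rush, year is 2013, budget is 500000, earning is 300000, genre is action.",
    "What is the release date of rush?",
)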

PS: If it cuts the answer off and does not give you the entire thing, try using the parameter "max_new_tokens" with a higher value, like:

    "parameters": {
        "return_full_text": False,
        "max_new_tokens": 256
    }
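
Reusing the query function from the snippet above, the full call would then look something like this (the prompt is shortened here for readability):

output = query({
    "inputs": "[INST] ...your instruction, context and question... [/INST]",
    "parameters": {
        "return_full_text": False,  # return only the generated answer, not the echoed prompt
        "max_new_tokens": 256       # allow longer completions so the answer is not cut off
    }
})
print(output)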

Hi @pandora-s,
Thanks for the above responses. I am facing another problem, shown below, where it is generating roles like ASSISTANT, ASSISTANT.

If I keep my previous 5 interaction prompts and responses as context for the LLM, it generates ASSISTANT multiple times, like in the example below. How can we fix this?

This is the response:

ASSISTANT: ASSISTANT: The customer name for Order ID CA-2017-145233 is Dori Sori. The product names are GE ABCD, EFGH, and XYZA. The order status is Processing for the first two products and Delivered for the last one.

Thanks

I would need more information and the code to be sure about what the problem is, but my guess is that, because we are now dealing with follow-up questions and an entire chat, you need to use the full prompt template. Keeping the previous example, it would be the following:

<s> [INST] You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. \nGenerate a concise and informative answer in less than 100 words for the given question.\nCONTEXT: The title is Rush, year is 2013, budget is 500000, earning is 300000, genre is action.\nQUESTION: What is the release date of rush?[/INST] RESPONSE </s> [INST] NEW QUESTION OR WHATEVER [/INST]

So, as you can guess, at the beginning of the prompt you need to add <s>, and every time the model answers you, you must add </s> at the end!

As said, I cannot be sure this is the problem, as I do not have the full code, but that's my guess!
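
If you keep the chat history in code, here is a minimal sketch of how you could build that template in Python (assuming history is a list of (user_message, bot_answer) pairs; the function name is just illustrative):

def build_chat_prompt(history, new_question):
    # <s> goes once, at the very beginning of the conversation
    prompt = "<s>"
    for user_message, bot_answer in history:
        # Each past exchange: user turn in [INST] ... [/INST], answer closed with </s>
        prompt += f" [INST] {user_message} [/INST] {bot_answer} </s>"
    # Leave the new question open so the model generates the next answer
    prompt += f" [INST] {new_question} [/INST]"
    return prompt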

Glad to be helpful!

We are giving the input to the LLM in this way:

<s> [INST] You are a helpful chatbot assistant which provides answer based on the context given. Do not give any extra information. Do not give the context again in your response\nGenerate a concise and informative answer in less than 100 words for the given question\nSEARCH RESULT 1: The title is Rush, year is 2013, budget is 500000, earning is 300000, genere is action.\nQUESTION: What is the release date of rush?[/INST]

With the above format I am getting this answer:

"ANSWER: The movie Rush was released in 2013."

The only difference is that instead of saying CONTEXT, we have SEARCH RESULT 1 coming from the vector DB.

Not sure why everything was crossed out with a line. Sorry for that; I tried to re-post the comment.

My main question is how you handle multiple interactions with the bot. Do you keep track of previous conversations? If yes, be sure to follow the format from the example I gave.
If you do not like the fact that it answers with "ANSWER:", you can give it an example or rewrite the prompt without the "QUESTION:" keyword, as that is most likely the trigger creating the "ANSWER:" one! See the example below.
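
For example, something like this (just a suggestion; you may need to experiment):

[INST] You are a helpful chatbot assistant which provides answers based on the context given. Do not give any extra information.\nAnswer concisely, in less than 100 words.\nSEARCH RESULT 1: The title is Rush, year is 2013, budget is 500000, earning is 300000, genre is action.\nWhat is the release date of rush?[/INST]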
