Response content was truncated

#84
by ludomare - opened

Hi,
I am trying to query in French, but I get a truncated response. I tried max_tokens=-1 in HuggingFaceHub(repo_id=...) or in summary_chain.run(...), but it didn't work. Any idea?

code:

from dotenv import load_dotenv
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import HuggingFaceHub
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter

load_dotenv()

text = 'travail.txt'
with open(text, 'r') as file:
    essay = file.read()

llm = HuggingFaceHub(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    model_kwargs={"temperature": 0.1}
)

text_splitter = RecursiveCharacterTextSplitter(
    separators=['\n\n', '\n', '(?=>. )', ' ', ''],
    chunk_size=3000,
    chunk_overlap=500
)
docs = text_splitter.create_documents([essay])

map_prompt = """
Ecrit un résumé précis en francais en 600 mots de :
"{text}"
Résumé précis:
"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

combine_prompt = """
Ecrit un résumé précis en francais de :
Retourne 10 points les éléments clés du texte en francais.
{text}
Résumé par points clés:
"""
combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=["text"])

summary_chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt_template,
    combine_prompt=combine_prompt_template,
    verbose=True
)

output = summary_chain.run(docs)

answer (truncated):

  1. Giuseppe Rensi, philosophe italien, a écrit en 1923 un essai intitulé "Contre le travail" dans lequel il critique la morale de la société capitaliste qui insiste sur une conception du travail comme phénomène éthico-religieux de grande importance.
  2. Selon Rensi, le travail est une activité aliénante et désagréable qui est souvent

Hello,
I am encountering the same issue when trying the Inference API for the small model https://docs.mistral.ai/platform/endpoints/ (which is Mixtral). Most answers seem to be of good quality, but oftentimes the answer is cut in the middle. The endpoint does not accept many different parameters and I have tested all of them, so maybe this is just a model problem?

Just to be sure, you are using the prompt format provided?

@GreyForever : what do you mean??

Your var map_prompt is not using the correct prompt format, as others have mentioned.

There is an example for you in the README.md; it looks like this:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]

Also, try setting min_tokens, max_tokens and/or max_new_tokens, depending on which API you are using - do this in addition to fixing your prompt template.
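
To double-check a hand-written prompt against that format, the model's tokenizer can render it for you. A minimal sketch, assuming the transformers library is installed and the Mixtral tokenizer can be downloaded from the Hub:

from transformers import AutoTokenizer

# Build the instruct prompt with the tokenizer's own chat template instead of by hand.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
messages = [{"role": "user", "content": "Ecrit un résumé précis en francais de : ..."}]
print(tokenizer.apply_chat_template(messages, tokenize=False))
# Should print something close to: <s>[INST] Ecrit un résumé précis en francais de : ... [/INST]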

@GreyForever : what do you mean??

As Suparious stated, this model, the Instruct version, was trained to follow instructions, but you are required to follow the prompt format, as with all similar LLMs. If you don't, it's almost the same as using the base model (mistralai/Mixtral-8x7B-v0.1), so to have the Instruct version work properly you need to follow the prompt format it was trained with.

Thank you for your help!!...
But in my example, shouldn't map_prompt be like this:

map_prompt = """[INST]
Ecrit un résumé précis en francais en 1000 mots de :
"{text}"
Résumé précis:
[/INST]"""
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

Because it doesn't work: the output is always truncated.

output:

  • Giuseppe Rensi, un philosophe italien de Vérone, a écrit "Contre le travail" en 1923, où il argue que les humains ont une relation ambivalente avec le travail, le considérant à la fois nécessaire et aliénant.
  • Simone Weil, dans son essai "Le travail et la culture", explore le paradoxe moral entourant le travail, qui est considéré comme une

On my side, I am using the Python client (https://github.com/mistralai/client-python) to call the API endpoint provided by Mistral (not free).
Here is an example of how I send my prompt:


from typing import Dict, List

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage


def format_prompt(prompt: List[Dict[str, str]]) -> List[ChatMessage]:
    """Format the input prompt into the Mistral API message format."""
    prompt = [
        ChatMessage(role=item['role'], content=item['content']) for item in prompt
    ]
    return prompt


def call_inference_endpoint(client: MistralClient, prompt: List[ChatMessage], **kwargs):
    """Get an inference from the Mistral API."""
    inference = client.chat(
        model="mistral-medium",
        messages=prompt,
        temperature=kwargs["temperature"],
        max_tokens=kwargs["max_tokens"],
        top_p=kwargs["top_p"])
    return inference


client = MistralClient(api_key="MY_API_KEY")
prompt = [{"role": "system", "content": "Tu es un assistant IA"},
          {"role": "user", "content": "Quelles sont les dix premières décimales de pi ?"}]

result = call_inference_endpoint(client, format_prompt(prompt), temperature=0.2,
                                 max_tokens=100, top_p=1, frequency_penalty=1)

Sometimes the response is cut in the middle, especially if I ask it to enumerate. Setting a greater value for max_tokens does not change anything.
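
One thing worth checking with that client is the finish_reason on the returned choice: if the response follows the OpenAI-style shape the mistralai client mirrors, a cut-off caused by the token limit shows up as "length" rather than "stop". A rough sketch reusing the functions above (the field names are an assumption about that response shape):

response = call_inference_endpoint(client, format_prompt(prompt),
                                   temperature=0.2, max_tokens=1024, top_p=1)
choice = response.choices[0]
print(choice.finish_reason)     # "length" suggests the max_tokens cap truncated the answer
print(choice.message.content)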

Use 'max_new_tokens' instead; that should solve the problem!

Use 'max_new_tokens' instead; that should solve the problem!

Oh, I'm sorry, I forgot Mistral does not use the same parameters Hugging Face uses, so it won't work. Try it anyway, though.

Thank you for your help!!...
But in my example, shouldn't map_prompt be like this:

map_prompt = [INST]"""
Ecrit un résumé précis en francais en 600 mots de :
"{text}"
Résumé précis:
"""[/INST]
map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])

Because it doesn't work; I must be doing something stupid :(

Why did you put [INST] before """? Isn't it supposed to be: map_prompt = "[INST] Ecrit un résumé précis en francais en 600 mots de :{text}Résumé précis: [/INST]"
I might be missing something though...

Why did you put [INST] before """? Isn't it supposed to be: map_prompt = "[INST] Ecrit un résumé précis en francais en 600 mots de :{text}Résumé précis: [/INST]"
I might be missing something though.

----> I changed it to map_prompt = "[INST] Ecrit un résumé précis en francais en 600 mots de :{text}Résumé précis: [/INST]"
It still doesn't work; the output is truncated :(

Then in my opinion it all comes down to the equivalent of max_new_tokens, as we stated before. By default this kind of thing has a quite short maximum output token length. Try to find a parameter that allows you to override this. I'm not used to this library/LangChain, but it should be a parameter similar to "max_tokens".

To be more exact: where you set the temperature, try adding a new parameter "max_new_tokens" with the value you want; try a big one like 2000.

I cannot test it right now, so I'm just trying to give advice; if nothing works I will check it later.
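
Concretely, for the HuggingFaceHub setup from the first post, extra generation parameters go through model_kwargs and are forwarded to the Hugging Face Inference API. A minimal sketch; the value 2000 is only an example:

from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    model_kwargs={
        "temperature": 0.1,
        "max_new_tokens": 2000,  # raise the output length cap so the summaries are not cut short
    },
)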

Thank you a lot for your help!!! It works!!! "max_new_tokens" was the solution!!

PS: I tried something as a test: increasing the temperature makes everything crash... strange... but with the temperature at 0.1 all is fine!! Thank you again.

I recommend avoiding temperature values that are too high or too low; values around 1 are a good choice (but not 0, as that becomes deterministic).

Nevertheless, I am glad to be of help! Have fun!

Hi @GreyForever,
I am trying the same thing, but my problem is both the same and different.
I followed the instructions just like you mentioned in this thread, but my output contains the prompt text and a truncated output.
I tried many other things from LangChain, but no use :(

prompt_template = """[INST]
You are an expert in generating questions from a transcript by understanding the context of conversation happening in transcript.
Take the below transcript of a video and create 3 open ended questions and 2 multiple choice questions that are engaging and appropriate to ask a 10th grader in the US about what happened during this session:
----------
[/INST]
{text}
[INST]
----------
From above transcript of a video and create open ended questions and multiple choice questions that are engaging and appropriate to ask a 10th grader in the US about what happened during this session.
OUTPUT:
[/INST]
"""

PROMPT_SUMMARY = PromptTemplate(template=prompt_template, input_variables=['text'])
question_chain = LLMChain(llm=llm, prompt=PROMPT_SUMMARY, return_final_only=True, output_key="questions")

stuff_chain = StuffDocumentsChain(
    llm_chain=question_chain
)

output = stuff_chain.invoke(docs)

Nothing is working. Can you please tell me where I made a mistake?!

Hi! First of all, I think you can use a better prompt format than that; the <s> and </s> make quite a difference. Also, I believe you misunderstood what the special tokens mean: [INST] always goes before an instruction, and [/INST] ends an instruction. I'm also struggling to understand exactly what you want it to do for you... so here is a new example for you:

For a one-shot prompt:
"[INST] Take the below transcript of a video and create 3 open ended questions: {text} [/INST]"
And if you want it to always respond with a specific format, it's better to give it an example beforehand, like:
"<s> [INST] Take the below transcript of a video and create 3 open ended questions: SOME TRANSCRIPT HERE [/INST] THREE OPEN ENDED QUESTIONS SIMILAR TO WHAT YOU WANT IT TO ANSWER </s> [INST] Make 3 more open ended questions for this transcript: {text} [/INST]"

Note this is just an example; there are ways to make it work better, but I hope it helps you understand how the prompt format works.

If it still truncates the output, then it's most likely for reasons similar to the ones discussed above. Check all the parameters and, if you can, set max_new_tokens (or max_tokens or similar, depending on what you are using; I do not know LangChain by heart) to a high value.
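
Put together for the LangChain code above, it would look roughly like this. This is only a sketch, not tested: the prompt wording and the max_new_tokens value are illustrative, and docs is assumed to be the list of Documents from your splitter:

from langchain.chains import LLMChain, StuffDocumentsChain
from langchain.llms import HuggingFaceHub
from langchain.prompts import PromptTemplate

llm = HuggingFaceHub(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    model_kwargs={"temperature": 0.1, "max_new_tokens": 2000},
)

# A single instruction block: [INST] opens it, [/INST] closes it, the transcript sits inside.
prompt_template = (
    "[INST] Take the below transcript of a video and create 3 open ended questions "
    "and 2 multiple choice questions that are engaging and appropriate to ask a 10th "
    "grader in the US about what happened during this session:\n{text} [/INST]"
)

PROMPT_SUMMARY = PromptTemplate(template=prompt_template, input_variables=["text"])
question_chain = LLMChain(llm=llm, prompt=PROMPT_SUMMARY, output_key="questions")
stuff_chain = StuffDocumentsChain(llm_chain=question_chain)

output = stuff_chain.invoke(docs)  # docs: the transcript split into Documents beforehand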

I hope I was of help!

