Mistral does not finish the answers

#48
by expiderman

I am developing a web application that answers questions based on the context of documents the user uploads. The problem is that when I use the Mistral v0.2 model, the answers never finish: they are cut off partway through. If I use OpenAI, the answers complete correctly. I use this prompt:

 template="""
     ### [INST] Instrucción: Responde en español a las preguntas del usuario según el contexto.
     Si no encuentras una respuesta adecuada en el contexto, responde que no tienes información suficiente.

{context}

### question:
{question} (responde en castellano) [/INST]
#"""

from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from langchain_community.vectorstores import Chroma
from langchain_core.runnables import RunnablePassthrough

prompt = PromptTemplate(
        input_variables=['context', 'question'],
        template=template
)
# Chroma collection built from the documents the user uploaded
vector = Chroma(client=db,
        collection_name="coleccion4",
        embedding_function=embeddings)
retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})
llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        model_kwargs={"temperature": 0.4},
        huggingfacehub_api_token=apikey_huggingFace
)
# rag_chain assembly not shown in the original post; a typical LCEL wiring:
rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
)
respuesta = rag_chain.invoke(user_question)

When I run the code with OpenAI, I get this response:

[screenshot: the OpenAI answer is complete]

But when I use Mistral, the answer does not finish:

[screenshot: Mistral's answer stops mid-sentence]

Why does this happen?

Hello,

For clarification, did you fine-tune the model?

No, I'm using it as is, via Hugging Face.
I have tried other temperature values, but the result is the same.

This is usually not a temperature problem. If you can, increase "max_new_tokens"; that's usually the parameter that controls how many tokens the model is allowed to output.
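
For example, with HuggingFaceHub the limit is passed through model_kwargs (the value below is only illustrative; pick one that fits your context window):

llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        model_kwargs={"temperature": 0.4, "max_new_tokens": 500},
        huggingfacehub_api_token=apikey_huggingFace
)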

I'm doing tests and I think it has solved the problem. I have set max_new_tokens to 2000. Do you know what the maximum is, or what a suitable number would be?
Thank you very much for your answer!

Well... in theory there is no upper limit, since you can keep completing a text for as long as you want. BUT there is a limiting factor, and that's the context window/length.

To put it simply, that's how much the model remembers before it starts to forget the beginning of the text.

I believe Mistral has a context length of 4k×2: in theory it can go up to 8k with the sliding-window attention they use, but 4k would be the most reliable maximum.

To sum up: 4k or 8k depending on what you want or need; beyond 8k it will start forgetting what it was supposed to do.
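
If you want to see how much of that window your prompt already uses, you can count tokens with the model's tokenizer (a quick sketch; full_prompt here is a hypothetical variable holding your template with the context filled in):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
n_prompt = len(tok.encode(full_prompt))  # tokens already consumed by the prompt
print(4096 - n_prompt)                   # rough budget left for max_new_tokens in a 4k window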

I'm having the same issue; however, changing max_new_token doesn't seem to help.
For example, this is my current code:

llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        model_kwargs={"temperature": 0.7, "max_new_token": 8000},
        huggingfacehub_api_token=API_TOKEN
)
print(llm.invoke("[INST]Explain to me how to analyze IT ticket data.[\INST]"))

And the output I get is:


1. Data Collection: The first step is to collect all the relevant IT ticket data. This may include information such as ticket creation date, category, subcategory, priority, status, resolution time, requester, assignee, and any additional notes or comments. You can use IT service management (ITSM

I've tried playing around with the max_new_token parameter, but nothing seems to change the output.

Could you try max_new_tokens instead of max_new_token?
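
For reference, the corrected call would look like this (note the parameter name; the closing tag should also be [/INST] with a forward slash, and 8000 new tokens plus the prompt would not fit in an 8k window, so a smaller value is safer):

llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        model_kwargs={"temperature": 0.7, "max_new_tokens": 2000},
        huggingfacehub_api_token=API_TOKEN
)
print(llm.invoke("[INST] Explain to me how to analyze IT ticket data. [/INST]"))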

This looks like a template problem to me. Clean up the template a bit: remove the "###" before the [INST] and the trailing "#" after the [/INST].
So the template would be:

 template="""[INST] Instrucción: Responde en español a las preguntas del usuario según el contexto.
Si no encuentras una respuesta adecuada en el contexto, responde que no tienes información suficiente.

{context}

### question:
{question} (responde en castellano) [/INST]
"""

Thanks! This also helped me. Some models are very sensitive to templates.
