Mistral does not finish the answers

#48
by expiderman

I am developing a web application that answers questions based on the context of documents the user uploads. The problem is that when I use the Mistral v0.2 model, the answers never finish: they are cut off partway through. If I use OpenAI, the answers complete correctly. I use this prompt:

 template="""
     ### [INST] Instrucción: Responde en español a las preguntas del usuario según el contexto.
     Si no encuentras una respuesta adecuada en el contexto, responde que no tienes información suficiente.

{context}

### question:
{question} (responde en castellano) [/INST]
#"""

from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from langchain_community.vectorstores import Chroma
from langchain_core.runnables import RunnablePassthrough

prompt = PromptTemplate(
        input_variables=['context', 'question'],
        template=template
)
# Chroma collection built from the documents the user uploaded
vector = Chroma(client=db,
        collection_name="coleccion4",
        embedding_function=embeddings)
retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})
llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        model_kwargs={"temperature": 0.4},
        huggingfacehub_api_token=apikey_huggingFace
)
# rag_chain assembly not shown in the original post; a typical LCEL wiring:
rag_chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | llm
)
respuesta = rag_chain.invoke(user_question)

When I run the code with OpenAI, I get this response:

[screenshot: the OpenAI answer is complete]

But when I use Mistral, the answer does not finish:

[screenshot: Mistral's answer stops mid-sentence]

Why does this happen?

Hello,

For clarification, did you fine-tune the model?

No, I'm using it as is, via Hugging Face.
I have tried other temperature values, but the result is the same.

This is usually not a temperature problem. If you can, increase "max_new_tokens"; that's usually the parameter that controls how many tokens the model is allowed to output.
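
For example, with HuggingFaceHub the limit is passed through model_kwargs (the value below is only illustrative; pick one that fits your context window):

llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        model_kwargs={"temperature": 0.4, "max_new_tokens": 500},
        huggingfacehub_api_token=apikey_huggingFace
)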

I'm doing tests and I think it has solved the problem. I have set max_new_tokens to 2000. Do you know what the maximum is, or what a suitable number would be?
Thank you very much for your answer!

Well... in theory there is no upper limit, since you can keep completing a text for as long as you want. BUT there is a limiting factor, and that's the context window/length.

To put it simply, that's how much the model remembers before it starts to forget the beginning of the text.

I believe Mistral has a context length of 4k×2: in theory it can go up to 8k with the sliding-window attention they use, but 4k would be the most reliable maximum.

To sum up: 4k or 8k depending on what you want or need; beyond 8k it will start forgetting what it was supposed to do.
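
If you want to see how much of that window your prompt already uses, you can count tokens with the model's tokenizer (a quick sketch; full_prompt here is a hypothetical variable holding your template with the context filled in):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
n_prompt = len(tok.encode(full_prompt))  # tokens already consumed by the prompt
print(4096 - n_prompt)                   # rough budget left for max_new_tokens in a 4k window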

I'm having the same issue; however, changing max_new_token doesn't seem to help.
For example, this is my current code:

llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        model_kwargs={"temperature": 0.7, "max_new_token": 8000},
        huggingfacehub_api_token=API_TOKEN
)
print(llm.invoke("[INST]Explain to me how to analyze IT ticket data.[\INST]"))

And the output I get is:


1. Data Collection: The first step is to collect all the relevant IT ticket data. This may include information such as ticket creation date, category, subcategory, priority, status, resolution time, requester, assignee, and any additional notes or comments. You can use IT service management (ITSM

I've tried playing around with the max_new_token parameter, but nothing seems to change the output.

Could you try max_new_tokens instead of max_new_token?
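
For reference, the corrected call would look like this (note the parameter name; the closing tag should also be [/INST] with a forward slash, and 8000 new tokens plus the prompt would not fit in an 8k window, so a smaller value is safer):

llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
        model_kwargs={"temperature": 0.7, "max_new_tokens": 2000},
        huggingfacehub_api_token=API_TOKEN
)
print(llm.invoke("[INST] Explain to me how to analyze IT ticket data. [/INST]"))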

This looks like a template problem to me. Clean up the template a bit: remove the "###" before the [INST] and the trailing "#" after the [/INST].
So the template would be:

 template="""[INST] Instrucción: Responde en español a las preguntas del usuario según el contexto.
Si no encuentras una respuesta adecuada en el contexto, responde que no tienes información suficiente.

{context}

### question:
{question} (responde en castellano) [/INST]
"""

Thanks! This also helped me. Some models are very sensitive to templates.
