TemplateError: Conversation roles must alternate user/assistant/user/assistant/...

#143
by quamer23 - opened

I'm trying to build a RAG pipeline using llama-index. Instead of query_engine, I want to use chat_engine, but whenever I try to do this, I get this error:

TemplateError: Conversation roles must alternate user/assistant/user/assistant/...

Not sure how to fix it, please help. How do I format the template?

Code to load the LLM:

import torch
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig  # imported for optional quantization; unused below

# Hugging Face model ID to load. (Not defined in the original snippet;
# the discussion below suggests a Mixtral instruct model.)
model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# Context Window specifies how many tokens to use as context for the LLM
context_window = 4096
# Max New Tokens specifies how many new tokens to generate for the LLM
max_new_tokens = 256
# Device specifies which device to use for the LLM
device = "cuda"

# This is the prompt that will be used to instruct the model behavior
system_prompt = """
    You are an AI chatbot that is designed to answer questions related to E2E Networks. 
    You are provided with a context and a question. You need to answer the question based on the context provided. 
    If the context is not helpful, do not answer based on prior knowledge, instead, redirect the user to the E2E Networks Support team. 
    You should also provide links that you got from context that are relevant to the answer. 
    You are allowed to answer in first person only, like I/We/Our; it should feel like a human is answering the question.
    You should provide plain links only, not markdown-formatted ones like [E2E Networks Official Website](https://www.e2enetworks.com/)
    You're not allowed to say something like "Based on the context, I think the answer is...", instead, you should directly answer the question.
    When in confusion, you can ask for more information from the user.

    Here is an example of how you should answer:

    Question: What is the pricing for E2E Networks?
    Context: E2E Networks is a cloud computing company that provides cloud infrastructure and cloud services to businesses and startups.
    Unacceptable Answer: Based on the context, I think the pricing for E2E Networks is...
    Acceptable Answer: The pricing for E2E Networks is...
"""

# This will wrap the default prompts that are internal to llama-index.
# Note: the <|USER|>/<|ASSISTANT|> markers below come from llama-index's
# StableLM examples and do not match Mixtral's [INST] ... [/INST] template.
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

# Create the LLM using the HuggingFaceLLM class
llm = HuggingFaceLLM(
    context_window=context_window,
    max_new_tokens=max_new_tokens,
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model_name,
    model_name=model_name,
    device_map=device,
    generate_kwargs={"temperature": 0.2, "top_k": 5, "top_p": 0.95},
)
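
For what it's worth, the <|USER|>/<|ASSISTANT|> markers in the wrapper above are the StableLM-style ones from the llama-index docs. If the model really is a Mistral/Mixtral instruct checkpoint (an assumption, since model_name isn't shown in the original post), its published prompt format uses [INST] tags instead, so the wrapper would look more like this sketch:

query_wrapper_prompt = PromptTemplate("<s>[INST] {query_str} [/INST]")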

I have the same problem. How did you solve the issue?

@AsmaAsma Have not yet been able to fix this.

I used the query engine instead, as it didn't matter much for my use case.

Alternatively, you can try LangChain; that should work.

I think the problem is that it is trying to tokenize via the chat template, but the chat template expects alternating human/AI responses. The history, however, is just a flat list, i.e. single replies instead of paired turns, so maybe this is a LangChain issue?
Many of my models have been acting strangely lately, and it seems to be due to some changes in prompt templates.
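
For context, here is a small reproduction sketch (assuming the Mixtral instruct tokenizer; the exact behavior depends on the chat template version shipped with the model). The tokenizer's chat template raises this exact TemplateError whenever the message roles don't strictly alternate, e.g. when two consecutive user turns appear:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

# OK: roles strictly alternate user/assistant/user/...
ok = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What is E2E Networks?"},
]
print(tok.apply_chat_template(ok, tokenize=False))

# Raises TemplateError("Conversation roles must alternate ..."):
# two user turns in a row break the alternation the template enforces.
bad = ok + [{"role": "user", "content": "Are you there?"}]
tok.apply_chat_template(bad, tokenize=False)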

As far as I can tell, the issue occurs because Mixtral's chat template doesn't natively support the system message role (only user and assistant), so the system instructions should probably be included in a different way, such as by prepending them to the first user message.
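
If that diagnosis is correct, one possible workaround in llama-index (a sketch, not a confirmed fix) is to pass a custom messages_to_prompt function to HuggingFaceLLM. This bypasses the tokenizer's chat template, builds Mixtral's [INST] prompt directly, and folds the system message into the first user turn:

def messages_to_prompt(messages):
    prompt = ""
    system = ""
    for message in messages:
        if message.role == "system":
            # Mixtral has no system role: hold the instructions and fold
            # them into the next user turn instead.
            system = message.content
        elif message.role == "user":
            content = f"{system}\n\n{message.content}" if system else message.content
            prompt += f"[INST] {content} [/INST]"
            system = ""
        elif message.role == "assistant":
            prompt += f" {message.content}</s>"
    return "<s>" + prompt

llm = HuggingFaceLLM(
    context_window=context_window,
    max_new_tokens=max_new_tokens,
    system_prompt=system_prompt,
    tokenizer_name=model_name,
    model_name=model_name,
    device_map=device,
    messages_to_prompt=messages_to_prompt,
    generate_kwargs={"temperature": 0.2, "top_k": 5, "top_p": 0.95},
)

With this, the system prompt never reaches the chat template as a separate message, so the alternation check should no longer trip.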
