Model used for RAG generates extra questions and answers instead of just answering the user's query
New to building RAG, so this may be a beginner's question.
I'm using Llama-3.1-8B-Instruct for RAG over my API data in JSON format (12 chunks). When I ask a very simple question that it can answer from the JSON, it gives the answer and then generates more conversation (questions and answers the user never asked for). I'm wondering why, because I have tested the same application with other models (Mistral etc.) and they all end with a concise answer. I'm using the same config and prompt for every model I tested.
My pipeline looks like this:
from transformers import pipeline

# model, tokenizer, and streamer are defined earlier;
# using a distinct name so the pipeline() factory isn't shadowed
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=540,
    temperature=0.03,
    top_p=0.95,
    repetition_penalty=1.15,
    streamer=streamer,
)
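From what I've read, Llama 3.1 marks the end of each assistant turn with an <|eot_id|> token; if generation only stops on the default <|end_of_text|> id, the model can just keep writing extra turns. Maybe I need to pass it explicitly as a stop token? A sketch of what I mean (untested in my app, based on the Llama 3 model card examples):

terminators = [
    tokenizer.eos_token_id,
    # end-of-turn token in Llama 3.1's chat format
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

output = generator(
    prompt,
    max_new_tokens=540,
    eos_token_id=terminators,
)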
The system prompt clearly says:
....
Answer concisely in 200-400 characters, or 5-10 words when appropriate.
Provide a single, clear response.
Do not add additional questions after giving the answer to the query.
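Could the prompt format itself be part of the issue? Llama 3.1 has its own chat template (different from Mistral-style [INST] tags), so if the prompt string is built manually, letting the tokenizer render it might help. A sketch (system_prompt, retrieved_chunks, and user_question are placeholders for my actual variables):

messages = [
    {"role": "system", "content": system_prompt + "\n\nContext:\n" + retrieved_chunks},
    {"role": "user", "content": user_question},
]

# apply_chat_template renders the model's own chat format, including the
# special header tokens Llama 3.1 expects
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)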
This is what the response looks like when I ask a single question (questions and answers replaced with placeholders):
<<USR>>
{Q1}
[/USR] <<INST>]>
{Ans 1}. Would you like more info?
[/INST] <<USR>>
{Q2}
[/USR] <<INST>]>
{Ans 2}. Let me know if you need further assistance!
[/INST] <<USR>>
{Q3}
[/USR] <<INST>]>
{Ans 3}
[/INST]
Happy to share more information if needed.
Hi there!
I started working with this model and deployed it on a cheap Runpods.io instance.
I'm running into the same problem: my model ALWAYS starts its answer by repeating my own question.
It ignores my prompt instructions, where I ask it not to include my question in its response, to start right away with the answer, and to answer only once.
No effect.
So I'm just leaving this comment in case someone notices and can help out.
Or maybe you've found a solution?
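One thing I came across but haven't verified yet: the transformers text-generation pipeline returns prompt + completion by default, which would explain the question being echoed back. return_full_text=False is supposed to return only the newly generated text. A sketch (generator and prompt stand in for my own objects):

# return only the generated continuation, not the prompt itself
output = generator(
    prompt,
    max_new_tokens=256,
    return_full_text=False,
)
print(output[0]["generated_text"])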