Please use the correct prompt template

#3
by froggeric - opened

I think the prompt template you supplied in your README is wrong. Using the correct template for Miqu makes a huge difference. Please see my post on Reddit for more details.

The prompt should be:

[INST] {System}[/INST][INST] {User}[/INST] {Assistant}

https://www.reddit.com/r/LocalLLaMA/comments/1b1gxmq/the_definite_correct_miqu_prompt/
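
For what it's worth, here is a minimal Python sketch of how I assemble that string before sending it to the model. The function name and example strings are my own and purely illustrative; the only thing taken from above is the template itself:

```python
def build_miqu_prompt(system: str, user: str, assistant: str = "") -> str:
    # Single-turn prompt following the template above:
    # [INST] {System}[/INST][INST] {User}[/INST] {Assistant}
    return f"[INST] {system}[/INST][INST] {user}[/INST] {assistant}"

# Example usage (leave `assistant` empty so the model writes the reply):
prompt = build_miqu_prompt(
    system="You are a helpful assistant.",
    user="Explain what a prompt template is.",
)
print(prompt)
```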

Just saw this linked from Reddit and wanted to say there is another prompt format that I found too:

[INST] {prompt1}
[/INST]{response1}[INST] {prompt2}
[/INST]{response2}

I originally got it by asking Miqu about the format he saw during training, starting with a blank prompt template and restarting each time he got confused (he seems to find it hard to write "[/INST]" more than a couple of times...).

Main discussion here: https://huggingface.co/miqudev/miqu-1-70b/discussions/25

(I've tried a few ways to add the system prompt which I outline in that thread).
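
Here is a rough Python sketch (my own helper, names purely illustrative) of how I string turns together with this format; where the system prompt goes is one of the things I outline in that thread, so I've left it out here:

```python
def format_conversation(turns):
    # `turns` is a list of (prompt, response) pairs. Leave the last
    # response as "" so the model continues from there.
    return "".join(f"[INST] {p}\n[/INST]{r}" for p, r in turns)

history = [
    ("hi", " hello"),        # leading space matches the example below
    ("my name is...", ""),   # empty: generation starts here
]
print(format_conversation(history))
```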

I've also confirmed this is the prompt format he starts to hallucinate when you merge him with Codellama-70b, e.g.:

[INST] hi
[/INST] hello[INST] my name is...
[/INST] nice to meet you... 

Interestingly, he will also often end his responses with "Confidence X%" if you use the suggested template!


I tried the prompt format the OP suggested and it performs much worse for me on coding tasks, though I can't rule out bugs in Ollama's templating or the wrapped llama.cpp server. Just about any other prompt template than the one I suggested makes him much worse at coding tasks overall, but with the correct/suggested template he is actually one of the best for tasks like refactoring.

Another good test I found is to ask it about some obscure machine learning papers from the 80s and 90s - Miqu is the only 70b model that seems to be (properly) trained on these. If you use the wrong prompt format he will quite confidently get mixed up and use the wrong names, etc.

e.g. try asking in a roundabout way about the paper:

"Recursive distributed representations", J. B. Pollack - Artificial Intelligence 46 (1-2):77-105 (1990)

It's a particularly good one to ask about, as there was lots of other research around the same time that used similar ideas/names, and all the other 70b models seem to have some kind of "superimposed" version of the facts. With the correct/suggested prompt format, Miqu knows the J stands for Jordan and can tell you quite well what this paper is about.
