Proper prompt?
What is the proper prompt format to use with this model?
Looking at the tokenizer used for the qlora:
https://huggingface.co/ShinojiResearch/Senku-70B/blob/main/tokenizer_config.json
It looks to be the same as the original Mistral/Mixtral/Miqu prompt template:
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
e.g.:
Instruction format
In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin of sentence id. The next instructions should not. The assistant generation will be ended by the end-of-sentence token id.
text = ("<s>[INST] What is your favourite condiment? [/INST]"
        "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
        "[INST] Do you have mayonnaise recipes? [/INST]")
Or in Ollama template format:
TEMPLATE """[INST] {{ if and .First .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]{{ .Response }}"""
The wrapped llama.cpp server adds the leading <s> token AFAIK and from experiments with Mixtral; it's important to not add a space after the closing [/INST] tag.
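For anyone assembling the prompt by hand, here is a minimal Python sketch of the format quoted above. `build_mistral_prompt` and its `add_bos` flag are hypothetical names made up for illustration, not part of any library, and the spacing simply follows the model card example (nothing after the closing [/INST], and `</s> ` after each assistant reply). Whether Senku-70B actually prefers this exact spacing is still the open question in this thread.

```python
def build_mistral_prompt(messages, add_bos=True):
    """Build a Mistral-instruct style prompt string.

    messages: list of {"role": "user" | "assistant", "content": str},
    alternating and starting with "user".
    add_bos: set to False when the backend already prepends <s>.
    """
    parts = ["<s>"] if add_bos else []
    for i, msg in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(f"message {i} should have role {expected!r}")
        if msg["role"] == "user":
            # No trailing space after the closing [/INST] tag.
            parts.append(f"[INST] {msg['content']} [/INST]")
        else:
            # Assistant turns end with the end-of-sentence token and a space,
            # as in the model card example.
            parts.append(f"{msg['content']}</s> ")
    return "".join(parts)


messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I like lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
print(build_mistral_prompt(messages))
```

Pass `add_bos=False` if your server adds the leading <s> itself, otherwise the token ends up in the prompt twice.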
Actually, it might not be this format, judging by this discussion on the fp16 dequant that this model was fine-tuned from:
https://huggingface.co/152334H/miqu-1-70b-sf/discussions/11
Although @jackboot does mention not adding a space after the closing [/INST] tag, which makes me think it is the same as the original Mistral/Mixtral/Miqu prompt template:
I have been fighting with it being " [/INST] \n" or just " [/INST]\n" and now "[/INST]REPLY</.s>" The model will reply about the same to all of them, but using the more "correct" one leads to less commentary or NOTES:
No intentional change was made. I have found that ChatML also works reasonably well, and interestingly, when asked "who are you" questions, the model forgets it is a Mistral model.
Most of the testing I did was via the default recognized template on Oobabooga (which should be Mistral).
Possibly a bit more finetuning could fix any weirdness with the prompting?
Edit: Mixed results; I am recommending ChatML from this point on.
It avoids the bug that Miqu SF has and scores slightly higher on EQ-Bench, but lower on GSM8k. I will consider a v2 where this issue is corrected.
If anyone has any suggestions on further finetuning that could improve / correct this, would be happy to attempt it.
I used ChatML for the original mixtral-instruct.
ChatML seems to perform better on EQ-Bench; I confirmed this in the reproduced experiments as well.
You can reproduce the training with chatml as the template, adding the start and end tokens:
tokens: # these are delimiters
- "<|im_start|>"
- "<|im_end|>"
chat_template: chatml
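For inference with the ChatML variant, a rough Python sketch of the prompt layout follows. `build_chatml_prompt` is a hypothetical helper written for this post; it uses the two delimiter tokens from the config above and the common ChatML convention of a newline after the role name and after each <|im_end|>. Treat that exact whitespace as an assumption and check it against the template you actually trained or loaded.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Build a ChatML prompt string.

    messages: list of {"role": str, "content": str}, where role is
    typically "system", "user" or "assistant".
    add_generation_prompt: append the opening of an assistant turn so
    the model continues with its reply.
    """
    parts = []
    for msg in messages:
        # Each turn is wrapped in the start and end delimiter tokens.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is your favourite condiment?"},
]
print(build_chatml_prompt(messages))
```

When generating, <|im_end|> should be treated as a stop string if the backend does not already map it to the end-of-sequence token.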