by leo009 - opened

Is there no need to add <s>[INST][/INST]</s> manually when using FastLanguageModel from unsloth? (I have never used it, but maybe I should.)

Can we fine-tune the mistral-7B-instruct-v0.3 model for a question-answering task? If so, what is the right data format? Currently my data looks like this:
{"input_text" : "what is LLM?", "output_text": "LLM is a Large Language model"}

Your task looks like an instruct task. For a causal LM, each training example is always one single string, just as when pretraining on documents.
The elements of that string are separated by preconfigured tokens that delimit the user and assistant turns.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.3', trust_remote_code=True)
messages = [
    {"role": "user", "content": "What is LLM ?"},
    {"role": "assistant", "content": "LLM is a Large Language model"},
]
# apply_chat_template returns token ids by default; decode them to see the resulting string
print(tokenizer.decode(tokenizer.apply_chat_template(messages)))
# -> something like '[INST] What is LLM ? [/INST]LLM is a Large Language model' (your training example)
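
If it helps, here is a minimal sketch of how records in your {"input_text", "output_text"} format could be mapped to the messages structure used above. The helper names `to_messages` and `render_inst_string` are hypothetical, not part of any library. `render_inst_string` is only a rough stand-in that imitates the string shown above so the example runs without downloading the model; for real training you would pass the messages to tokenizer.apply_chat_template instead, which may add special tokens and use slightly different spacing.

```python
import json


def to_messages(record):
    """Map one {"input_text", "output_text"} record to a list of chat messages."""
    return [
        {"role": "user", "content": record["input_text"]},
        {"role": "assistant", "content": record["output_text"]},
    ]


def render_inst_string(messages):
    """Hypothetical stand-in for the tokenizer's chat template.

    It only imitates the '[INST] ... [/INST]answer' layout shown in this thread;
    the real template is defined by the model's tokenizer.
    """
    parts = []
    for message in messages:
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            parts.append(message["content"])
    return "".join(parts)


# One line of the dataset, in the format from the question
line = '{"input_text" : "what is LLM?", "output_text": "LLM is a Large Language model"}'
record = json.loads(line)

messages = to_messages(record)
text = render_inst_string(messages)
print(text)  # [INST] what is LLM? [/INST]LLM is a Large Language model
```

Applying `to_messages` to every line of the dataset gives one messages list per example, and each of those becomes one single training string once the chat template is applied.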

