SFT is so BAD

#99 opened by GokhanAI

Hello,
My question is, I am applying a model SFT to chat data. I do this with the LLama-7b&13b-Chat-hf and Mistral-Instruction_V_2.0 models.

My question is: for this kind of chat-format data, should we do SFT, or should we apply DPO or RLHF directly?

I use LoRA for the SFT.

I observed that when I perform SFT on Llama, the model loses its structure slightly, but it can still answer questions beyond those in the training data.

But when I do this with Mistral, it loses its structure entirely and cannot answer any questions outside the training data.

What is causing this? What path should we follow? Is SFT not suitable?
Do we need to do DPO or RLHF directly? These are topics I really don't understand. Where am I going wrong?

Also:
I am doing these operations based on ready-made code such as SFTTrainer and DeepSpeed-Chat, but what should I do differently? How can I continue training without destroying the model's structure?
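For reference, this is roughly how I set up the LoRA SFT run. This is only a simplified sketch using TRL's SFTTrainer and a PEFT LoraConfig: the dataset, hyperparameters, and output directory are placeholders, and argument names can differ between TRL versions, so please check against your installed version.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # or a Mistral-Instruct checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy conversational dataset; replace with your real chat data.
# Recent TRL versions apply the tokenizer's chat template to a "messages" column.
train_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "hello"},
        {"role": "assistant", "content": "how can i help you"},
    ]},
])

# Conservative LoRA setup: low rank, adapters only on the attention
# projections, so the base model is perturbed as little as possible.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Small learning rate and few epochs also help avoid wrecking the base model.
args = SFTConfig(
    output_dir="sft-lora-out",
    learning_rate=1e-4,
    num_train_epochs=1,
    per_device_train_batch_size=1,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```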

Note: pay attention to the BOS and EOS tokens.

The format is the same as in this example:

Llama Example:
--> [INST] hello [/INST] how can i help you [INST] .... [/INST] ....

Mistral Example:
--> [INST] hello [/INST] how can i help you [INST] .... [/INST] ....
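Since BOS/EOS placement is exactly where the Llama and Mistral chat formats differ, one way I check that my training strings match what each base model expects is to print the tokenizer's own chat template. A small sketch; the model IDs are just examples and assume you have access to those repos:

```python
from transformers import AutoTokenizer

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "how can i help you"},
    {"role": "user", "content": "one more question"},
    {"role": "assistant", "content": "sure, go ahead"},
]

for name in ("meta-llama/Llama-2-7b-chat-hf", "mistralai/Mistral-7B-Instruct-v0.2"):
    tok = AutoTokenizer.from_pretrained(name)
    # tokenize=False returns the raw string, so you can see exactly where the
    # [INST]/[/INST] markers and the <s>/</s> tokens are placed for each model.
    print(name)
    print(tok.apply_chat_template(messages, tokenize=False))
    print()
```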
