SFT Results So Bad

#41
by GokhanAI - opened

Hello,
I am applying SFT to chat data, using the Llama-7b and 13b-Chat-hf and Mistral-Instruction_V_2.0 models.

My question is: should chat data in this format be trained with SFT, or should we apply DPO or RLHF directly?

I use LoRA for SFT.

I observed that when I perform SFT with Llama, the model loses its structure slightly, but it can still answer questions beyond those in the training data.

But when I do the same with Mistral, it loses its structure entirely and cannot answer any questions beyond those in the training data.

What is the problem here, and what path should we follow? Is SFT not suitable?
Do we need to apply DPO or RLHF directly? I really don't understand these topics. Where am I going wrong?

Also,
I am running these experiments with ready-made code such as SFTTrainer and DeepSpeed-Chat. What difference should I expect between them? How can I continue fine-tuning without destroying the structure?

Note: the BOS and EOS tokens may be worth paying attention to.

The format is the same as in this example:

Llama Example:
--> [INST] hello [/INST] how can i help you [INST] .... [/INST] ....

Mistral Example:
--> [INST] hello [/INST] how can i help you [INST] .... [/INST] ....
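
To make the BOS/EOS point concrete, here is a minimal sketch of how a multi-turn training string could be assembled. It assumes the layouts I understand from the model cards: Llama chat wraps every user/assistant turn in its own `<s>` ... `</s>` pair, while Mistral Instruct uses a single leading `<s>` and closes each assistant reply with `</s>`. The function name `build_chat_text`, the `style` argument, and the exact whitespace are assumptions for illustration, so compare the output with what your tokenizer's chat template produces before training.

```python
from typing import List, Tuple

BOS, EOS = "<s>", "</s>"  # assumed special-token strings for both model families


def build_chat_text(turns: List[Tuple[str, str]], style: str = "llama") -> str:
    """Join (user, assistant) turns into one training string.

    style="llama":   every turn is wrapped in its own BOS ... EOS pair.
    style="mistral": one BOS at the start, EOS after each assistant reply.
    """
    if style not in ("llama", "mistral"):
        raise ValueError(f"unknown style: {style}")
    parts = []
    for i, (user, assistant) in enumerate(turns):
        user, assistant = user.strip(), assistant.strip()
        if style == "llama":
            parts.append(f"{BOS}[INST] {user} [/INST] {assistant} {EOS}")
        else:
            # Only the first turn carries BOS in the assumed Mistral layout.
            prefix = BOS if i == 0 else ""
            parts.append(f"{prefix}[INST] {user} [/INST] {assistant}{EOS}")
    return "".join(parts)


turns = [("hello", "how can i help you"), ("thanks", "you are welcome")]
print(build_chat_text(turns, style="llama"))
print(build_chat_text(turns, style="mistral"))
```

If the special tokens are written into the text like this, it is probably safer to tokenize with `add_special_tokens=False` so the tokenizer does not prepend a second BOS.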

I also ran SFT on a custom chat dataset and hit a similar problem. The model's outputs are all over the place, and sometimes the fine-tuned model just repeats the input. Were you able to fix your problem? I suspect tokenization plays a big role.
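
If tokenization is the suspect, one quick sanity check is to look at the token IDs of a few training examples for a duplicated BOS or a missing EOS; without an EOS at the end of each example, the model may never learn where to stop. The sketch below works on plain lists of IDs. The function name is hypothetical, and the default IDs (1 for BOS, 2 for EOS) are assumptions that should be replaced by your tokenizer's `bos_token_id` and `eos_token_id`.

```python
from typing import Dict, List


def check_special_tokens(input_ids: List[int], bos_id: int = 1, eos_id: int = 2) -> Dict[str, bool]:
    """Report common BOS/EOS problems in one tokenized training example."""
    # Count how many BOS tokens sit at the very start of the sequence.
    leading_bos = 0
    for tok in input_ids:
        if tok != bos_id:
            break
        leading_bos += 1
    return {
        "missing_bos": leading_bos == 0,
        "double_bos": leading_bos > 1,
        "missing_eos": not input_ids or input_ids[-1] != eos_id,
    }


# Toy IDs for illustration only; real IDs come from your tokenizer.
print(check_special_tokens([1, 1, 518, 25580, 2]))  # input with BOS added twice
print(check_special_tokens([1, 518, 25580]))        # input with no EOS at the end
```

Running this over a handful of examples from the actual training dataloader, rather than the raw text, shows what the model really sees after the trainer's own preprocessing.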

Same story here ...
