Is this model the instruct version?
Hi, I checked the tokenizer and found that both `gemma-7b-bnb-4bit` and `gemma-7b-it-bnb-4bit` share the same tokenizer. Are both models fine-tuned instruct versions?
@shi-zheng-qxhs
Oh no, the `it` is the instruct one. I manually edited the tokenizer to expose the tokens for `<start_of_turn>` and `<end_of_turn>`. Interestingly, both the instruct and base models have these tokens.
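If you want to verify this yourself, here's a minimal sketch (assuming both repos live under the `unsloth` org on the Hub) that prints the ids the tokenizer assigns to the two turn tokens:

```python
from transformers import AutoTokenizer

# Check that both the base and instruct tokenizers expose the turn tokens.
# The `unsloth/` org prefix is an assumption.
for repo in ["unsloth/gemma-7b-bnb-4bit", "unsloth/gemma-7b-it-bnb-4bit"]:
    tok = AutoTokenizer.from_pretrained(repo)
    for token in ["<start_of_turn>", "<end_of_turn>"]:
        # convert_tokens_to_ids falls back to the unk id for unknown tokens
        print(repo, token, tok.convert_tokens_to_ids(token))
```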
Thanks, just wanted to make sure. :)
Just a few follow-up questions:
- Is there any specific reason `padding_side` is set to `"right"`?
- Can I use `unsloth` to perform custom training, i.e., without using any of the `Trainer` classes, but with a native PyTorch training loop, for example?

Thanks!!!
@shi-zheng-qxhs

- `padding_side = "right"` is for training purposes only. Change it to `"left"` for inference.
- Yes, it should work! Something like the sketch below:
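This is a rough, untested sketch of a native PyTorch loop, assuming unsloth's `FastLanguageModel` loader and the standard causal-LM loss from `transformers`; the model name, LoRA settings, and hyperparameters are all illustrative:

```python
import torch
from unsloth import FastLanguageModel

# Load the 4-bit model and attach LoRA adapters (values are illustrative)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
texts = ["Example training text."]  # stand-in for a real dataset

model.train()
for text in texts:
    batch = tokenizer(text, return_tensors="pt").to(model.device)
    # Causal LM: labels are the input ids; the model shifts them internally
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```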
For fine-tuning and inference with flash attention, shouldn't it be `padding_side = "left"`?
Could you explain the reasoning behind `padding_side = "right"`?
@NickyNicky You can use left padding, however it makes things slower for training, so I don't advise it. Unsloth itself requires right padding for training.
Yes, simply after training set `tokenizer.padding_side = "left"` before calling `model.generate`.
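Concretely, something like this sketch (continuing from a model and tokenizer loaded as in the training snippet above; the prompts are illustrative):

```python
# Right padding is fine while training, but for batched generation the
# pads must sit on the left so new tokens continue directly from the prompt.
tokenizer.padding_side = "left"
FastLanguageModel.for_inference(model)  # unsloth's faster inference mode

prompts = ["The capital of France is", "Hi"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```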
But wouldn't flash attention raise warnings about `padding_side = "right"`? Now I'm confused, haha.
@NickyNicky Oh, if you're simply using HF, just use whatever they provide. Unsloth itself uses right padding.
Oh, interesting. I see that after the merge the library deletes `padding_side` from `tokenizer_config.json`, but when the LoRA weights are saved, `padding_side` is there.

Example merge:
https://huggingface.co/NickyNicky/gemma-1.1-2b-it_text_to_sql_format_chatML_V1/blob/main/tokenizer_config.json

Fine-tuned with unsloth PEFT:
https://huggingface.co/NickyNicky/gemma-1.1-2b-it_text_to_sql_format_chatML_peft_V1/blob/main/tokenizer_config.json
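You can confirm the difference programmatically; a quick sketch using `huggingface_hub` to read both configs linked above:

```python
import json
from huggingface_hub import hf_hub_download

# Compare the two tokenizer configs from the repos linked above
repos = [
    "NickyNicky/gemma-1.1-2b-it_text_to_sql_format_chatML_V1",       # merged
    "NickyNicky/gemma-1.1-2b-it_text_to_sql_format_chatML_peft_V1",  # LoRA
]
for repo in repos:
    path = hf_hub_download(repo_id=repo, filename="tokenizer_config.json")
    with open(path) as f:
        cfg = json.load(f)
    # A missing key here means the merged repo dropped padding_side
    print(repo, "->", cfg.get("padding_side"))
```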
So after training, it's time to add `padding_side` back.