Fine-tuning Gemma for a Foreign Language

#78
by user1357925 - opened

I am attempting to fine-tune Gemma for one of the languages on which it has been pretrained. Could you provide any suggestions regarding the optimal size of the dataset to ensure a noticeable improvement in performance? The best format for the training files? Any other recommendations? Thank you.


@user1357925

Hello, friend. I got good results from the Gemma 2B model by passing the dataset in this format. I did the fine-tuning for Brazilian Portuguese. Here it is:
I have two datasets: one with 36k examples on mental health (Gemma 2B)
and another with 100k instruction examples (Gemma 7B).

def formatting_func(example):
    # Wrap each question/answer pair in Gemma's chat-turn markers,
    # with a newline (not a space) separating the user and model turns.
    instruction = example['question']
    output = example['answer']
    text = (
        f"<start_of_turn>user\n{instruction}<end_of_turn>\n"
        f"<start_of_turn>model\n{output}<end_of_turn>"
    )
    return text
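
For context, here is a minimal sketch of one common way to wire a formatting function like this into TRL's SFTTrainer. It is an illustration, not the author's actual script: the data file train.jsonl, the model id google/gemma-2b, and every hyperparameter below are placeholder assumptions, and the keyword arguments target the older TRL API.

# Sketch only: file names, model id, and hyperparameters are placeholders,
# not values from the original notebook.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

def formatting_func(example):
    # Same Gemma chat template as the snippet above.
    return (
        f"<start_of_turn>user\n{example['question']}<end_of_turn>\n"
        f"<start_of_turn>model\n{example['answer']}<end_of_turn>"
    )

# Assumed input: one JSON object per line, e.g.
# {"question": "Qual e a capital do Brasil?", "answer": "Brasilia."}
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Build a single "text" column that the trainer will consume.
dataset = dataset.map(
    lambda ex: {"text": formatting_func(ex)},
    remove_columns=dataset.column_names,
)

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # column created by map() above
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="gemma-ft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=50,
    ),
)
trainer.train()

Note that recent TRL releases move dataset_text_field and max_seq_length into an SFTConfig object passed as args, so check the TRL version pinned in your environment before copying this.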

Thank you. Would you mind sharing the whole script you used for fine-tuning?

@user1357925
Yeah, sure. Could you send me an email?
rhaymisoncristian@gmail.com, or message me on LinkedIn and I will share the notebook with you.
