Fine-tuning/Tokenizer

#5
by Mukhamejan - opened

Hello, I have been trying to fine-tune your model. At the time I didn't see a discussion section, so I trained my own tokenizer on the text from the dataset. After fine-tuning your model with that tokenizer, I ended up with a German-sounding girl's voice, lol. I was wondering: did you convert your text to Latin or not? When I tokenize with the default tokenizer, I get something like this:
image.png

So, as I understand it, you converted the text to Latin. Could you please share the tool you used for the conversion?

Owner

Hi, Mukhamejan! No, I didn't convert the text to Latin.
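
A possible explanation for the Latin-looking tokens, offered as an assumption since the thread does not say which tokenizer type the model uses: byte-level BPE tokenizers in the style of GPT-2 display every UTF-8 byte as a printable character, so Cyrillic text can look like accented Latin even though nothing was transliterated. The sketch below rebuilds that kind of byte-to-character table in plain Python; the helper names `show_as_byte_level` and `restore` are hypothetical and only for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes are shown as themselves.
    table = {b: chr(b) for b in printable}
    # Remaining bytes (controls, space, etc.) are shifted past U+00FF.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


def show_as_byte_level(text):
    """Hypothetical helper: how a byte-level vocabulary would display `text`."""
    table = bytes_to_unicode()
    return "".join(table[b] for b in text.encode("utf-8"))


def restore(display):
    """Hypothetical helper: invert the display form back to the original text."""
    inverse = {c: b for b, c in bytes_to_unicode().items()}
    return bytes(inverse[c] for c in display).decode("utf-8")


print(show_as_byte_level("да"))           # Ð´Ð°
print(restore(show_as_byte_level("да")))  # да
```

If the default tokenizer's output maps back to the original text like this, the Latin-looking symbols are only a display form of the bytes, and no conversion tool is needed.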
