Questions about training

by alibidaran - opened Jan 1

Discussion

alibidaran

Jan 1

Did you train your tokenizer on a Persian corpus?
How many training steps did you use to train LLaMA?

mostafaamiri

Owner Jan 3

Yes. The steps I used to train the model were:
1- Training the tokenizer on a Persian corpus
2- Training a LoRA adapter on a Persian corpus
3- Instruction tuning on a translated ALPACA dataset and some similar corpora
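To make step 2 concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It assumes the standard LoRA formulation (a frozen weight W plus a trainable low-rank update B·A scaled by alpha/rank). The class name `LoRALinear` and the `rank`/`alpha` values are hypothetical illustrations, not the settings used for this model.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


class LoRALinear:
    """Hypothetical LoRA wrapper: y = W x + (alpha / rank) * B (A x).

    W stays frozen; only A (rank x d_in) and B (d_out x rank) would be trained.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the base layer's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)                 # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))     # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        # Only the low-rank factors count; the frozen weight does not.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a toy 3x4 frozen weight with a rank-2 adapter.
W = [[1.0, 0.0, 2.0, -1.0],
     [0.5, 1.5, 0.0, 0.0],
     [0.0, -2.0, 1.0, 3.0]]
layer = LoRALinear(W, rank=2, alpha=4)
x = [1.0, 2.0, 3.0, 4.0]
print(layer.forward(x))       # [3.0, 3.5, 11.0], identical to W x because B is zero
print(layer.num_trainable())  # 14 (2*4 + 3*2)
```

The point of the design is that only rank × (d_in + d_out) parameters are trained per layer, which is why an adapter can be trained on a new language without updating the full model.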

alibidaran

Jan 3

I have an ALPACA-style Persian dataset, but I don't know how many training steps are required to train LLAMA2. I have trained LLAMA2 on various English datasets, but on Farsi, even with a trained tokenizer, it doesn't give me good results.

mostafaamiri

Owner Jan 4

Did you train the adapter on a Farsi dataset?
I trained it on 200 million tokens.
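The number of training steps that 200 million tokens corresponds to depends on hyperparameters not stated in this thread. Here is a minimal sketch of the arithmetic, assuming every sequence is packed to a fixed length, so one optimizer step consumes sequence length × per-device batch × devices × gradient-accumulation tokens. The function name `training_steps` and the batch and sequence values in the usage lines are hypothetical, not the owner's configuration.

```python
def training_steps(total_tokens, seq_len, per_device_batch,
                   num_devices=1, grad_accum=1, epochs=1):
    """Optimizer steps needed to see `total_tokens` tokens `epochs` times.

    Assumes sequences are packed to exactly `seq_len` tokens (no padding).
    """
    tokens_per_step = seq_len * per_device_batch * num_devices * grad_accum
    # Integer ceiling division, so a partial final batch still counts as a step.
    return (total_tokens * epochs + tokens_per_step - 1) // tokens_per_step


# 200M tokens with two hypothetical configurations:
print(training_steps(200_000_000, seq_len=512, per_device_batch=8, grad_accum=4))   # 12208
print(training_steps(200_000_000, seq_len=2048, per_device_batch=4, grad_accum=8))  # 3052
```

The same token budget can therefore mean roughly 3k or 12k steps depending on the setup, so the token count is the more portable figure to compare.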

mostafaamiri changed discussion status to closed Apr 6
