Please reply to this

by aslawliet - opened

What hyperparameters did you use, especially the learning rate? Also, which TRC TPUs did you use for training, and how many? Was it full-parameter training or LoRA training?


You can find the learning rate in the README. FYI, it started at 5e-5 and was reduced during training down to 1e-5.
I used TPUv4-64/256 for pretraining, and of course this was full-parameter training.
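
For readers who want to reproduce a similar schedule, here is a minimal sketch of a learning rate that goes from 5e-5 to 1e-5. Only the two endpoint values come from this thread. The linear decay shape, the function name `linear_decay_lr`, and the `total_steps` parameter are assumptions for illustration; the schedule actually used may have been different (for example cosine, or manual step changes).

```python
def linear_decay_lr(step, total_steps, peak_lr=5e-5, final_lr=1e-5):
    """Return the learning rate at `step` under an ASSUMED linear decay.

    peak_lr and final_lr are the values quoted in the thread; the linear
    shape and total_steps are hypothetical and not confirmed by the author.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end stay at final_lr.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return peak_lr + (final_lr - peak_lr) * progress


if __name__ == "__main__":
    # Print the rate at a few points of a hypothetical 1000-step run.
    for s in (0, 250, 500, 750, 1000):
        print(s, linear_decay_lr(s, total_steps=1000))
```

Since this is full-parameter training, a schedule like this would apply to every weight in the model rather than to a small set of adapter weights as with LoRA.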

beomi changed discussion status to closed
