Convergence with 16/8 bit

#10
by NeuroScie - opened

Hi, great work!
In your example, you mention additional parameters for fitting the training onto smaller GPUs (similar to the Hugging Face fill50k example).
Can you verify that it actually converged for you using 16-bit? If so, can you share how many steps it took, and whether you used any additional parameters?

Thanks
