Convergence with 16/8 bit

by NeuroScie - opened

Hi, great work!
In your example you mention additional parameters for fitting the training onto smaller GPUs (similar to the Hugging Face fill50k example).
Can you confirm that training actually converged for you in 16-bit? If so, how many steps did it take, and did you use any additional parameters?

