float32 vs bf16

#5
by janimo - opened

Why the difference in dtypes between this and the -it model?

Full precision is usually useful for pre-training. For inference, using bfloat16 should be good :)
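
For example, a minimal sketch of loading the checkpoint in bf16 for inference with transformers (the model id below is an assumption for illustration; substitute the checkpoint this discussion belongs to):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"  # assumed id, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```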

Other models (even regular Gemma) ship bf16 for both the base and -it models, hence my question about the rationale here. Is f32 needed for proper RecurrentGemma fine-tuning?

Google org

f32 is not needed for fine-tuning; either f32 or bf16 will work.

Google org

Just to add to @AnushanF's comment: in the code we always run the recurrence (the RG-LRU layer) in f32, even when fine-tuning the overall model in bf16, as we found this works much better.
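
To illustrate the idea (this is a simplified sketch, not the actual RecurrentGemma implementation): the recurrence inputs are upcast to float32, the sequential scan accumulates in float32, and the result is cast back to the surrounding bf16 dtype.

```python
import torch

def rg_lru_scan(a: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Sequential scan h_t = a_t * h_{t-1} + x_t, accumulated in float32.

    a, x: (batch, seq_len, dim), possibly bf16. Returns states in the input dtype.
    """
    out_dtype = x.dtype
    a32, x32 = a.float(), x.float()          # upcast recurrence inputs to f32
    h = torch.zeros_like(x32[:, 0])          # f32 hidden state
    hs = []
    for t in range(x32.shape[1]):
        h = a32[:, t] * h + x32[:, t]        # accumulate in f32 for stability
        hs.append(h)
    return torch.stack(hs, dim=1).to(out_dtype)  # back to bf16 for the rest

# bf16 activations outside, f32 accumulation inside the scan.
a = torch.rand(2, 16, 8, dtype=torch.bfloat16) * 0.9
x = torch.randn(2, 16, 8, dtype=torch.bfloat16)
print(rg_lru_scan(a, x).dtype)  # torch.bfloat16
```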
