float32 vs bf16

#5
by janimo - opened

Why the difference in dtypes between this and the -it model?

Full precision is usually useful for pre-training. For inference, using bfloat16 should be good :)
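
For example, a minimal sketch of loading the checkpoint in bf16 for inference with transformers (the model id below is an assumption for illustration; substitute the checkpoint this discussion belongs to):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"  # assumed id, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```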

Other models (even regular Gemma) ship bf16 for both the base and -it models, hence my question about the rationale here. Is f32 needed for proper RecurrentGemma fine-tuning?

Google org

f32 is not needed for fine-tuning; either f32 or bf16 will work.

Google org

Just to add to @AnushanF's comment: in the code we always run the recurrence (the RG-LRU layer) in f32, even when fine-tuning the overall model in bf16, as we found this works much better.
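
To illustrate the idea (this is a simplified sketch, not the actual RecurrentGemma implementation): the recurrence inputs are upcast to float32, the sequential scan accumulates in float32, and the result is cast back to the surrounding bf16 dtype.

```python
import torch

def rg_lru_scan(a: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Sequential scan h_t = a_t * h_{t-1} + x_t, accumulated in float32.

    a, x: (batch, seq_len, dim), possibly bf16. Returns states in the input dtype.
    """
    out_dtype = x.dtype
    a32, x32 = a.float(), x.float()          # upcast recurrence inputs to f32
    h = torch.zeros_like(x32[:, 0])          # f32 hidden state
    hs = []
    for t in range(x32.shape[1]):
        h = a32[:, t] * h + x32[:, t]        # accumulate in f32 for stability
        hs.append(h)
    return torch.stack(hs, dim=1).to(out_dtype)  # back to bf16 for the rest

# bf16 activations outside, f32 accumulation inside the scan.
a = torch.rand(2, 16, 8, dtype=torch.bfloat16) * 0.9
x = torch.randn(2, 16, 8, dtype=torch.bfloat16)
print(rg_lru_scan(a, x).dtype)  # torch.bfloat16
```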
