Weights are in FP16 (loaded in FP32) but paper mentions BF16

#17 opened by AdrienC

The paper mentions that training was done in bf16 (as one would expect for a Mistral model), yet the safetensors files store the weights in float16 and config.json loads them in float32. Saving bf16 weights as fp16 can cause overflows, since fp16 has a much narrower dynamic range than bf16.
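To illustrate the kind of overflow I mean, here is a quick sketch: bf16 shares fp32's exponent range, while fp16 tops out around 65504, so any bf16 weight beyond that becomes inf when cast down.

```python
import torch

# A value that bf16 can represent but fp16 cannot (fp16 max ~= 65504).
x = torch.tensor(70000.0, dtype=torch.bfloat16)
print(x.to(torch.float16))  # tensor(inf, dtype=torch.float16)
```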

Could you give us more details on how to load and potentially fine-tune this model without running into issues?
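For reference, this is roughly how I am loading it at the moment, forcing bf16 at load time rather than the fp32 default from config.json (the model id below is a placeholder, and this just assumes the standard transformers `from_pretrained` API):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- substitute the actual model name.
model_id = "org/model-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the fp16 checkpoint and up-cast it to bf16 instead of fp32.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
)
```

Is this the recommended way to get back to the training dtype, or should fine-tuning be done in fp32?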
