Float32 training

#1 by Iker

Hello!

First of all, thank you for your work; I find it very interesting and useful!

While reading through the documentation, I found that you have trained the model using float32:

"We use DeepSpeed with full float32 training."

I find this surprising, as most models are trained with bfloat16, and I am curious why you made this decision. Was the model's performance better with float32 than with bfloat16?
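For reference, my understanding is that "full float32 training" in a DeepSpeed setup simply means leaving both mixed-precision modes disabled in the config. A minimal sketch (hypothetical values, not your actual config):

```python
# Sketch of a DeepSpeed config for full-fp32 training. With both "fp16"
# and "bf16" disabled, DeepSpeed keeps weights, gradients, and optimizer
# states in float32. Batch size and ZeRO stage below are placeholders.
import json

ds_config = {
    "train_batch_size": 32,             # hypothetical value
    "fp16": {"enabled": False},         # no fp16 mixed precision
    "bf16": {"enabled": False},         # no bf16 mixed precision
    "zero_optimization": {"stage": 2},  # hypothetical ZeRO stage
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```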

Projecte Aina org

Hello!

Thank you for your interest in our model.

At the beginning of the project, we observed some training instability caused by the combination of DeepSpeed and fp16. Since fp32 did not cause any memory bottlenecks in our configuration, we decided to train in full precision instead.

We kept working on these problems, and they are now solved; future models will be trained with fp16 instead.
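For anyone hitting similar instability: the usual remedy for fp16 overflow/underflow in DeepSpeed is dynamic loss scaling. The options below are stock DeepSpeed fp16 settings with illustrative values, not necessarily what we used:

```python
# Sketch of DeepSpeed's fp16 block with dynamic loss scaling enabled.
# Dynamic scaling grows the loss scale while gradients stay finite and
# shrinks it on overflow, which mitigates fp16's narrow dynamic range.
ds_config_fp16 = {
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 = dynamic loss scaling
        "initial_scale_power": 16,  # start the scale at 2**16
        "loss_scale_window": 1000,  # overflow-free steps before raising scale
        "hysteresis": 2,            # overflows tolerated before lowering scale
        "min_loss_scale": 1,
    }
}
```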
