related discussion: https://huggingface.co/google/flan-t5-xxl/discussions/11
The previous config file was using the `gelu` function instead of `gated-gelu`, which is automatically set when the gated activation flag is forced to `True`, more specifically here.
This is not a breaking change, since the fix only affects inference. Users who trained a model with `gelu` instead of `gated-gelu` should not be affected by this change.
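
For readers unfamiliar with the difference between the two settings, here is a minimal sketch in plain Python. It is not the library's implementation: the scalar "weights", the helper names (`parse_feed_forward_proj`, `dense_gelu`, `gated_gelu`), the `gated-<name>` string convention, and the use of the exact GELU formula are all assumptions made for illustration.

```python
import math
from typing import Tuple


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def parse_feed_forward_proj(proj: str) -> Tuple[bool, str]:
    """Hypothetical helper: split a 'gated-<name>' style string.

    Returns (is_gated, activation_name). Assumes the convention that a
    'gated-' prefix marks a gated feed-forward layer.
    """
    parts = proj.split("-")
    return parts[0] == "gated", parts[-1]


def dense_gelu(x: float, w_in: float) -> float:
    """Non-gated hidden activation: gelu(x * w_in)."""
    return gelu(x * w_in)


def gated_gelu(x: float, w_gate: float, w_lin: float) -> float:
    """Gated hidden activation: gelu(x * w_gate) * (x * w_lin).

    A second linear projection multiplies the activated one, so the same
    input generally produces a different hidden value than dense_gelu.
    """
    return gelu(x * w_gate) * (x * w_lin)


if __name__ == "__main__":
    for name in ("gelu", "gated-gelu"):
        print(name, parse_feed_forward_proj(name))
    # Same input, illustrative scalar weights: the two variants disagree.
    print(dense_gelu(1.0, 1.0), gated_gelu(1.0, 1.0, 2.0))
```

Because the gated variant uses an extra projection, a checkpoint trained with one variant and run with the other would compute different hidden states, which is why the config value matters at inference time.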