Use correct `gelu` function
#5 by ybelkada - opened
related discussion: https://huggingface.co/google/flan-t5-xxl/discussions/11
The previous config file was using the `gelu` function instead of `gated-gelu`, which is set automatically when `is_gated_act` is forced to `True`, more specifically here.
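To make the relationship between the activation name and `is_gated_act` concrete, here is a minimal sketch of how a config class might resolve them from a single string. The function name `resolve_activation`, the `feed_forward_proj` parameter, and the mapping of `gated-gelu` to `gelu_new` are assumptions based on how I understand the T5 config in `transformers`; this is not copied from the library and may differ from the actual code.

```python
from typing import Tuple


def resolve_activation(feed_forward_proj: str) -> Tuple[str, bool]:
    """Hypothetical helper: split a projection name into (dense_act_fn, is_gated_act).

    Assumption: a name such as "gated-gelu" encodes both the gating flag
    and the activation, and "gated-gelu" specifically selects "gelu_new".
    """
    parts = feed_forward_proj.split("-")
    dense_act_fn = parts[-1]             # e.g. "gelu" or "relu"
    is_gated_act = parts[0] == "gated"   # True only for "gated-*" names

    # Assumed special case: the gated variant uses the tanh-approximated GELU.
    if feed_forward_proj == "gated-gelu":
        dense_act_fn = "gelu_new"

    return dense_act_fn, is_gated_act


print(resolve_activation("gated-gelu"))  # -> ('gelu_new', True)
print(resolve_activation("gelu"))        # -> ('gelu', False)
```

Under these assumptions, a config that hard-codes `gelu` while also setting `is_gated_act` to `True` ends up with a combination that this resolution step would never produce on its own, which is the mismatch this PR addresses.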
This is not a breaking change, since the fix only applies to inference. Users who trained a model with `gelu` instead of `gated-gelu` should not be affected by this change. Note that using `gated-gelu` instead of `gelu` can give slightly different qualitative results, but it does not affect the overall performance of the model.
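To get a feel for how small the difference can be, here is a minimal sketch comparing the exact (erf-based) GELU with its tanh approximation. It assumes that the two activations at stake are these two variants (named `gelu` and `gelu_new` in `transformers`, as far as I know); the function names below are illustrative and use only the standard library.

```python
import math


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU (assumed to correspond to "gelu_new").
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_gap(points) -> float:
    # Largest absolute difference between the two variants over the given inputs.
    return max(abs(gelu_exact(x) - gelu_tanh(x)) for x in points)


# Evaluate on a grid from -5.0 to 5.0 in steps of 0.01.
grid = [i / 100.0 for i in range(-500, 501)]
print(f"max |gelu_exact - gelu_tanh| on [-5, 5]: {max_abs_gap(grid):.2e}")
```

The per-activation gap is small, but it is applied at every feed-forward layer, so generated text can differ slightly between the two settings even with identical weights.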
ybelkada changed pull request status to merged