Activation function config inconsistent

#11
by michaelroyzen - opened

The config.json sets the activation function to 'gelu', yet 'is_gated_act' is set to true. Shouldn't the activation function be 'gated-gelu', as in the other T5v1.1-style models? If not, shouldn't 'is_gated_act' be set to false?
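A minimal sketch of the consistency check this question implies. It assumes a hypothetical config fragment reconstructed from the description above (the real config.json has many more fields, and the exact values here are illustrative). It also assumes the "gated-<act>" naming convention for feed_forward_proj. The function name `gated_flag_mismatch` is hypothetical.

```python
def gated_flag_mismatch(config: dict) -> bool:
    """Return True when 'is_gated_act' disagrees with the 'gated-' prefix
    of 'feed_forward_proj' (the mismatch raised in this thread)."""
    # Assumed convention: a gated activation is spelled "gated-<act>".
    # "relu" is used as the fallback default here (an assumption).
    proj = config.get("feed_forward_proj", "relu")
    implied_gated = proj.startswith("gated-")
    # If the flag is absent, treat it as agreeing with the implied value.
    declared_gated = config.get("is_gated_act", implied_gated)
    return declared_gated != implied_gated


# Hypothetical fragment reconstructed from the post; not the actual file.
reported = {"feed_forward_proj": "gelu", "dense_act_fn": "gelu", "is_gated_act": True}
print(gated_flag_mismatch(reported))  # True
```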

Given that this model is T5v1.1-initialized (as per https://github.com/google-research/t5x/blob/main/docs/models.md#t5-11-lm-adapted-checkpoints), shouldn't the config reflect that the activation is 'gated-gelu'?
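For context, T5 v1.1 replaces the ReLU feed-forward block of the original T5 with a gated-GELU one. The hidden state is GELU(x·W0) ⊙ (x·W1) rather than GELU(x·W_in), so the layer has two input projections instead of one. This is why 'is_gated_act' has to agree with the checkpoint: otherwise the expected weights would not line up. Below is a pure-Python sketch of both variants. The weight names `wi`, `wi_0`, `wi_1`, and `wo`, as well as the use of the tanh approximation of GELU, are assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU (assumed here; the exact erf form also exists).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(x: Vector, w: Matrix) -> Vector:
    # Row vector x (length d_in) times matrix w (d_in x d_out).
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def dense_act_dense(x: Vector, wi: Matrix, wo: Matrix) -> Vector:
    # Non-gated FFN: act(x @ wi) @ wo, with one input projection.
    h = [gelu_tanh(v) for v in matvec(x, wi)]
    return matvec(h, wo)


def gated_dense_act_dense(x: Vector, wi_0: Matrix, wi_1: Matrix, wo: Matrix) -> Vector:
    # Gated-GELU FFN: (act(x @ wi_0) * (x @ wi_1)) @ wo, with two input projections.
    gate = [gelu_tanh(v) for v in matvec(x, wi_0)]
    linear = matvec(x, wi_1)
    h = [g * l for g, l in zip(gate, linear)]
    return matvec(h, wo)


# Toy example: d_model = 2, d_ff = 2, output size 1.
x = [1.0, 0.0]
wi_0 = [[0.5, -2.0], [3.0, 1.0]]
wi_1 = [[1.0, 1.0], [0.0, 0.0]]  # makes x @ wi_1 == [1, 1], so the gate passes GELU through
wo = [[1.0], [2.0]]
print(gated_dense_act_dense(x, wi_0, wi_1, wo) == dense_act_dense(x, wi_0, wo))  # True
```

The toy weights are chosen so that the linear branch equals 1 everywhere. In that case the gated layer reduces to the plain one, which makes the extra `wi_1` projection easy to see.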

The solution is simple: "feed_forward_proj" should be set to "gated-gelu", and "dense_act_fn", which is redundant, should be removed from the config entirely.
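A sketch of the proposed change applied to a hypothetical config fragment (reconstructed from the post, not the actual file). It also shows why "dense_act_fn" is redundant: both the activation name and the gating flag can be derived from "feed_forward_proj" alone. As far as we understand it, the `transformers` `T5Config` performs a similar derivation. The helper names `apply_proposed_fix` and `derived_activation` are hypothetical.

```python
import json
from typing import Tuple


def apply_proposed_fix(config: dict) -> dict:
    """Return a copy of `config` with the change proposed in this thread:
    feed_forward_proj set to "gated-gelu" and dense_act_fn removed."""
    fixed = dict(config)  # shallow copy so the input is not mutated
    fixed["feed_forward_proj"] = "gated-gelu"
    fixed.pop("dense_act_fn", None)  # redundant: derivable from feed_forward_proj
    return fixed


def derived_activation(feed_forward_proj: str) -> Tuple[str, bool]:
    # Assumed convention: "gated-<act>" -> (<act>, True); "<act>" -> (<act>, False).
    parts = feed_forward_proj.split("-")
    if len(parts) == 2 and parts[0] == "gated":
        return parts[1], True
    if len(parts) == 1:
        return parts[0], False
    raise ValueError(f"unexpected feed_forward_proj: {feed_forward_proj!r}")


reported = {"feed_forward_proj": "gelu", "dense_act_fn": "gelu", "is_gated_act": True}
fixed = apply_proposed_fix(reported)
print(json.dumps(fixed, sort_keys=True))
# {"feed_forward_proj": "gated-gelu", "is_gated_act": true}
print(derived_activation(fixed["feed_forward_proj"]))  # ('gelu', True)
```

After the fix, the flag derived from "feed_forward_proj" agrees with "is_gated_act", so the two fields no longer contradict each other.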

Hi @michaelroyzen
Thanks for raising this. Let me get back to you ASAP.

Hi @michaelroyzen
We have updated the config files accordingly. Thanks for raising the issue!

ybelkada changed discussion status to closed
