`gelu` function

#2
by ArthurParkerhouse - opened

Hello! I wasn't sure whether this change would also affect your model, since it also uses T5-Large, but I happened to see something about the "gelu" function in config.json needing to be changed from `"feed_forward_proj": "gelu",` to `"feed_forward_proj": "gated-gelu",` and wanted to pass it along in case it's of interest to you.

I'd seen the discussion at these links:
https://github.com/huggingface/transformers/issues/20250
https://huggingface.co/google/flan-t5-large/discussions/5#6374fd05627b7ad976cf0a17
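If you keep a local copy of the model repository, the suggested change comes down to editing a single key in config.json. Below is a minimal sketch using only the Python standard library. The function name `patch_feed_forward_proj` and the example path are hypothetical. The sketch only rewrites the JSON file and does not check that the checkpoint's weights actually match the new setting.

```python
import json
from pathlib import Path


def patch_feed_forward_proj(config_path, new_value="gated-gelu"):
    """Set "feed_forward_proj" in a local config.json (hypothetical helper).

    Returns the previous value, or None if the key was absent.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_value = config.get("feed_forward_proj")
    config["feed_forward_proj"] = new_value
    # json.dumps keeps the dict's key order; indent=2 keeps the file readable.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_value


# Usage (the path is a hypothetical local clone of the model repo):
# patch_feed_forward_proj("path/to/model/config.json")
```

Running it a second time is harmless: the call simply returns "gated-gelu" and writes the same value again.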

Thanks for the heads-up! If I'm reading the discussion correctly, inference should be mostly OK?

Closing this, as I think we're good.

pszemraj changed discussion status to closed