Can this run on FLAN t5?

#5
by ljhwild - opened

I'm reading the paper, and it appears that LongT5 is built on T5 and not on Flan-T5.
Is there any reason why?

Google org

Hello! T5 and Flan-T5 share the same model architecture. You can see in flan-t5's config.json that it uses the T5 architecture under the hood: https://huggingface.co/google/flan-t5-xxl/blob/main/config.json#L3

However, LongT5 has a slightly different architecture so that it can scale to longer sequences: its encoder replaces T5's full self-attention with local or transient-global attention.
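If it helps, here is a minimal sketch of how one could compare checkpoints by their config. The dictionaries below are abridged, illustrative excerpts rather than the full config.json files, and the field values are my assumptions about the public checkpoints, so please check them against the real files. `same_architecture` is a hypothetical helper, not part of any library.

```python
import json

# Abridged, illustrative excerpts of config.json files.
# Assumption: these field values match the public checkpoints; verify against the real files.
T5_CONFIG = json.loads('{"model_type": "t5"}')
FLAN_T5_CONFIG = json.loads(
    '{"architectures": ["T5ForConditionalGeneration"], "model_type": "t5"}'
)
LONG_T5_CONFIG = json.loads(
    '{"architectures": ["LongT5ForConditionalGeneration"],'
    ' "model_type": "longt5",'
    ' "encoder_attention_type": "transient-global"}'
)


def same_architecture(config_a: dict, config_b: dict) -> bool:
    """Hypothetical helper: treat two configs as the same architecture
    when their `model_type` fields are present and equal."""
    type_a = config_a.get("model_type")
    type_b = config_b.get("model_type")
    return type_a is not None and type_a == type_b


if __name__ == "__main__":
    print("t5 vs flan-t5:", same_architecture(T5_CONFIG, FLAN_T5_CONFIG))
    print("flan-t5 vs long-t5:", same_architecture(FLAN_T5_CONFIG, LONG_T5_CONFIG))
    # The extra encoder field is what distinguishes the LongT5 excerpt here.
    print("long-t5 encoder attention:", LONG_T5_CONFIG.get("encoder_attention_type"))
```

With the `transformers` library installed, `AutoConfig.from_pretrained(...)` should let you read the same fields from the actual checkpoints instead of from hand-written excerpts.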

Hope that helps!
