About T5 and Flan-T5

#3
Opened by longhuizhang

The README states "We use the original t5-3b vs Flan T5-3b on the paper", but according to the paper, RankT5 was trained on t5-3b, not Flan-T5.

Hi, indeed I made a mistake while reading the paper (one of several) and have just fixed it in the README. I also added a few other things I've noticed since, such as the fact that the head is not the same as in the paper and the one I'm using here is not ideal.

We are working to release the inference code (already ready internally) and the training code (which needs some refactoring) in the coming months, and I will try to fix these mistakes to see if we can get even closer to reproducing the paper's numbers. Obviously, having the model from the original authors would be better, but I hope the current version can help you anyway.

longhuizhang changed discussion status to closed
longhuizhang changed discussion status to open
