Is a RetNet (Retentive Network) version planned?

#16
by guyko81 - opened

This model is great, but I would love to see it in the Retentive architecture (https://arxiv.org/pdf/2307.08621.pdf). The promise you guys at Microsoft made is that Retention is at least as good as Attention (promised for models up to 6.7B parameters trained on 100B tokens), while inference is faster. I'm eager to see how fast that could be!
And here is a 🤗 for the model, loving it!

Microsoft org

Hello @guyko81 !

I wasn't aware of this paper, but I'm glad you shared it. I'll take a closer look at it, thanks!

gugarosa changed discussion status to closed
