Is a RetNet (Retentive Network) version planned?

#16

by guyko81 - opened Sep 14, 2023

Sep 14, 2023

This model is great, but I would love to see it in the Retentive architecture (https://arxiv.org/pdf/2307.08621.pdf). The promise you guys at Microsoft made is that Retention is at least as good as Attention (the paper reports results for models of up to 6.7B parameters trained on 100B tokens) while inference is faster. I'm eager to see how fast that could be!
And here is a 🤗 for the model, loving it!
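The faster-inference claim comes from retention having two equivalent forms: a parallel form used for training, (Q K^T ⊙ D) V with decay mask D[n][m] = γ^(n-m) for n ≥ m, and a recurrent form S_n = γ·S_(n-1) + K_n^T V_n, o_n = Q_n S_n, whose state has a fixed size no matter how many tokens came before. Below is a minimal single-head sketch in plain Python, assuming that simplified formulation and deliberately leaving out the xPos rotation, group normalization and gating the paper also uses. The function names are hypothetical and do not come from any RetNet codebase.

```python
# Simplified single-head retention (no xPos, no group norm, no gating).
# Function names are illustrative only.

def parallel_retention(q, k, v, gamma):
    """Training-style form: (Q K^T * D) V, with D[n][m] = gamma**(n - m) if n >= m else 0.
    Builds an n x n score matrix, so cost grows quadratically with sequence length."""
    n = len(q)
    scores = [
        [
            sum(qa * ka for qa, ka in zip(q[i], k[j])) * (gamma ** (i - j) if i >= j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    d_v = len(v[0])
    return [[sum(scores[i][j] * v[j][b] for j in range(n)) for b in range(d_v)] for i in range(n)]


def recurrent_retention(q, k, v, gamma):
    """Inference-style form: S_n = gamma * S_{n-1} + K_n^T V_n, o_n = Q_n S_n.
    The state S is a fixed d_k x d_v matrix, so each decoding step costs the same
    regardless of how many tokens have already been processed."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the old state and add the outer product k_n^T v_n.
        state = [[gamma * state[a][b] + k_n[a] * v_n[b] for b in range(d_v)] for a in range(d_k)]
        # Read out with the current query.
        outputs.append([sum(q_n[a] * state[a][b] for a in range(d_k)) for b in range(d_v)])
    return outputs


# Toy usage: both forms should agree up to floating-point rounding.
q = [[1.0, 0.5], [0.2, -1.0], [0.3, 0.7]]
k = [[0.4, 1.0], [-0.5, 0.1], [0.9, 0.2]]
v = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
par = parallel_retention(q, k, v, 0.9)
rec = recurrent_retention(q, k, v, 0.9)
print(max(abs(a - b) for row_p, row_r in zip(par, rec) for a, b in zip(row_p, row_r)))
```

Because the recurrent form only keeps the fixed-size state, generating each new token does not require revisiting earlier keys and values, which is the property behind the inference-speed claim.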

gugarosa

Microsoft org Sep 26, 2023

Hello @guyko81 !

I wasn't aware of this paper, but I am glad that you shared it. I will take a closer look at it, thanks!

gugarosa changed discussion status to closed Jan 8
