Are there plans to create a RetNet (Retentive Network) version?
#16, opened by guyko81
This model is great, but I would love to see it in the Retentive architecture (https://arxiv.org/pdf/2307.08621.pdf). The promise you guys at Microsoft made is that retention is at least as good as attention (shown for models up to 6.7B parameters trained on 100B tokens), while inference is faster. I'm eager to see how fast that could be!
And kudos on the model, I'm loving it!
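The faster-inference claim rests on retention having two equivalent forms in the linked paper: a parallel form used for training and a recurrent form used for decoding. Below is a minimal single-head sketch in plain Python, assuming a simplified formulation (no xPos-style rotation, group normalization, gating, or per-head decay rates). The names `parallel_retention`, `recurrent_retention`, and `random_matrix` are hypothetical, and this is an illustration, not Microsoft's implementation.

```python
import random

# Simplified single-head retention (assumption: no xPos rotation, GroupNorm,
# gating, or multi-scale decay; one scalar decay rate gamma).


def parallel_retention(q, k, v, gamma):
    """Parallel (training-style) form: O = (Q K^T * D) V, where
    D[n][m] = gamma ** (n - m) if n >= m else 0 (causal decay mask)."""
    d_v = len(v[0])
    out = []
    for n in range(len(q)):
        row = [0.0] * d_v
        # Only positions m <= n contribute; the mask zeroes out the future.
        for m in range(n + 1):
            score = sum(qi * ki for qi, ki in zip(q[n], k[m])) * gamma ** (n - m)
            for j in range(d_v):
                row[j] += score * v[m][j]
        out.append(row)
    return out


def recurrent_retention(q, k, v, gamma):
    """Recurrent (inference-style) form:
    S_n = gamma * S_{n-1} + k_n^T v_n,  o_n = q_n S_n.
    The state S is a fixed d_k x d_v matrix, so each step costs the same
    no matter how many tokens came before."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the previous state, then add the outer product k_n^T v_n.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = gamma * state[i][j] + k_n[i] * v_n[j]
        # Read out o_n = q_n @ S_n.
        out.append([sum(q_n[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def random_matrix(rows, cols):
    """Hypothetical helper: a rows x cols list of lists with values in [-1, 1]."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    seq_len, d_k, d_v, gamma = 6, 4, 3, 0.9
    q = random_matrix(seq_len, d_k)
    k = random_matrix(seq_len, d_k)
    v = random_matrix(seq_len, d_v)
    p = parallel_retention(q, k, v, gamma)
    r = recurrent_retention(q, k, v, gamma)
    max_diff = max(abs(a - b) for pr, rr in zip(p, r) for a, b in zip(pr, rr))
    print(f"max |parallel - recurrent| = {max_diff:.2e}")
```

Running it should print a difference on the order of floating-point rounding. The point of the recurrent loop is that it carries a fixed-size state instead of a key/value cache that grows with the sequence, which is where the per-token inference savings are expected to come from.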
gugarosa changed the discussion status to closed