Long Context Successor?

#17
by brucethemoose - opened

Zephyr Alpha/Beta seem excellent, with one major exception: they don't handle long context as well as Amazon's MistralLite 32K model:

https://huggingface.co/amazon/MistralLite

Are there any plans to adopt some of Amazon's tricks, such as the very large rope_theta, the 16K sliding window, and the 16K training? Whatever they did seems to work extremely well, better than other long-context Llama finetunes/LoRAs I've tried.
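For context on why a larger rope_theta helps: RoPE rotates each query/key dimension pair at a frequency derived from the base theta, and raising the base slows every rotation, so relative positions stay distinguishable at longer ranges. A minimal sketch of that effect (the 1e6 value is what MistralLite's config reportedly uses; the default Mistral base is 10000):

```python
import math

def rope_inv_freq(theta: float, dim: int = 128):
    """Per-pair inverse rotation frequencies used by RoPE."""
    return [theta ** (-2 * i / dim) for i in range(dim // 2)]

# Slowest-rotating frequency under the default base vs. a large base.
# A larger theta shrinks every frequency, stretching the effective
# positional "wavelength" and helping long-context extrapolation.
slow_default = rope_inv_freq(10_000.0)[-1]     # base Mistral theta
slow_large = rope_inv_freq(1_000_000.0)[-1]    # reportedly MistralLite's theta

print(slow_default > slow_large)  # larger theta -> slower rotation
```

This is only an illustration of the mechanism, not a claim about how MistralLite was trained; the 16K sliding window and 16K-sequence training presumably matter just as much.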