Any plans for a superhot mix?

#1
by Delcos - opened

Any plans to give this model an 8K context length, similar to what TheBloke has done?

Potentially, but I'd need 8K-context-trained versions of both origin models that use the same scaling.
I also don't want to dilute it by using an upstream model that was already merged with SuperHOT, so they would have to be finetunes done from scratch at 8K.
