Very nice!

#1 opened by TheBloke

This is awesome. I actually pinged @kaiokendev recently to ask if he could implement his code as a config.json patch activated by trust_remote_code=True. I didn't see that you'd already done it!
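For anyone following along, that kind of patch is usually loaded like this (a minimal sketch; the repo id, module name, and class name below are placeholders, not necessarily what this repo uses):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repo's config.json would carry an "auto_map" entry such as
#   "auto_map": {"AutoModelForCausalLM": "modeling_llama_scaled.LlamaForCausalLM"}
# (file and class names invented for illustration). Passing
# trust_remote_code=True tells transformers to import that custom modeling file.
model_id = "some-user/open_llama_7b-scaled"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```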

I just wanted to check: this model is not re-trained for longer context, correct? It's the same as the Open Llama 7B base model, but with the RoPE scaling code added so it could be used as a basis for further training with increased context? My understanding is that the code works without re-training the model, but that responses are much improved by applying increased-context training on top?

Great work, @emozilla and @kaiokendev !

Yup, this is fully inference-only -- the model weights are exactly equivalent. I haven't tried fine-tuning with this (yet), but the fact that it works /at all/ without fine-tuning suggests it should work much better after some extra training.
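For anyone curious, the change is confined to the rotary position embedding. A rough sketch of the linear position-interpolation variant (the scale factor, function name, and exact formulation here are illustrative assumptions, not necessarily the code in this repo):

```python
import torch

def scaled_rope_cos_sin(seq_len, dim, scale=4.0, base=10000.0, device="cpu"):
    """Rotary embedding angles with linear position interpolation.

    Positions are divided by `scale`, so e.g. 8192 positions are squeezed
    into the range the model originally saw during pretraining. This is a
    sketch of the general idea, not the repo's exact implementation.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device).float() / dim))
    t = torch.arange(seq_len, device=device).float() / scale  # interpolated positions
    freqs = torch.outer(t, inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()
```

Since only the cos/sin angle tables change, the checkpoint itself stays byte-for-byte the same, which is why this can ship as a code patch rather than new weights.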
