Work on a paper

#2 opened by emozilla

This insight is just great -- best kind of optimization (novel but intuitively understandable in retrospect). I've done some additional work on it and was wondering if you'd want to collab on publishing a short paper. You can DM me @theemozilla on Twitter (also my Discord username). I wouldn't want to publish anything without you as an author. Lemme know!

@emozilla Hello, I do not use Twitter lol. You can email me at kaiokendev1@gmail.com

I've been looking at the work here and the associated blog post, as well as the scaled model at https://huggingface.co/emozilla/open_llama_7b-scaled.

The idea makes sense to me, but when testing open_llama_7b-scaled I get poor results as I increase the context window.

Do the model and method require further fine-tuning? I did not further fine-tune the OpenLLaMA model.
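
For context, my understanding of the scaling being discussed is linear position interpolation: the RoPE position indices are divided by a constant factor so that a longer window is squeezed into the positional range the base model was trained on, typically followed by fine-tuning at that scale. Below is a minimal PyTorch sketch of that idea only; the helper name and the scale/head_dim values are hypothetical, chosen for illustration rather than taken from either repo.

```python
import torch

def build_scaled_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0,
                            scale: float = 4.0, device: str = "cpu"):
    """Precompute RoPE cos/sin tables with linear position interpolation.

    scale=4.0 would map positions 0..8191 into the 0..2047 range a
    2k-context model was trained on (illustrative values only).
    """
    # Standard rotary frequencies over pairs of head dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device).float() / head_dim))
    # The interpolation step: divide position indices by the scale factor
    # instead of letting them run past the trained positional range.
    positions = torch.arange(seq_len, device=device).float() / scale
    freqs = torch.outer(positions, inv_freq)   # (seq_len, head_dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)    # (seq_len, head_dim)
    return emb.cos(), emb.sin()

# Example: tables for an 8192-token window on a model trained at 2048 context.
cos, sin = build_scaled_rope_cache(seq_len=8192, head_dim=128, scale=4.0)
```

The scale factor used at inference would need to match whatever the checkpoint was produced with; these numbers are just an example.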

