
For the original 200k context, would it be better to do an NTK patch with 4k?

#5
by Trangle - opened

When extending a short-context model, NTK scaling uses a magnification factor greater than 1. Here, would it be better to go the other way and use a reduction factor less than 1 to bring the context down to around 4k? That shouldn't interfere with extending it again later, and it might even improve the original model's long-text ability.
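For concreteness, here is a minimal sketch of what that would look like, assuming the common NTK-aware adjustment `base' = base * s^(d/(d-2))` where `s` is the ratio of target to original context; the base value, head dimension, and function name below are illustrative, not taken from this model's config:

```python
# Sketch of NTK-aware RoPE base adjustment with a reduction factor s < 1.
# Assumed formula: base' = base * s^(d / (d - 2)), s = target_ctx / original_ctx.

def ntk_scaled_base(base: float, scale: float, dim: int) -> float:
    # scale > 1 stretches positions (the usual context-extension case);
    # scale < 1 compresses them, as proposed here (200k -> 4k gives s ~= 0.02).
    return base * scale ** (dim / (dim - 2))

orig_base = 10000.0          # typical Llama RoPE theta; the real model may differ
head_dim = 128               # per-head dimension (assumption)
s = 4096 / 200_000           # reduction factor < 1

print(ntk_scaled_base(orig_base, s, head_dim))  # a smaller effective base
```

With `s < 1` the effective base shrinks, so rotary frequencies rise and the 4k window gets finer positional resolution, which is the intuition behind the question.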
