have a question

#1
by LittenBuzz - opened

Hi. Could you please help me figure out the context length for this model?

In config.json, I believe this is the max length:

  "max_position_embeddings": 4096,

There are variants of the Solar 10B models that have been trained on longer contexts, though I have no idea how good they are.

I find that Solar models in general still hold up pretty well when the context is extended from 4k to 8k with alpha_value=2.5, or to 16k with alpha_value=4.
And a 10.7B exl2 quant at 6bpw can be loaded in about 9.5GB of VRAM with 16k context when the 4-bit cache is enabled (see the loading sketch at the end of this post).
There are indeed some bigger-context models, for example SOLAR-10.7B-Instruct-v1.0-128k, though I have no idea how much its quality drops at long context.
NousResearch also has a few 32k/64k extended-context models, but they seem to only make these for base models and not instruct ones, so it might be tricky to make use of them.
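
Here's a rough sketch of that kind of extended-context load with exllamav2 in Python. It assumes a local exl2 quant directory (the path is made up), and names like scale_alpha_value, ExLlamaV2Cache_Q4 and load_autosplit are what I believe the library uses, though they may differ between versions, so treat it as a sketch rather than a known-good recipe:

  from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer

  # Path to a local exl2 quant (e.g. a 6bpw Solar 10.7B) - example path only.
  config = ExLlamaV2Config()
  config.model_dir = "/models/solar-10.7b-instruct-exl2-6bpw"
  config.prepare()

  # Extend context from the native 4k to 16k via NTK/alpha scaling.
  config.max_seq_len = 16384
  config.scale_alpha_value = 4.0

  model = ExLlamaV2(config)

  # The Q4 cache shrinks KV-cache memory, which is what keeps a 16k-context
  # load of a 6bpw quant around the ~9.5GB mark mentioned above.
  cache = ExLlamaV2Cache_Q4(model, lazy=True)
  model.load_autosplit(cache)

  tokenizer = ExLlamaV2Tokenizer(config)

In text-generation-webui the same thing should just be the max_seq_len, alpha_value and 4-bit cache options on the exllamav2 loader, if I recall the option names correctly.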
