Low context length ruins a model.

#1
by rombodawg - opened

I don't know how the rest of the community feels, but I basically don't bother with a model if it has less than 4k context length. Hell, I was pissed off that codellama-70b only has 4k context length. 8k is really the standard now. So great if you made something good here, but I won't be using it in any capacity.

Allen Institute for AI org

The feedback is well taken, but we do use RoPE embeddings, which don't actually constrain the length of the input you give. We just didn't train that way. Try it and report back?
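To make the "no hard limit" point concrete, here's a minimal sketch of standard rotary embeddings (not necessarily OLMo's exact implementation; the helper names are mine, for illustration): the rotation angle for each position is computed analytically from the position index, so there is no learned position table that caps the sequence length.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    # One inverse frequency per 2-D pair of embedding dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Outer product gives one rotation angle per (position, frequency) pair.
    return torch.outer(positions.float(), inv_freq)

def apply_rope(x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    # x: (seq_len, dim) query/key vectors; dim must be even.
    angles = rope_angles(positions, x.shape[-1])
    cos, sin = angles.cos(), angles.sin()
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    # Rotate each 2-D pair of features by its position-dependent angle.
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

# Position 100000 is handled exactly like position 10 -- the angle is just
# computed from the index, so nothing structurally caps the context length.
q = torch.randn(2, 64)
print(apply_rope(q, torch.tensor([10, 100000])).shape)  # torch.Size([2, 64])
```

Whether attention *quality* holds up at positions far beyond the training length is a separate question (the extrapolation issue discussed further down), but there is no architectural cutoff.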

I see you mentioned inputs; however, the outputs are equally important to me. In coding and writing especially, models are often expected to output more than 4k tokens of content. Forgive me if I'm mistaken, as I'm not super familiar with how RoPE works, but my point still stands if the model can't output more than 2k tokens. Thanks for your quick response, btw.

Allen Institute for AI org

There is no limit on the output either, for the same technical reason. It will just keep outputting.
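If you want to check this yourself, one quick way is to ask `generate` for more new tokens than the trained context length. A rough sketch (the checkpoint id and loading flags here are my assumptions; check the model card for the exact incantation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # assumed checkpoint id; substitute the one you use
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer(
    "Write a detailed design document for a chess engine.",
    return_tensors="pt",
)
# max_new_tokens is the only cap here (besides an EOS token); it can be set
# well past the ~2k training length, since RoPE positions keep being computed
# on the fly rather than looked up from a fixed-size table.
out = model.generate(**inputs, max_new_tokens=4096, do_sample=True, top_p=0.95)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```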

FWIW, our pretraining corpus contains only about 15% code. So I would not expect stellar coding quality from it, at least not on a level with models that are trained purely on code. But there is enough there that we (or you!) could finetune a lot more coding ability into it.

Well, I really appreciate all the information. In the last 40 minutes I've done some research and talked to the community a bit about RoPE, and the consensus is that although it's a usable way of extending context, it's far from reliable in terms of preserving the precision of the model. And while I agree that this model might not be the best suited for coding, neither was mistral-base at release, yet many finetunes later, some quite good coding 7b models have come out of it (for its size). However, that model has 32k context, so it's far more useful as an overall model.

I suppose we will have to see what the community brings with their fine-tuned models. That's more of what I'm interested in when it comes to OLMo and its overall performance as a model.

natolambert changed discussion status to closed
