Context length of the model: 512 or 2048?

#12
by maveriq - opened

Hi. Thank you for such interesting work.

In the footnote on page 2 of the paper, it is mentioned that the context length of the model is 512. However, when I look at the model config on HF, it says max_position_embeddings=2048. Am I missing something?

The max position embeddings is indeed 2048, but during training the model only saw examples with a context length of up to 512, so its positional encodings beyond that length essentially contain random values. That is why completion quality is likely to degrade significantly beyond a length of 512 (this wouldn't be the case with rotary embeddings, but here we use a learned positional embedding).
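For anyone wanting to stay within the effective context in practice, here is a minimal sketch using the standard transformers API. The checkpoint id is a placeholder, not the actual model name; it reads max_position_embeddings from the config and truncates prompts to the 512-token training context so generation never relies on the untrained positions.

```python
from transformers import AutoConfig, AutoTokenizer

# Placeholder checkpoint id -- substitute the actual model repo.
model_name = "org/model"

# The config reports the size of the learned position table (2048),
# which is larger than the 512-token context seen during training.
config = AutoConfig.from_pretrained(model_name)
print(config.max_position_embeddings)

tokenizer = AutoTokenizer.from_pretrained(model_name)

# Truncate prompts to 512 tokens so the model never uses the
# untrained position embeddings beyond the training context.
inputs = tokenizer(
    "def fibonacci(n):",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
```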

Thank you for the clarification.

maveriq changed discussion status to closed