WHY "max_position_embeddings": 2048

#3
by chenxi118 - opened

Models based on LLaMA-2 typically have a context length (max_position_embeddings) of 4K, so why is it 2K here? Will this reduce the model's effective understanding of longer contexts?
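
(For reference, the value in question is the one exposed on the model config. A minimal way to check it with transformers; the repo ID below is a placeholder, not the actual model name:)

```python
from transformers import AutoConfig

model_id = "TigerResearch/your-model-name"  # placeholder repo ID

# Reads config.json from the Hub and reports the position limit, e.g. 2048.
config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)
```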

Tiger Research org

This is because we fine-tuned this version of the model with a 2048 max length when grouping the data; we found that almost all of the demonstration data fits within this length. However, the model should work fine with a 4K context or even longer, since RoPE can extrapolate well due to its functional form.
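
If you want to run inference with a window longer than the shipped 2048, one option is to override the limit at load time. The sketch below is an illustration rather than an official recipe from the authors, and the repo ID is a placeholder:

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "TigerResearch/your-model-name"  # placeholder repo ID

# Raise the positional limit from 2048 to 4096. RoPE computes rotations
# directly from the position index, so longer positions are still defined;
# quality at lengths well beyond fine-tuning may still vary.
config = AutoConfig.from_pretrained(model_id, max_position_embeddings=4096)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "A prompt that may run past the 2048-token fine-tuning length ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```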

chenxi118 changed discussion status to closed
