No SWA?
I think this still struggles with sequences longer than 512 tokens. However, Mistral is supposed to handle this with sliding window attention (SWA). How can I tackle this issue? Or is there a similar, lightweight model I could use instead?
@byunal hmm no, it has a much higher context length than 512. In fact, all Llama, Qwen, and Mistral models have a context length higher than 2048.
I think you are using something like ctransformers or llama-cpp-python, which sets the context limit to 512 by default; you have to change it to your desired length.
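For example, if it's llama-cpp-python, a minimal sketch would look something like this (the model file name, context size, and thread count below are just placeholders, not values from this thread):

```python
from llama_cpp import Llama

# n_ctx is small by default; raise it to whatever context the model actually supports.
llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # placeholder GGUF file
    n_ctx=4096,    # context window in tokens
    n_threads=8,   # CPU threads for inference
)

out = llm("Summarize the following text:\n...", max_tokens=256)
print(out["choices"][0]["text"])
```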
@YaTharThShaRma999 Actually, yes. I'm trying to use this model for text summarization on CPU via ctransformers. I currently have no access to a GPU, so I have to run inference on CPU. Frankly, I didn't know that ctransformers limits the context length. How can I override this limit on CPU? I'd appreciate any help.
I have the same issue, where ctransformers limits the context to 512. I thought something was wrong with the model configuration.
Were you able to solve this issue?
EDIT: https://discuss.huggingface.co/t/number-of-tokens-2331-exceeded-maximum-context-length-512-error-even-when-model-supports-8k-context-length/57180/6
I just used the argument in the function as suggested in the link above.
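With ctransformers it's the `context_length` argument; roughly something like the sketch below (the repo and quant file names are just examples, not necessarily the ones you're using):

```python
from ctransformers import AutoModelForCausalLM

# ctransformers defaults to a 512-token context unless you override it.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",            # example repo
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",    # example quant file
    model_type="mistral",
    context_length=4096,   # raise the 512-token default
    max_new_tokens=256,
    threads=8,             # CPU-only inference
)

print(llm("Summarize the following text:\n..."))
```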
@esuriddick That didn't solve it for me, and I'm done dealing with it, but thanks.