No SWA?
I think this still struggles with sequences longer than 512 tokens. However, Mistral is supposed to handle this with sliding window attention (SWA). How can I tackle this issue? Or is there a similar, lightweight model I could use instead?
@byunal hmm no, it has a much higher context length than 512. In fact, all Llama, Qwen, and Mistral models have a context length higher than 2048.
I think you are using something like ctransformers or llama-cpp-python, which sets the context limit to 512 by default; you have to change it to your desired length.
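For example, if it's llama-cpp-python, a minimal sketch would look something like this (the model file name, context size, and thread count below are just placeholders, not values from this thread):

```python
from llama_cpp import Llama

# n_ctx is small by default; raise it to whatever context the model actually supports.
llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",  # placeholder GGUF file
    n_ctx=4096,    # context window in tokens
    n_threads=8,   # CPU threads for inference
)

out = llm("Summarize the following text:\n...", max_tokens=256)
print(out["choices"][0]["text"])
```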
@YaTharThShaRma999 Actually, yes. I'm trying to use this model for text summarization on CPU via ctransformers. I currently have no access to a GPU, so I have to run inference on CPU. Frankly, I didn't know that ctransformers limits the context length. How can I override this limit on CPU? I'd appreciate any help.
I have the same issue, where ctransformers limits the context to 512. I thought something was wrong with the model configuration.
Were you able to solve this issue?
EDIT: https://discuss.huggingface.co/t/number-of-tokens-2331-exceeded-maximum-context-length-512-error-even-when-model-supports-8k-context-length/57180/6
I just used the argument in the function as suggested in the link above.
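With ctransformers it's the `context_length` argument; roughly something like the sketch below (the repo and quant file names are just examples, not necessarily the ones you're using):

```python
from ctransformers import AutoModelForCausalLM

# ctransformers defaults to a 512-token context unless you override it.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",            # example repo
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",    # example quant file
    model_type="mistral",
    context_length=4096,   # raise the 512-token default
    max_new_tokens=256,
    threads=8,             # CPU-only inference
)

print(llm("Summarize the following text:\n..."))
```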
@esuriddick That didn't solve it for me, and I'm done dealing with it, but thanks.