What is the maximum input token length of the Falcon-40B and -7B models?

#38
by sermolin - opened

I couldn't find it in the documentation. The reference notebook hardcodes it to 1024, mentioning the need to use int8 if the input length is >1024, but what's the maximum?
Use case: document summarization and text generation. I probably would not want to use the -Instruct model for that, right?
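One way to check a model's trained context window is to read its `config.json`. A minimal sketch, assuming the config has already been downloaded and parsed into a dict; the `sample_config` below is a hypothetical stand-in, not the actual Falcon file:

```python
import json

def max_context_length(config: dict):
    # Different model families name the context-length field differently;
    # try the common keys in order.
    for key in ("max_position_embeddings", "n_positions", "seq_length"):
        if key in config:
            return config[key]
    # Rotary-embedding models may not declare a hard limit in the config.
    return None

# Hypothetical example config (stand-in for a downloaded config.json).
sample_config = {"model_type": "falcon", "max_position_embeddings": 2048}
print(max_context_length(sample_config))  # → 2048
```

Note that even when a rotary-embedding model has no hard architectural limit, quality typically degrades beyond the sequence length it was trained on.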

Anyone have an answer for this one?

I tried to increase the number of tokens in openapi.json (cloned the repo and found it simply by searching for 1024), but that didn't help. I created a feature request: https://github.com/huggingface/text-generation-inference/issues/593. Please add to it if you need any adaptations.
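For text-generation-inference, the input limit is a server launch parameter rather than something editable in openapi.json (which is generated from the server's settings). A sketch of the relevant launcher flags; the flag names below match mid-2023 TGI releases and may differ in your version, so check `text-generation-launcher --help`:

```shell
# Hypothetical launch command; values are illustrative, not Falcon's
# documented limits.
text-generation-launcher \
  --model-id tiiuae/falcon-7b \
  --max-input-length 2048 \
  --max-total-tokens 4096
```

`--max-total-tokens` bounds input plus generated tokens, so it should be set larger than `--max-input-length` by the longest completion you expect.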
