Shorter context window to reduce inference memory allocation

#31
by JochenGrey - opened

Is it possible to shorten the context length (e.g., to 50k tokens) to limit the amount of memory used during inference?
Would the RoPE scaling factors need to be adjusted for a shorter inference context? Something like the sketch below is what I have in mind.
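A minimal sketch of capping the advertised context at load time with Hugging Face transformers; the model id and the 50k figure are placeholders, not a confirmed recipe for this repo. (My understanding is that RoPE scaling factors mainly matter when *extending* beyond the trained length, so shortening may need no adjustment, but I'd like confirmation.)

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "your-org/your-model"  # hypothetical id; substitute the actual checkpoint

# Advertise a shorter maximum context window in the config.
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 50_000

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```

Note that this alone may not reduce memory much, since the KV cache grows with the actual sequence length processed; serving engines such as vLLM expose a `--max-model-len` flag that caps how much KV cache is preallocated.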

Or, conversely, is a larger context perhaps needed to reduce inference time?
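For reference, a rough back-of-the-envelope for how the KV cache scales with context length; all dimensions below are illustrative placeholders, not this model's actual architecture:

```python
# KV-cache bytes ~= 2 (K and V) * layers * kv_heads * head_dim
#                   * seq_len * bytes_per_element * batch_size
layers, kv_heads, head_dim = 32, 8, 128  # placeholder architecture
bytes_per_elem = 2                       # fp16 / bf16
batch = 1

def kv_cache_gib(seq_len: int) -> float:
    total = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch
    return total / 1024**3

print(f"{kv_cache_gib(131_072):.1f} GiB at 128k tokens")  # ~16.0 GiB
print(f"{kv_cache_gib(50_000):.1f} GiB at 50k tokens")    # ~6.1 GiB
```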
