Shorter context window to reduce inference memory allocation

#31
by JochenGrey - opened

Is it possible to shorten the context length to e.g. 50k to limit the amount of memory used during inference?
Would the rope scaling factors need to be adjusted for a shorter inference context?
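One possibility (a minimal, untested sketch, assuming the public microsoft/Phi-3-vision-128k-instruct checkpoint and a recent transformers install): cap the actual sequence length at generation time rather than touching the configured window. With the default dynamic cache, KV-cache memory grows with the tokens actually processed, so short prompts plus a small max_new_tokens keep memory down without changing the rope factors.

```python
# Sketch: keep the real sequence short instead of shrinking the 128k window.
# Assumptions: microsoft/Phi-3-vision-128k-instruct, recent transformers, one GPU.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Text-only example prompt in the Phi-3 chat format.
prompt = "<|user|>\nDescribe the scene.<|end|>\n<|assistant|>\n"
inputs = processor(prompt, return_tensors="pt").to(model.device)

# Total sequence = prompt + up to 256 new tokens, far below 128k,
# so the KV cache stays small regardless of max_position_embeddings.
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```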

Perhaps a larger context is needed to reduce inference time?

@mikestaub why would a larger context reduce inference time? If you fill, say, 3k of context with real tokens, doesn't the remaining ~125k get filled with padding?
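A quick way to check the padding assumption (a small sketch, assuming standard transformers tokenizer behavior): by default the tokenizer does not pad inputs out to the configured maximum, so the model only attends over the tokens you actually pass in and the KV cache is sized by that real length, not by 128k.

```python
# Check that a short prompt stays short: no padding to the 128k maximum
# unless you explicitly request padding="max_length".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-vision-128k-instruct", trust_remote_code=True
)

text = "A short prompt of a few dozen tokens " * 10
enc = tokenizer(text, return_tensors="pt")
print(enc["input_ids"].shape)      # e.g. torch.Size([1, ~80]), not [1, 131072]
print(tokenizer.model_max_length)  # the configured maximum, e.g. 131072
```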
Is there any way to reduce the context length? For example, mimicking as if Phi-3-vision had been CLIP + Phi3-4k-instruct (rather than the 128k)?
@JochenGrey - Any idea how to reduce the context length?
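If you want a smaller configured window rather than just shorter inputs, an untested sketch would be to override the config before loading. Caveat: this checkpoint uses longrope/su rope scaling tied to the original short window of the Phi-3 family, so shrinking max_position_embeddings may interact with the rope factors and should be verified against the unmodified model.

```python
# Untested sketch: force a smaller context window at load time.
# The 4096 cap is a hypothetical value; verify outputs before relying on it.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "microsoft/Phi-3-vision-128k-instruct"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.max_position_embeddings = 4096  # hypothetical reduced window

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, trust_remote_code=True
)
```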
