Long context: running on multiple GPUs

#49
by averoo - opened

Hello!

Could you please advise me and the community on how to run this model in a distributed manner?

I want to feed in a long context (around 40k tokens), but the model tries to allocate too much memory (around 150 GB of GPU RAM). I do have that much memory, just spread across several cards.

I'd check out the GitHub discussion where they talk about this exact issue. Hopefully it will help.
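In case it's useful in the meantime, here's a minimal sketch of the standard Transformers/Accelerate route, assuming the model loads through `AutoModelForCausalLM` (the model ID below is a placeholder): `device_map="auto"` shards the weights across all visible GPUs, so no single card has to fit the whole model.

```python
# Requires: pip install transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder; substitute this repo's model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets Accelerate split the layers across every visible GPU,
# spreading the memory footprint over the cards instead of one device.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights: ~2 bytes/param
    device_map="auto",
)

long_prompt = "..."  # your ~40k-token context
inputs = tokenizer(long_prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that `device_map="auto"` is pipeline-style sharding: layers live on different GPUs and activations hop between them, so it saves memory rather than time. Also, at 40k tokens the KV cache and attention buffers dominate, so a memory-efficient attention kernel (e.g. `attn_implementation="flash_attention_2"`, if this model supports it) can matter as much as the weight sharding.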

Cheers!
