How much GPU memory is required for 32k context embedding?

#32, opened by Labmem009

I tried to use this model to embed long texts, but I repeatedly hit OOM errors even with 6×A100 GPUs and data parallelism (DP). Any suggestions on how to allocate memory for long texts?

Owner

For a 32k context, the model needs to run on an 80GB A100 with float16/bfloat16 and FlashAttention enabled, and the batch size must be reduced to 1.
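To see why FlashAttention is essential here, a rough back-of-the-envelope sketch helps: naive attention materializes a full seq_len × seq_len score matrix per head, which alone exceeds 80GB at 32k context. The head count below is a hypothetical value for a typical large embedding model, not this model's actual config.

```python
# Rough memory estimate for the attention score matrices at 32k context.
# num_heads is an assumed value for illustration, not this model's config.
seq_len = 32_768
num_heads = 32
bytes_fp16 = 2  # float16 / bfloat16

# Naive attention stores one (seq_len x seq_len) score matrix per head.
naive_attn_bytes = seq_len * seq_len * num_heads * bytes_fp16
print(f"naive attention scores, one layer: {naive_attn_bytes / 2**30:.1f} GiB")
# -> 64.0 GiB for a single layer, before weights, KV, or other activations.
```

FlashAttention computes the same result in on-chip tiles, so the full score matrix is never stored and activation memory grows linearly (not quadratically) with sequence length, which is what makes a single 80GB A100 with batch size 1 workable.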
