Colbert Mode Usage

#41
by pulkitchahar - opened

I want to store the ColBERT embeddings so that I can rerank dense-vector retrieval results faster. But if a document has 1024 tokens on average (truncated if longer), I end up with a 1024×1024 matrix per document (tokens × embedding dimension), which takes 2 MB in fp16. That sounds huge, especially when I think about scaling up. Am I doing this right, or am I missing something? Are there any ways to reduce the size while keeping performance close to the original?
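For reference, here is a quick sketch of the arithmetic behind that estimate. It assumes one vector per token, an embedding dimension of 1024 (implied by the 1024×1024 figure), and 2 bytes per fp16 value. The function name and the corpus size are hypothetical, picked only to show how the total grows.

```python
def colbert_storage_bytes(num_tokens: int = 1024,
                          dim: int = 1024,
                          bytes_per_value: int = 2) -> int:
    """Raw size of one document's token-embedding matrix, in bytes.

    Assumes one `dim`-sized vector per token and no index or
    metadata overhead. fp16 uses 2 bytes per value.
    """
    return num_tokens * dim * bytes_per_value


per_doc = colbert_storage_bytes()
print(f"per document: {per_doc / 2**20:.1f} MiB")

# Hypothetical corpus size, only to illustrate scaling.
num_docs = 1_000_000
total = per_doc * num_docs
print(f"{num_docs:,} documents: {total / 2**40:.2f} TiB")
```

The per-document size is linear in each of the three factors, so halving the token count, the dimension, or the bytes per value each halves the storage.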

I'm also interested.
