Using bge-m3 for clustering and search

#33
by talavivi - opened

Hi,

I have an idea to use the bge-m3 model for clustering and search. The main idea is that each cluster will be represented by three centroids, one for each vector type (dense, sparse, and ColBERT). When searching for a cluster, I would calculate the score between each centroid and the corresponding one of the query's three vectors to identify the most suitable cluster.

For the dense vector, I can simply calculate the mean of all the dense vectors in the cluster. But I'm not sure what the right approach is for the sparse and ColBERT vectors.
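To make the dense case concrete, here is a minimal sketch of what I mean. It assumes the dense vectors are plain lists of floats of equal length and that scoring is a dot product against a re-normalized centroid; the helper names `dense_centroid` and `dot` are made up for illustration and are not part of any library.

```python
import math


def dense_centroid(vectors):
    """Mean of equal-length dense vectors, re-normalized to unit length.

    Assumption: scoring is cosine-like, so the centroid is L2-normalized.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    count = len(vectors)
    mean = [sum(v[i] for v in vectors) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Leave an all-zero mean untouched to avoid dividing by zero.
    return [x / norm for x in mean] if norm > 0 else mean


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy cluster with two 2-dimensional "embeddings" (made-up values).
cluster = [[1.0, 0.0], [0.0, 1.0]]
centroid = dense_centroid(cluster)
query = [1.0, 0.0]
score = dot(query, centroid)
```

The re-normalization step is my own choice here; without it, clusters whose members point in different directions would get systematically lower scores.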

I would love to hear your thoughts on this and whether it's possible.

Beijing Academy of Artificial Intelligence org

The model produces a sparse weight and a ColBERT vector for each token. A possible method is to maintain a token set for each cluster and compute the mean value for each token.
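A rough sketch of the token-set idea for the sparse weights might look like the following. It assumes each document's sparse output is available as a dictionary mapping a token to its weight, and that a token missing from a document counts as weight zero when averaging; both are assumptions for illustration, and the function names `sparse_centroid` and `sparse_score` are hypothetical.

```python
from collections import defaultdict


def sparse_centroid(docs):
    """Average per-token weights over all documents in a cluster.

    Assumption: a token absent from a document contributes weight 0,
    so each total is divided by the number of documents.
    """
    if not docs:
        raise ValueError("need at least one document")
    totals = defaultdict(float)
    for weights in docs:
        for token, weight in weights.items():
            totals[token] += weight
    count = len(docs)
    return {token: total / count for token, total in totals.items()}


def sparse_score(query, centroid):
    """Sum of weight products over tokens shared by query and centroid."""
    return sum(w * centroid[t] for t, w in query.items() if t in centroid)


# Toy cluster with made-up token weights.
docs = [{"cat": 0.4, "sat": 0.2}, {"cat": 0.6}]
centroid = sparse_centroid(docs)
score = sparse_score({"cat": 1.0, "dog": 0.5}, centroid)
```

Dividing by the number of documents that actually contain the token is another option; it keeps rare tokens from being diluted but makes the centroid less representative of the whole cluster. The same grouping by token could in principle be applied to the ColBERT vectors by averaging vectors instead of scalar weights.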
