BAAI/bge-m3 · Using bge-m3 for clustering and search

Hi,

I have an idea to use the bge-m3 model for clustering and search. The main idea is that each cluster will be represented by three centroids, one for each vector type (dense, sparse, and ColBERT). When searching for a cluster, I would compute the score between each centroid and the query's corresponding vector, then combine the three scores to identify the most suitable cluster.

For the dense centroid, I can simply take the mean of all the dense vectors in the cluster. But I'm not sure what the right approach is for the sparse and ColBERT vectors.
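
For the dense part, this is roughly what I have in mind. It is only a minimal sketch in plain Python: the function names `dense_centroid` and `best_cluster` are hypothetical, and it assumes the dense vectors are unit-normalized so that a dot product behaves like cosine similarity. It does not cover the sparse or ColBERT centroids.

```python
import math

def dense_centroid(vectors):
    """Mean of a cluster's dense vectors, re-normalized to unit length."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Guard against a zero vector (e.g. two opposite vectors averaging out).
    return [x / norm for x in mean] if norm > 0 else mean

def best_cluster(query, centroids):
    """Index of the centroid with the highest dot product against the query."""
    scores = [sum(q * c for q, c in zip(query, centroid)) for centroid in centroids]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 2-dimensional example; real bge-m3 dense vectors are much larger.
cluster_a = [[1.0, 0.0], [0.8, 0.6]]
cluster_b = [[0.0, 1.0], [0.6, 0.8]]
centroids = [dense_centroid(cluster_a), dense_centroid(cluster_b)]
print(best_cluster([1.0, 0.0], centroids))
```

Re-normalizing the mean keeps the centroid on the same scale as the individual embeddings, so scores against centroids stay comparable to scores against single documents.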

I would love to hear your thoughts on this, and whether it is feasible.