ashercn97 posted an update Mar 10
Does anyone know what the state of the art (SOTA) in text embeddings is, specifically for sentence similarity and clustering?

I find the MTEB leaderboard really complex. I feel lost looking at it: which metric should I judge by?

I would sort by "Mean (task)" and pick one of the top models. Better still, if you can, compare three of the best on your own data. That advice holds unless you need a longer context length, or you work in medicine or a similar field where domain-specific models exist.
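For the "compare three of the best on your data" step, here is a minimal stdlib-only sketch, assuming your data is a list of sentence pairs with human similarity scores. It embeds each pair, takes the cosine similarity, and reports the Spearman rank correlation against your labels (as far as I know, this is also the main metric MTEB reports for its STS tasks). The `encode` callables are hypothetical placeholders; in practice one could wrap a real model, e.g. `SentenceTransformer(...).encode` from sentence-transformers.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical encoder type: any function mapping a sentence to a vector.
Encoder = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def score_model(encode: Encoder, pairs: List[Tuple[str, str]], gold: List[float]) -> float:
    """How well one model's cosine similarities track your gold similarity labels."""
    sims = [cosine(encode(a), encode(b)) for a, b in pairs]
    return spearman(sims, gold)


def compare(models: Dict[str, Encoder], pairs, gold) -> List[Tuple[str, float]]:
    """Score every candidate model and return (name, score) pairs, best first."""
    scores = {name: score_model(enc, pairs, gold) for name, enc in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Spearman is used rather than raw accuracy because only the ordering of similarities matters, so models whose cosine scores live on different scales can still be compared fairly.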

Oh wait, this makes sense.

I have created some benchmarks from user data; maybe I'll make my own leaderboard, haha.
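A personal leaderboard can be as simple as averaging each model's scores over your own benchmarks and sorting, in the spirit of sorting MTEB by "Mean (task)". This is a minimal sketch, assuming every benchmark produces one higher-is-better score per model on a comparable scale; the model and benchmark names are hypothetical. It only averages over benchmarks that every model was scored on, so a model is not helped or hurt by a missing result.

```python
from statistics import mean
from typing import Dict, List, Tuple


def leaderboard(results: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """results maps model -> {benchmark: score}; returns (model, mean score), best first."""
    # Keep only benchmarks that every model has a score for.
    shared = set.intersection(*(set(scores) for scores in results.values()))
    rows = [(model, mean(scores[b] for b in sorted(shared))) for model, scores in results.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def print_leaderboard(rows: List[Tuple[str, float]]) -> None:
    """Print a ranked, aligned table of (model, mean score) rows."""
    for position, (model, score) in enumerate(rows, start=1):
        print(f"{position:>2}. {model:<20} {score:.3f}")


if __name__ == "__main__":
    # Hypothetical illustrative numbers, not real benchmark results.
    print_leaderboard(leaderboard({
        "model-a": {"user-dedup": 0.80, "user-clusters": 0.60},
        "model-b": {"user-dedup": 0.70, "user-clusters": 0.90},
    }))
```

A plain mean weights every benchmark equally; if some benchmarks matter more for your product, a weighted mean is an easy swap.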

Thanks for the help!

Yes, I've seen it! Thank you. My issue is the limit of 100 requests a day.

I think it is NV-Embed-v2, with a score of 72.31 on MTEB.

Oh, this is good to know!