Fix sorting heuristic
#3 opened by Markus28
We saw issues where models instantiated via `AutoModel` performed poorly on MTEB. During evaluation we saw that most embeddings produced by this model matched those of a working model, with a few exceptions in each batch. This appears to be caused by mixing `sorted` and `np.argsort`, which probably break ties differently when the input contains duplicates. As a consequence, sentences that have a unique length in their batch are embedded properly, but ones with a non-unique length may be swapped. This PR fixes the issue.
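To illustrate the idea behind the fix, here is a minimal sketch, not the actual code from this PR. It assumes sentences are sorted by descending length before encoding; the helper name `sort_and_restore` and the `upper()` stand-in for the model are hypothetical. The point is that a single permutation, computed once with a stable sort, drives both the reordering and its inverse, so ties cannot be resolved two different ways.

```python
import numpy as np

def sort_and_restore(sentences):
    """Sort by descending length, 'encode', then restore the input order."""
    lengths = np.array([len(s) for s in sentences])
    # Compute the permutation once; kind="stable" keeps ties in input order.
    order = np.argsort(-lengths, kind="stable")
    sorted_sentences = [sentences[i] for i in order]
    # Hypothetical stand-in for the model: one output per sorted sentence.
    outputs = [s.upper() for s in sorted_sentences]
    # Invert using the same permutation, never a second independent sort.
    restored = [None] * len(sentences)
    for position, original_index in enumerate(order):
        restored[original_index] = outputs[position]
    return sorted_sentences, restored

# Three sentences share length 2, which is the case that went wrong.
sentences = ["bb", "a", "cc", "ddd", "ee"]
sorted_sentences, restored = sort_and_restore(sentences)
```

Because the inverse is built from `order` itself, each output lands back at the index of the sentence it was computed from, regardless of how many sentences share a length.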
Closing in favor of the GitHub PR.
Markus28 changed pull request status to closed