e5-R-mistral-7b for retrieval: request to refresh the results

#132
by BeastyZ - opened

Hi @tomaarsen @Muennighoff ,

We submitted a new model, BeastyZ/e5-R-mistral-7b. Could you please refresh the space?

Thanks!
Beasty

BeastyZ changed discussion status to closed
Massive Text Embedding Benchmark org

Seems like it already shows up, likely via the automatic refresh. Curious why the result parsing did not work, though - was it because of the (default) suffix? Did that get added by the script in the MTEB repo?

Yes, (default) was added by the script in the MTEB repo. I have now manually deleted (default) and am waiting for the next automatic refresh.

Massive Text Embedding Benchmark org

Oh, that seems like a bug as it should work out of the box - I'll need to double check - cc @KennethEnevoldsen in case you know; seems related to changes in the meta script

Massive Text Embedding Benchmark org

@BeastyZ did you create the metadata using the CLI mteb create_meta ...? If so, it should work (otherwise we have a bug to fix)


Yes, I did. I followed the guidelines here to submit my results.

Massive Text Embedding Benchmark org

Hmm right, looking at the code it also seems like it is an error on our end. @Muennighoff we should probably allow for "(default)" for consistency with the other subsets. WDYT?

Massive Text Embedding Benchmark org

> we should probably allow for "(default)" for consistency with the other subsets.

It seems like we can either

(a) Change the leaderboard code to allow (default). The problem here is that we probably do not want (default) to appear in the leaderboard table, as it is not very useful, but we do want languages to appear. So we would have to manually replace it somewhere in the code, which probably adds a line or two to the LB code here: https://github.com/embeddings-benchmark/leaderboard/blob/bef8d2ff6b420db179018d2a2689207aad180449/refresh.py#L325. The question is whether we want it to appear in the Evaluation results sidebar of models. It also seems not very useful there, so maybe there is no need, but then this solution would not be desirable.
(b) Change the mteb code so it does not add (default) to the name. That adds one line here: https://github.com/embeddings-benchmark/mteb/blob/778d7a3bf85b2023cc8ba9b2c35a810dcfa5e924/mteb/cli.py#L298. This is how it has worked thus far.

I don't have a strong preference, but given that (default) is not very useful info in the sidebar/metadata (note that it is still recorded in the config field, just not shown in the name) and it's how it has worked thus far, I'd go with (b). But happy to be disagreed with! :)
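
For illustration, here is a minimal sketch of what option (b) could look like: stripping a trailing "(default)" marker from the dataset name before it is written into the model-card metadata. The result structure below is an assumption made for the example and may not match the actual layout used in mteb/cli.py.

```python
# Minimal sketch of option (b): drop a trailing "(default)" marker from the
# dataset/subset name before it is written to the model-card metadata.
# NOTE: the result structure below is illustrative only; the real layout in
# mteb/cli.py may differ.

def normalize_subset_name(name: str) -> str:
    """'ArguAna (default)' -> 'ArguAna'; names with a real language
    suffix such as 'STS22 (en)' are left untouched."""
    suffix = " (default)"
    return name[: -len(suffix)] if name.endswith(suffix) else name


if __name__ == "__main__":
    # Hypothetical entries as they might appear before writing the README.
    results = [
        {"dataset": "ArguAna (default)", "main_score": 61.0},
        {"dataset": "STS22 (en)", "main_score": 67.2},
    ]
    for entry in results:
        entry["dataset"] = normalize_subset_name(entry["dataset"])
    print(results)
```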

I manually deleted the (default) 24 hours ago, but my model, e5-R-mistral-7b, still hasn't appeared on the retrieval leaderboard. Why is that?

Massive Text Embedding Benchmark org

The latest refresh failed: https://github.com/embeddings-benchmark/leaderboard/actions/runs/9884681390
Apologies while we work out the kinks of the new automatic refresh.

cc @KennethEnevoldsen @orionweller this is regarding the PawsXPairClassification (fr) key not being found.

  • Tom Aarsen
Massive Text Embedding Benchmark org

Yes, sorry about this @BeastyZ ! Pushing a fix now

Massive Text Embedding Benchmark org

EDIT: auto-refresh is working again now, and I added a status check before other PRs. The results are still empty though - I assume this is the (default) issue discussed in the conversation above.


Massive Text Embedding Benchmark org

Making an issue on the leaderboard GitHub to consolidate this: https://github.com/embeddings-benchmark/leaderboard/issues/8

@KennethEnevoldsen @Muennighoff @orionweller @tomaarsen
Thank you for your timely and kind help! Things are moving in a positive direction. I only want to add my model to the retrieval leaderboard. Many scores appear, but Average and CQADupstackRetrieval are not among them.


BeastyZ changed discussion status to open
Massive Text Embedding Benchmark org

Hey @BeastyZ ! In that GitHub issue I referenced earlier, I pointed this out and tagged you there (or so I thought - perhaps I got the wrong GitHub handle). I agree it's an issue!

Is it okay if we move the discussion there? We're trying to move away from using the Spaces for PRs/discussion.

Massive Text Embedding Benchmark org

FWIW @BeastyZ, the issue is that you don't seem to have a main_score for MTEB CQADupstackRetrieval. I think you need to aggregate the individual CQADupstack sub-task scores.
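
For reference, a rough sketch of how that aggregation could look: average the main_score of the twelve CQADupstack sub-tasks into a single CQADupstackRetrieval entry. The results-folder path and JSON keys below are assumptions for the example; adapt them to the files your mteb run actually produced.

```python
# Rough sketch: average the twelve CQADupstack sub-task scores into a single
# CQADupstackRetrieval entry. File layout and JSON keys are assumptions here;
# adjust them to match your actual mteb output files.
import json
from pathlib import Path
from statistics import mean

SUBTASKS = [
    "CQADupstackAndroidRetrieval", "CQADupstackEnglishRetrieval",
    "CQADupstackGamingRetrieval", "CQADupstackGisRetrieval",
    "CQADupstackMathematicaRetrieval", "CQADupstackPhysicsRetrieval",
    "CQADupstackProgrammersRetrieval", "CQADupstackStatsRetrieval",
    "CQADupstackTexRetrieval", "CQADupstackUnixRetrieval",
    "CQADupstackWebmastersRetrieval", "CQADupstackWordpressRetrieval",
]

results_dir = Path("results/e5-R-mistral-7b")  # hypothetical output folder

scores = []
for task in SUBTASKS:
    data = json.loads((results_dir / f"{task}.json").read_text())
    scores.append(data["main_score"])  # assumed key for the sub-task score

aggregated = {"task_name": "CQADupstackRetrieval", "main_score": mean(scores)}
(results_dir / "CQADupstackRetrieval.json").write_text(json.dumps(aggregated, indent=2))
print(f"CQADupstackRetrieval main_score: {aggregated['main_score']:.4f}")
```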

Massive Text Embedding Benchmark org

Will close this issue again and refer to https://github.com/embeddings-benchmark/leaderboard/issues/8

KennethEnevoldsen changed discussion status to closed
