display benchmark metrics for base-sized model

#3
by MoritzLaurer - opened

Really great paper, and fascinating to see instruction finetuning also work for dense embedders. If I understand correctly, Table 2 in the paper only displays the metrics for the two large models. Can you also display the same metrics for this base-sized model? Table 2 only compares the large 335M INSTRUCTOR model with other, smaller 111M models, which seems like an unfair comparison. (Adding compute times for generating embeddings would also be interesting for assessing practical utility.)

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your interest in and comments on our paper!

In Table 2, we also provide metrics for large models, e.g., Sent-T5-XXL (4.8B), GTR-XXL (4.8B), etc., and our INSTRUCTOR (335M) outperforms both of them in terms of the average score across the 9 categories (last column).

In the next version, we will provide more results for the base-sized model, as well as compute times for generating embeddings!

Great to hear that you will add results on the base-sized model and compute times in the next paper version. Model size (and the ensuing memory requirements) and embedding speed are quite important for large corpora, and directly comparable metrics for roughly 111M-sized models would be great. In your paper you write that instructor-base has 0.1B params, which seems much more comparable to the 111M-sized models than instructor-large with its 335M params. Otherwise it feels a bit like you are comparing your own large apples with other people's small oranges.

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your constructive feedback!

We will have more discussion of efficiency (in both memory and embedding speed) in our next version. For your reference, it takes about 90 seconds for INSTRUCTOR-base to encode 40,000 documents from the Natural Questions dataset on a single 40GB A100 GPU.
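As a quick sanity check of that figure, the quoted numbers work out to roughly 444 documents per second. A minimal back-of-the-envelope sketch (variable names are illustrative; only the 40,000 documents / 90 seconds figure comes from this thread):

```python
# Throughput implied by the reported timing: 40,000 documents in ~90 s
# for INSTRUCTOR-base on a single 40GB A100 (figures quoted above).
num_docs = 40_000        # Natural Questions documents encoded
elapsed_seconds = 90     # reported wall-clock time

docs_per_second = num_docs / elapsed_seconds
ms_per_doc = 1000 * elapsed_seconds / num_docs

print(f"{docs_per_second:.0f} docs/s")  # ~444 docs/s
print(f"{ms_per_doc:.2f} ms/doc")       # 2.25 ms/doc
```

Actual throughput will of course depend on batch size, sequence length, and hardware, so this is only a rough point of reference.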

In Table 2, our focus is to directly compare INSTRUCTOR (335M/1.5B) to GTR-Large/GTR-XL, as they have the same model size and architecture. In that case, I think it is a fair comparison. In the next version, we will add more comparisons for base-sized models.

Hi, may I ask what the precise size of INSTRUCTOR-base is? I read the paper, but I did not find it there; only the large and XL sizes were mentioned explicitly.
