display benchmark metrics for base-sized model

#3
by MoritzLaurer - opened

Really great paper, and fascinating to see instruction finetuning also work for dense embedders. If I understand correctly, Table 2 in the paper only displays the metrics for the two large models. Can you also display the same metrics for this base-sized model? Table 2 only compares the large 335M INSTRUCTOR model with other, smaller 111M models, which seems like an unfair comparison. (Adding compute times for generating embeddings would also be interesting for assessing practical utility.)

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your interest in and comments on our paper!

In Table 2, we also provide metrics for large models, e.g., Sent-T5-XXL (4.8B), GTR-XXL (4.8B), etc., and our INSTRUCTOR (335M) outperforms both of them in terms of the average score across the 9 categories (last column).

In the next version, we will provide more results for the base-sized model, as well as compute times for generating embeddings!

Great to hear that you will add results on the base-sized model and compute times in the next paper version. Model size (and the ensuing memory requirements) and embedding speed are quite important for large corpora, and directly comparable metrics for roughly 111M-sized models would be great. In your paper you write that instructor-base has 0.1B params, which seems much more comparable to the 111M-sized models than instructor-large with its 335M params. Otherwise it feels a bit like you are comparing your own large apples with other people's small oranges.

NLP Group of The University of Hong Kong org

Hi, thanks a lot for your constructive feedback!

We will have more discussion of efficiency (in both memory and embedding speed) in our next version. For your reference, it takes about 90 seconds for INSTRUCTOR-base to encode 40,000 documents from the Natural Questions dataset on a single 40GB A100 GPU.
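As a quick sanity check of that figure, the quoted numbers work out to roughly 444 documents per second. A minimal back-of-the-envelope sketch (variable names are illustrative; only the 40,000 documents / 90 seconds figure comes from this thread):

```python
# Throughput implied by the reported timing: 40,000 documents in ~90 s
# for INSTRUCTOR-base on a single 40GB A100 (figures quoted above).
num_docs = 40_000        # Natural Questions documents encoded
elapsed_seconds = 90     # reported wall-clock time

docs_per_second = num_docs / elapsed_seconds
ms_per_doc = 1000 * elapsed_seconds / num_docs

print(f"{docs_per_second:.0f} docs/s")  # ~444 docs/s
print(f"{ms_per_doc:.2f} ms/doc")       # 2.25 ms/doc
```

Actual throughput will of course depend on batch size, sequence length, and hardware, so this is only a rough point of reference.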

In Table 2, our focus is to directly compare INSTRUCTOR (335M/1.5B) to GTR-Large/GTR-XL, as they have the same model size and architecture. In that case, I think it is a fair comparison. In the next version, we will add more comparisons for base-sized models.

Hi, may I ask what the precise size of INSTRUCTOR-base is? I read the paper, but I did not find it there; only the large and XL sizes were mentioned explicitly.
