Fine-Tuning Strategies: Choosing Between microsoft/mpnet-base and sentence-transformers/all-MiniLM-L6-v2
Hi everyone,
I’m looking for recommendations on which model to fine-tune for a similarity task. I have a dataset of around 5,000 samples. According to the Sentence Transformers documentation, there are various base models to choose from, including microsoft/mpnet-base (https://sbert.net/docs/sentence_transformer/training_overview.html#best-base-embedding-models).
Instead of microsoft/mpnet-base, I am planning to fine-tune all-mpnet-base-v2, which is not mentioned in that documentation. My reasoning is that all-mpnet-base-v2 may be a better starting point because it has already been fine-tuned for sentence embeddings, unlike microsoft/mpnet-base. Is this approach correct? Additionally, any insights on pooling strategies or general tips for fine-tuning would be greatly appreciated!
Hello!
Apologies for the delay. Yes, it often makes sense to finetune from an "already finetuned" Sentence Transformer rather than from a plain "transformer" model, especially if you don't have a lot of data.
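To make the difference concrete, here is a minimal sketch of the two starting points; the mean pooling choice and `max_seq_length` shown for the raw checkpoint are just illustrative defaults, not recommendations:

```python
from sentence_transformers import SentenceTransformer, models

# Option A: start from an already finetuned Sentence Transformer.
# The pooling layer is already configured for you (mean pooling for all-mpnet-base-v2).
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Option B: start from a plain transformer checkpoint and choose the pooling yourself.
word_embedding_model = models.Transformer("microsoft/mpnet-base", max_seq_length=384)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```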
As for tips: Personally, I think this docs page is quite useful: https://sbert.net/docs/sentence_transformer/loss_overview.html, and I think you'll do well to look at the "Commonly used Loss Functions" section to get an idea of which loss matches your data format.
And for pooling strategies: mean and CLS are the most commonly used, miles ahead of all others. I believe there's not currently a consensus on which one is best, but if you continue finetuning an existing Sentence Transformer then I would stick with whatever they chose (i.e. you can just load that model and finetune it with the Sentence Transformers Trainer).
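For reference, here is a minimal training sketch with the Trainer. The tiny inline dataset, the choice of CosineSimilarityLoss, and the hyperparameters are placeholders; swap in your own similarity pairs and a loss that matches your data format:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

# Load the already finetuned model; its existing pooling is kept as-is.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Placeholder data: (sentence1, sentence2, score) triples with scores in [0, 1].
train_dataset = Dataset.from_dict({
    "sentence1": ["A plane is taking off.", "A man is playing a flute."],
    "sentence2": ["An air plane is taking off.", "A man is playing piano."],
    "score": [0.95, 0.3],
})

# CosineSimilarityLoss expects pairs with a float similarity score.
loss = CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="models/all-mpnet-base-v2-similarity",
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save_pretrained("models/all-mpnet-base-v2-similarity/final")
```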
- Tom Aarsen
Hi,
Thank you for the response. I have now trained models starting from both microsoft/mpnet-base and sentence-transformers/all-MiniLM-L6-v2, and all-MiniLM-L6-v2 performed better.
Thanks for sharing! Well done :)