Fine tuning

#3
opened by gromag

Thank you for sharing this model and paper.
I'm investigating what it would take to further fine-tune Instructor-XL for legal-domain retrieval tasks.
I'm trying to assess a good starting training set size, a good loss temperature, and a good number k of negative pairs per positive pair.
I welcome any other heads-ups.

PS. With hindsight I feel a little daft asking about fine-tuning when the model card explicitly says "embeddings tailored to any task and domains [...] by simply providing the task instruction, without any finetuning." Please let me know if it is a stupid idea.

NLP Group of The University of Hong Kong

Thank you very much for your interest in INSTRUCTOR!

The instruction serves as an efficient option for adapting embeddings to specific domains, but you can also further enhance the model's ability through fine-tuning. To start, you may use all of your available training data (training for at most around 40K steps). For the other hyper-parameters, you may adopt our default settings (e.g., loss_temperature=0.01, k=4, etc.).
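To make the two hyper-parameters concrete, here is a minimal sketch (not the official INSTRUCTOR training script) of a contrastive InfoNCE-style loss in which loss_temperature scales the similarity logits and k is the number of negatives paired with each positive. The function name and tensor shapes are illustrative assumptions, not part of the released codebase.

```python
# Minimal sketch of where loss_temperature and k enter a contrastive
# fine-tuning objective; this is NOT the official training code.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb, pos_emb, neg_emb, temperature=0.01):
    """InfoNCE-style loss over one positive and k negatives per query.

    query_emb: (batch, dim)    embeddings of instruction-prefixed queries
    pos_emb:   (batch, dim)    embeddings of the positive documents
    neg_emb:   (batch, k, dim) embeddings of k negative documents per query
    """
    query_emb = F.normalize(query_emb, dim=-1)
    pos_emb = F.normalize(pos_emb, dim=-1)
    neg_emb = F.normalize(neg_emb, dim=-1)

    # Cosine similarity to the positive document: (batch, 1)
    pos_sim = (query_emb * pos_emb).sum(dim=-1, keepdim=True)
    # Cosine similarity to each of the k negatives: (batch, k)
    neg_sim = torch.einsum("bd,bkd->bk", query_emb, neg_emb)

    # Logits over [positive, negatives]; the correct class is index 0.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

# Example with the defaults suggested above (temperature=0.01, k=4):
batch, k, dim = 8, 4, 768
loss = contrastive_loss(
    torch.randn(batch, dim),
    torch.randn(batch, dim),
    torch.randn(batch, k, dim),
    temperature=0.01,
)
```

Lowering the temperature (e.g., 0.01) sharpens the softmax over the positive and the k negatives, so harder negatives contribute more to the gradient.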

Hope this helps! Feel free to add any further questions or comments!
