Fine-tuning Alibaba-NLP/gte-Qwen2-7B-instruct for Domain-Specific Retrieval with Query, Positive, and Hard Negatives
Hi,
I am exploring the possibility of fine-tuning the Alibaba-NLP/gte-Qwen2-7B-instruct model for a domain-specific retrieval task in Spanish, using a dataset formatted as follows:
Query: A single text input representing the search query.
Positive examples: A list of documents relevant to the query.
Hard negatives: A list of documents contextually similar to the query but explicitly non-relevant.
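Concretely, my plan is to flatten each record into (query, positive, hard negative) triplets before training, since contrastive losses such as MultipleNegativesRankingLoss typically expect one example per row. A minimal sketch of that preprocessing step (the field names and the sample record are just illustrative placeholders, not from a real dataset):

```python
from itertools import product

def expand_to_triplets(record):
    """Flatten one record with multiple positives and hard negatives
    into (query, positive, hard_negative) triplets, one per pairing."""
    return [
        (record["query"], pos, neg)
        for pos, neg in product(record["positives"], record["hard_negatives"])
    ]

# Hypothetical Spanish example record (illustrative only)
record = {
    "query": "plazo para presentar una reclamación",
    "positives": ["El plazo de reclamación es de 30 días hábiles."],
    "hard_negatives": [
        "El plazo de entrega del pedido es de 30 días.",
        "Las reclamaciones se presentan por escrito.",
    ],
}

triplets = expand_to_triplets(record)
# One triplet per (positive, hard negative) pair: 1 x 2 = 2 triplets here
```

Whether this pairwise expansion (versus keeping all hard negatives attached to one row) is the right layout presumably depends on the loss being used, which is part of what I am unsure about.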
Could you provide some examples or recommendations for configuring the model to handle this structure effectively? Additionally:
Are there specific pre-processing steps required to handle Spanish text or domain-specific terminology?
Does the model have any inherent support for Spanish, or are there additional considerations when working with non-English datasets?
Are there examples or guidelines available for fine-tuning the model on a retrieval task with this format?
I would greatly appreciate any insights, examples, or resources that could help in this process.
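For context, here is how I am currently formatting inputs, following the instruction-prefix convention used by E5-style instruct embedding models, where only the query side gets an instruction and documents are embedded as-is. The task description below is an illustrative placeholder, and I am not certain this is the recommended wording for this model:

```python
def format_query(task_description: str, query: str) -> str:
    """Prepend the instruction prefix that E5-style instruct models
    expect on the query side; documents are left unchanged."""
    return f"Instruct: {task_description}\nQuery: {query}"

# Illustrative task description (an assumption, not from the model card)
task = "Given a legal question in Spanish, retrieve the relevant clause"
formatted = format_query(task, "plazo para presentar una reclamación")
```

Confirmation of whether this prefix should also be applied during fine-tuning, and whether the task description can be written in Spanish, would be very helpful.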
Hello,
Is there any fine-tuning script available for this model? It would be interesting to tune it for downstream tasks.
Thanks!