Fine-tuning for asymmetric semantic similarity?

#91
by shensmobile - opened

I know there are a lot of finetuning questions here, but I want to throw one more into the pile.

I currently use mxbai for asymmetric semantic similarity. However, since mxbai has a 512-token limit, I often find myself cutting off valuable data.

My current code loads mxbai, sets the loss to OnlineContrastiveLoss, and then shoves the model, loss, distance metric (Cosine distance), and data into SentenceTransformerTrainer.
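Roughly, the setup looks like this (a trimmed-down sketch with placeholder rows, assuming the large-v1 mxbai checkpoint; my real dataset is obviously bigger):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import OnlineContrastiveLoss, SiameseDistanceMetric

# mxbai, trained with a contrastive objective on (document, summary, label) pairs
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# placeholder rows: long document, shorter summary, binary similarity label
train_dataset = Dataset.from_dict({
    "sentence1": ["a long document that often exceeds 512 tokens ...", "another long document ..."],
    "sentence2": ["a short summary of that document ...", "a summary of something unrelated ..."],
    "label": [1, 0],
})

loss = OnlineContrastiveLoss(model, distance_metric=SiameseDistanceMetric.COSINE_DISTANCE)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```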

If I want to drop in jina-embeddings-v3 in place of mxbai, which task do I need to use? Since OnlineContrastiveLoss compares two pieces of text (in my case, a long document and a shorter summary), I'm not sure I can train retrieval.query or retrieval.passage without changing my code to train each side individually. Can I use text-matching and train in the fact that the data will be asymmetric?

Jina AI org

Hi @shensmobile, text-matching should work fine. Implementing 2-adapter tuning would require some effort, and I'm not sure it would bring significant benefits in your case.
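For reference, a rough (untested) sketch of selecting the text-matching adapter in sentence-transformers; for training you would want to pin it as the default task rather than passing it per encode call. Double-check the exact `default_task` kwarg against the model card:

```python
from sentence_transformers import SentenceTransformer

# Load jina-embeddings-v3 with the symmetric text-matching adapter pinned as the
# default task, so the trainer uses it for both sides of each pair.
# (Verify the default_task kwarg against the model card before relying on it.)
model = SentenceTransformer(
    "jinaai/jina-embeddings-v3",
    trust_remote_code=True,
    model_kwargs={"default_task": "text-matching"},
)

# For inference you can also pass the task per encode call:
emb = model.encode(
    ["a long document ...", "a short summary ..."],
    task="text-matching",
    prompt_name="text-matching",
)
```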

I assume the LoRA adapter was only trained on symmetric sequences, but I have enough sequences that I think I can make it work out. I can still use the rest of my mxbai training code even though text-matching is a LoRA adapter, right? And would there be any benefit to training on the full parameters of the model instead?

Edit:
Running into this problem during training:

RuntimeError: Index put requires the source and destination dtypes match, got BFloat16 for the destination and Float for the source.

My data is in the format "Sentence1", "Sentence2", "Label" (where the label is an int 1 or 0 for contrastive loss). Do this format and loss function work with this LoRA adapter? Or is the issue that the model is somehow stuck in BF16 when I need to train it in fp16?
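For context, this is roughly what my rows and my next attempt look like (placeholder text; casting everything to fp32 is just my guess at a workaround, not something I've confirmed fixes the error):

```python
import torch
from datasets import Dataset
from sentence_transformers import SentenceTransformer

# Placeholder rows in the (sentence1, sentence2, label) format described above.
train_dataset = Dataset.from_dict({
    "sentence1": ["a long document ..."],
    "sentence2": ["a short summary ..."],
    "label": [1],  # int 1/0 for OnlineContrastiveLoss
})

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)
# Guess at a workaround for the dtype mismatch: cast the whole model to float32
# before handing it to the trainer.
model = model.to(torch.float32)
```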

Example for fine-tuning models on asymmetric semantic similarity with hard negatives

I’m currently working on a project involving asymmetric semantic similarity and I’d like to fine-tune a model using hard negative sentences. I’m looking for a practical example to understand the process better.

Could you please provide an example or reference on:

- Preparing the dataset for fine-tuning, including how to structure pairs with hard negatives for asymmetric tasks?
- Implementing the fine-tuning process using Hugging Face tools (e.g., transformers or datasets)?
- Adapting loss functions like contrastive loss or triplet loss for this specific use case?

A step-by-step example or notebook reference would be extremely helpful! (A rough sketch of what I currently have in mind is below.)

Thank you so much for your guidance.
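To make the question concrete, this is roughly the shape of setup I have in mind (untested; all model names and strings are placeholders), but I'm not sure it handles the asymmetric case or the hard negatives the right way:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# (anchor, positive, hard negative) triplets: the anchor is the short side
# (query/summary), the positive is the document it should match, and the
# negative is a mined document that looks similar but should not match.
train_dataset = Dataset.from_dict({
    "anchor": ["a short query or summary ..."],
    "positive": ["the document it should match ..."],
    "negative": ["a similar-looking document that should not match ..."],
})

# Uses the explicit hard negative plus the other in-batch passages as negatives.
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```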
