Request for Assistance with Fine-Tuning the nomic-embed-text-v1 Model for Spanish
I hope this message finds you well. My name is Wilfredo, and I am currently working on a project that involves fine-tuning the nomic-ai/nomic-embed-text-v1 model for a specific application in Spanish text processing.
I am reaching out to you to request your assistance in understanding the steps required to fine-tune this model effectively. Specifically, I am looking for guidance on:
Dataset Preparation: What are the recommended practices for preparing the dataset for fine-tuning? Are there any specific data formats or preprocessing steps that should be followed?
Fine-Tuning Process: Could you provide detailed instructions or a framework for fine-tuning the model, including any specific hyperparameters or training configurations that are crucial for achieving optimal performance?
Thank you very much for your time and consideration. I look forward to your response.
Best regards,
hi, Sentence Transformers 3 might be a good place to start! https://x.com/tomaarsen/status/1795425797408235708
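For reference, a minimal sketch of what a Sentence Transformers v3-style fine-tuning run could look like, assuming (anchor, positive) text pairs and `MultipleNegativesRankingLoss`; the toy examples, output directory, and hyperparameters below are placeholders rather than tuned values:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Toy (anchor, positive) pairs in Spanish; replace with your curated dataset.
# nomic-embed-text-v1 was trained with task prefixes, so they are kept here.
train_dataset = Dataset.from_dict({
    "anchor": ["search_query: ¿Cuál es la capital de Francia?"],
    "positive": ["search_document: La capital de Francia es París."],
})

# trust_remote_code is required because the model ships custom modeling code.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# In-batch negatives loss; tends to benefit from larger batch sizes.
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="nomic-embed-es",      # placeholder output directory
    num_train_epochs=1,
    per_device_train_batch_size=32,
    learning_rate=2e-5,               # common starting point, not a tuned value
    warmup_ratio=0.1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save("nomic-embed-es/final")
```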
as far as data, I would curate a sizeable dataset of at least 10k examples to fine-tune on, although I'm not sure how well the model will do on Spanish since the tokenizer is optimized solely for English.
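On the data side, here is one possible way to get curated pairs into the format the trainer above expects. The `spanish_pairs.csv` file and its `query`/`document` columns are hypothetical stand-ins for your own data, and the nomic task prefixes are applied on the assumption that matching the pre-training input format helps:

```python
from datasets import load_dataset

# Hypothetical CSV of curated Spanish pairs with "query" and "document" columns.
raw = load_dataset("csv", data_files="spanish_pairs.csv", split="train")

# nomic-embed-text-v1 expects task prefixes (see its model card), so apply
# them so fine-tuning inputs match the pre-training input format.
def to_pair(row):
    return {
        "anchor": "search_query: " + row["query"],
        "positive": "search_document: " + row["document"],
    }

train_dataset = raw.map(to_pair, remove_columns=raw.column_names)
print(train_dataset[0])  # {'anchor': 'search_query: ...', 'positive': 'search_document: ...'}
```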
Thank you @zpn. Also, I would like to study the code of nomic-embed.