Training datasets used for Danish, Swedish and Spanish languages
Hello,
I am running some evaluation tests on this model for the languages mentioned in the title. The initial results are really promising. I would like to know more. Can someone provide details on the datasets used for training this model for the mentioned languages?
Hello!
I believe this model was finetuned on English data on top of a multilingual base model, which I think is https://huggingface.co/FacebookAI/xlm-roberta-base. In our experience, this is a fairly effective way of getting a multilingual embedding model.
Some other models, e.g. those listed at https://huggingface.co/models?library=sentence-transformers&language=da&sort=trending, are trained on non-English datasets as well, and they often (although not always) reach even better results.
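
If you want to compare this model against some of those on your own Danish, Swedish or Spanish data, a small paired-retrieval check is one way to do it. The snippet below is only a rough sketch: `retrieval_accuracy` and `toy_encode` are hypothetical helper names I made up for illustration, and `toy_encode` is a stand-in that just counts letters so the example runs without downloading anything. For a real comparison you would pass the `encode` method of a loaded `SentenceTransformer` instead.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieval_accuracy(
    encode: Callable[[List[str]], Sequence[Vector]],
    queries: List[str],
    documents: List[str],
) -> float:
    """Fraction of queries whose most similar document sits at the same index.

    Assumes queries[i] is paired with documents[i], e.g. a question and
    its answer passage in the language being evaluated.
    """
    if not queries or len(queries) != len(documents):
        raise ValueError("queries and documents must be non-empty and aligned")
    query_vecs = encode(queries)
    doc_vecs = encode(documents)
    hits = 0
    for i, query_vec in enumerate(query_vecs):
        # Index of the document with the highest cosine similarity.
        best = max(range(len(doc_vecs)), key=lambda j: cosine(query_vec, doc_vecs[j]))
        hits += int(best == i)
    return hits / len(queries)


def toy_encode(texts: List[str]) -> List[List[float]]:
    """Hypothetical placeholder encoder: a-z letter counts, NOT a real model.

    For a real test, pass SentenceTransformer(<model name>).encode instead.
    """
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [[float(text.lower().count(ch)) for ch in alphabet] for text in texts]
```

Running the same aligned query/document pairs through each candidate model gives you one number per model and language, which is usually enough to see whether a model trained on non-English data helps for your use case.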
- Tom Aarsen