details on dataset

by carlesoctav - opened Jun 11, 2023

Jun 11, 2023

Can you please provide further details about the dataset used to train this model? Specifically, which sources were utilized? I assume the dataset differs from the one described in the paper for the English model in terms of both the number of examples and the quality of the data, right?

intfloat

Owner Jun 15, 2023

Sorry for the late reply; the HF notification system seems to have serious delays.

Yes, the training datasets differ from those reported in the paper. I'll provide more details in the model card in the coming days.

intfloat

Owner Jun 18, 2023

FYI, I have added some training details at https://huggingface.co/intfloat/multilingual-e5-base#training-details
