details on dataset

#4
by carlesoctav - opened

Can you please provide further details about the dataset used to train this model? Specifically, which sources were utilized? I assume the dataset differs from the one described in the paper for the English model in terms of both the number of examples and the quality of the data, right?

Sorry for the late reply, the HF notification system seems to have serious delays.

Yes, the training datasets differ from those reported in the paper. I'll add more details to the model card in the coming days.
