# distilbert-base-uncased fine-tuned for Semantic Textual Similarity in Spanish

This is a test model fine-tuned on the Spanish portion of the stsb_multi_mt dataset, created to understand and benchmark STS models.
Evaluating distilbert-base-uncased on the Spanish test split before fine-tuning yields:

- Cosine-Similarity: Pearson: 0.2980, Spearman: 0.4008
Fine-tuning with the training script's defaults on the Spanish training split yields:

- Cosine-Similarity: Pearson: 0.7451, Spearman: 0.7364
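The Pearson and Spearman figures above are correlations between the model's cosine similarities and the gold similarity scores. A minimal sketch of that computation with NumPy and SciPy, using random toy embeddings and scores in place of real model outputs and dataset labels:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def cosine_sim(a, b):
    # Row-wise cosine similarity between two embedding matrices.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)

# Toy stand-ins: pairs of sentence embeddings and gold scores
# (0-5 scale, as in the STS benchmark).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(32, 768))
emb_b = rng.normal(size=(32, 768))
gold = rng.uniform(0, 5, size=32)

sims = cosine_sim(emb_a, emb_b)
pearson = pearsonr(sims, gold)[0]
spearman = spearmanr(sims, gold)[0]
print(f"Pearson: {pearson:.4f}  Spearman: {spearman:.4f}")
```

With real data, `emb_a` and `emb_b` would be the encoder's embeddings for the two sentences of each test pair, and `gold` the annotated similarity scores from the test split.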
## Resources

- Check the modified training script: `training_stsb_m_mt.py`
- Check `sts_eval` for a comparison with TensorFlow and Sentence-Transformers models
- Check the development environment