distilbert-base-uncased trained for Semantic Textual Similarity in Spanish

This is a test model fine-tuned on the Spanish data from stsb_multi_mt in order to understand and benchmark STS models.

Evaluating the base distilbert-base-uncased on the Spanish test split before fine-tuning yields:

Cosine-Similarity: Pearson: 0.2980, Spearman: 0.4008
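
A minimal sketch of how this baseline evaluation can be reproduced with the sentence-transformers and datasets libraries; it assumes mean pooling over the raw checkpoint, and the exact settings of the original run may differ:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, models
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Wrap the raw HF checkpoint as a sentence encoder (mean pooling assumed).
word_embedding_model = models.Transformer("distilbert-base-uncased", max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Spanish test split of stsb_multi_mt; similarity_score uses a 0-5 scale.
test = load_dataset("stsb_multi_mt", name="es", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test["sentence1"],
    sentences2=test["sentence2"],
    scores=[s / 5.0 for s in test["similarity_score"]],  # normalize to 0-1
    name="sts-es-test",
)
# Computes Pearson/Spearman correlation of cosine similarity against the gold scores;
# the return value is the main score (or a dict of metrics, depending on the library version).
print(evaluator(model))
```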

The version fine-tuned with the defaults of the training script on the Spanish training split yields:

Cosine-Similarity: Pearson: 0.7451, Spearman: 0.7364
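
For reference, a minimal fine-tuning sketch in the spirit of the referenced training script (this is not the script itself; the batch size, epoch count, and output path below are assumptions, not its actual defaults):

```python
import math
from datasets import load_dataset
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Base encoder: distilbert-base-uncased with mean pooling (assumed).
word_embedding_model = models.Transformer("distilbert-base-uncased", max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Spanish train/dev splits of stsb_multi_mt; gold scores use a 0-5 scale.
train = load_dataset("stsb_multi_mt", name="es", split="train")
dev = load_dataset("stsb_multi_mt", name="es", split="dev")

train_examples = [
    InputExample(texts=[r["sentence1"], r["sentence2"]], label=r["similarity_score"] / 5.0)
    for r in train
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)  # batch size assumed
train_loss = losses.CosineSimilarityLoss(model)

dev_evaluator = EmbeddingSimilarityEvaluator(
    dev["sentence1"],
    dev["sentence2"],
    [s / 5.0 for s in dev["similarity_score"]],
    name="sts-es-dev",
)

num_epochs = 4  # assumed; see training_stsb_m_mt.py for the actual defaults
warmup_steps = math.ceil(len(train_dataloader) * num_epochs * 0.1)  # 10% warmup

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=dev_evaluator,
    epochs=num_epochs,
    warmup_steps=warmup_steps,
    output_path="output/distilbert-base-uncased-stsb-es",  # hypothetical output directory
)
```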

Resources

Check the modified training script [training_stsb_m_mt.py]

Check sts_eval for a comparison with TensorFlow and Sentence-Transformers models

Check the development environment