# distilbert-base-uncased trained for Semantic Textual Similarity in Spanish
This is a test model fine-tuned on the Spanish data from [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt) in order to understand and benchmark STS models.
Evaluating `distilbert-base-uncased` on the Spanish test dataset before training results in:
```
Cosine-Similarity : Pearson: 0.2980 Spearman: 0.4008
```
Fine-tuning with the training script's default settings on the Spanish training dataset results in:
```
Cosine-Similarity : Pearson: 0.7451 Spearman: 0.7364
```
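The two scores reported above are correlations between the cosine similarity of each sentence pair's embeddings and the gold similarity labels. The following is a minimal sketch of that metric in plain Python, not the evaluation code used for this model. The embeddings and gold scores are made-up toy values, and the helper names (`cosine`, `pearson`, `ranks`, `spearman`, `evaluate`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def evaluate(embeddings_a, embeddings_b, gold_scores):
    """Correlate per-pair cosine similarity with the gold labels."""
    sims = [cosine(u, v) for u, v in zip(embeddings_a, embeddings_b)]
    return pearson(sims, gold_scores), spearman(sims, gold_scores)

# Toy data (assumption): 2-d "embeddings" for three sentence pairs and
# invented gold similarity scores on a 0-5 scale.
embeddings_a = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
embeddings_b = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
gold_scores = [5.0, 3.0, 0.5]

p, s = evaluate(embeddings_a, embeddings_b, gold_scores)
print(f"Cosine-Similarity : Pearson: {p:.4f} Spearman: {s:.4f}")
```

Pearson rewards a linear relationship between predicted similarity and the labels, while Spearman only checks that the pairs are ranked in the same order, which is why the two numbers can differ for the same model.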
## Resources
Check the modified training script [training_stsb_m_mt.py].
Check [sts_eval](https://github.com/eduardofv/sts_eval) for a comparison with TensorFlow and Sentence-Transformers models.
Check the [development environment](https://github.com/eduardofv/ai-denv).