# distilbert-base-uncased trained for Semantic Textual Similarity in Spanish

This is a test model fine-tuned on the Spanish data from [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt). Its purpose is to understand and benchmark STS models.
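As a rough sketch of how the Spanish data could be pulled in, assuming the Hugging Face `datasets` library is installed and the dataset is still published under the `stsb_multi_mt` name with an `es` configuration. The helper names `load_spanish_sts` and `normalize_score` are hypothetical and not part of the training script. Each row should carry `sentence1`, `sentence2` and a `similarity_score` between 0 and 5.

```python
def load_spanish_sts(split="test"):
    """Load one split ("train", "dev" or "test") of the Spanish STS data.

    Assumption: the dataset is reachable as "stsb_multi_mt" with config "es".
    """
    # Imported lazily so normalize_score() works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("stsb_multi_mt", name="es", split=split)


def normalize_score(score, max_score=5.0):
    """Rescale a 0-5 gold similarity score to the 0-1 range."""
    return score / max_score
```

Rescaling to 0-1 is a common step when training against cosine similarity; whether the modified training script does exactly this is an assumption.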

Evaluating `distilbert-base-uncased` on the Spanish test dataset before training results in:

```
Cosine-Similarity :	Pearson: 0.2980	Spearman: 0.4008
```

The fine-tuned version, trained on the Spanish training dataset with the training script's defaults, results in:

```
Cosine-Similarity :	Pearson: 0.7451	Spearman: 0.7364
```
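To make the two numbers concrete, here is a minimal, dependency-free sketch of how a cosine-similarity Pearson/Spearman score can be computed from sentence-pair embeddings and gold scores. It is an illustration, not the evaluator that produced the results above: the function names and the toy embeddings are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings1, embeddings2, gold_scores):
    """Return (pearson, spearman) of pairwise cosine similarity vs. gold."""
    sims = [cosine(u, v) for u, v in zip(embeddings1, embeddings2)]
    return pearson(sims, gold_scores), spearman(sims, gold_scores)


# Toy 2-d "embeddings" for three sentence pairs (illustrative only).
emb1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
emb2 = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
gold = [5.0, 2.0, 0.0]
pearson_r, spearman_r = evaluate(emb1, emb2, gold)
```

In the toy data the similarities are ordered exactly like the gold scores, so the Spearman value is 1 while the Pearson value is slightly lower because the relationship is not perfectly linear.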

## Resources

Check the modified training script [training_stsb_m_mt.py]

Check [sts_eval](https://github.com/eduardofv/sts_eval) for a comparison with TensorFlow and Sentence-Transformers models

Check the [development environment](https://github.com/eduardofv/ai-denv)