eduardofv committed 22c414a (1 parent: 92a62c5)

Updated Model Card

Files changed (1)
  1. README.md +24 -4
README.md CHANGED
@@ -2,6 +2,24 @@
 
 This is a test model that was fine-tuned using the Spanish datasets from [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt) in order to understand and benchmark STS models.
 
+## Model and training data description
+
+This model was built by taking `distilbert-base-uncased` and training it on a Semantic Textual Similarity (STS) task with a modified version of the Sentence Transformers STS training script (the modified script is included in the repo). It was trained on the Spanish datasets from [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt), which are the STSBenchmark datasets automatically translated into other languages with deepl.com. Refer to the dataset repository for more details.
+
+## Intended uses & limitations
+
+This model was built only as a proof of concept for STS fine-tuning on Spanish data. It has no intended use beyond getting a sense of how this training works.
+
+## How to use
+
+You can use it like any other STS-trained model to extract sentence embeddings. See the Sentence Transformers documentation.
+
+## Training procedure
+
+Use the included script to train the base model on the Spanish data. To train a different model, pass its reference as the first argument. You can also train on any of the other languages included in the training dataset.
+
+## Evaluation results
+
 Evaluating `distilbert-base-uncased` on the Spanish test dataset before training results in:
 
 ```
@@ -14,10 +32,12 @@ While the fine-tuned version with the defaults of the training script and the Sp
 Cosine-Similarity : Pearson: 0.7451 Spearman: 0.7364
 ```
 
-## Resources
+In our [STS Evaluation repository](https://github.com/eduardofv/sts_eval) we compare the performance of this model with other models from Sentence Transformers and TensorFlow Hub using the standard STSBenchmark and the 2017 STSBenchmark Task 3 for Spanish.
 
-Check the modified training script [training_stsb_m_mt.py]
 
-Check [sts_eval](https://github.com/eduardofv/sts_eval) for a comparison with Tensorflow and Sentence-Transformers models
+## Resources
 
-Check the [development environment](https://github.com/eduardofv/ai-denv)
+- Training dataset [stsb_multi_mt](https://huggingface.co/datasets/stsb_multi_mt)
+- Sentence Transformers [Semantic Textual Similarity](https://www.sbert.net/examples/training/sts/README.html)
+- Check [sts_eval](https://github.com/eduardofv/sts_eval) for a comparison with TensorFlow and Sentence Transformers models
+- Check the [development environment to run the scripts and evaluation](https://github.com/eduardofv/ai-denv)
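
The `Cosine-Similarity : Pearson: ... Spearman: ...` lines in this model card appear to follow the log format of Sentence Transformers' `EmbeddingSimilarityEvaluator`. That evaluator embeds each sentence pair, computes the cosine similarity of the two embeddings, and correlates those similarities with the gold scores. Below is a minimal pure-Python sketch of that metric. It is not the evaluator's actual code. It assumes the embeddings are already available as lists of floats, and all helper names (`cosine`, `average_ranks`, `evaluate`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the first one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(pairs: Sequence[Tuple[Vector, Vector]], gold: Sequence[float]) -> Tuple[float, float]:
    """Correlate cosine similarities of embedding pairs with gold similarity scores."""
    sims = [cosine(u, v) for u, v in pairs]
    return pearson(sims, gold), spearman(sims, gold)


if __name__ == "__main__":
    # Toy 2-d "embeddings" and gold scores, made up for illustration (not model output).
    toy_pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
    toy_gold = [5.0, 2.5, 0.0]
    p, s = evaluate(toy_pairs, toy_gold)
    print(f"Cosine-Similarity : Pearson: {p:.4f} Spearman: {s:.4f}")
```

Pearson is unchanged by positive linear rescaling, and Spearman only depends on ranks. The gold scores can therefore stay on their original 0-5 scale for this metric.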
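
According to the model card, training used the Spanish split of stsb_multi_mt with a modified Sentence Transformers STS training script. The original Sentence Transformers STS example divides each 0-5 gold score by 5 before training with `CosineSimilarityLoss`, so every label falls in [0, 1]. The sketch below shows that preprocessing step under two assumptions: the modified script keeps this normalization, and rows use stsb_multi_mt's `sentence1`, `sentence2` and `similarity_score` fields. The function name `to_training_pairs` is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed stsb_multi_mt field name and score scale (0 to 5); verify against the dataset card.
SCORE_FIELD = "similarity_score"
MAX_SCORE = 5.0


def to_training_pairs(rows: List[Dict[str, object]]) -> List[Tuple[str, str, float]]:
    """Turn dataset rows into (sentence1, sentence2, label) with label scaled to [0, 1]."""
    pairs = []
    for row in rows:
        score = float(row[SCORE_FIELD])
        if not 0.0 <= score <= MAX_SCORE:
            raise ValueError(f"similarity score out of range: {score}")
        # Dividing by the maximum maps 0-5 gold scores onto the 0-1 label range
        # that the cosine-similarity loss is trained against.
        pairs.append((str(row["sentence1"]), str(row["sentence2"]), score / MAX_SCORE))
    return pairs


if __name__ == "__main__":
    # Made-up example row for illustration only.
    example = [{
        "sentence1": "Un hombre toca la guitarra.",
        "sentence2": "Un hombre está tocando una guitarra.",
        SCORE_FIELD: 4.0,
    }]
    print(to_training_pairs(example))
```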