mmazuecos committed
Commit e24e58a
1 Parent(s): e769683

Updated model to the best trained one.

2_Dense/config.json DELETED
@@ -1 +0,0 @@
- {"in_features": 768, "out_features": 512, "bias": true, "activation_function": "torch.nn.modules.activation.Tanh"}
 
2_Dense/pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:318abfeb7ac3562dae47bd5126150009554f49c4704ea18ecc8903dfd970d857
- size 1575975
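The two files above are the weights and config of the extra 2_Dense projection head that the previous checkpoint appended after pooling. As a rough illustration only (this snippet is not part of the repo), the deleted config corresponds to a `sentence_transformers.models.Dense` module like this:

```python
from torch import nn
from sentence_transformers import models

# Projection described by the deleted 2_Dense/config.json:
# 768-dim mean-pooled embeddings -> 512 dims, through a Tanh activation.
dense_head = models.Dense(
    in_features=768,
    out_features=512,
    bias=True,
    activation_function=nn.Tanh(),
)
```

With this head removed, the model returns the 768-dimensional pooled embeddings directly, which is what the README changes below reflect.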
 
 
 
README.md CHANGED
@@ -17,7 +17,8 @@ widget:

# bertin-roberta-base-finetuning-esnli

- This is a [sentence-transformers](https://www.SBERT.net) model trained on a collection of NLI tasks for Spanish. It maps sentences & paragraphs to a 512 dimensional dense vector space and can be used for tasks like clustering or semantic search.
+ This is a [sentence-transformers](https://www.SBERT.net) model trained on a
+ collection of NLI tasks for Spanish. It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.

Based around the siamese networks approach from [this paper](https://arxiv.org/pdf/1908.10084.pdf).
<!--- Describe your model here -->
@@ -41,6 +42,43 @@ embeddings = model.encode(sentences)
print(embeddings)
```

+ ## Usage (HuggingFace Transformers)
+ Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on-top of the contextualized word embeddings.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+
+ #Mean Pooling - Take attention mask into account for correct averaging
+ def mean_pooling(model_output, attention_mask):
+     token_embeddings = model_output[0] #First element of model_output contains all token embeddings
+     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+ # Sentences we want sentence embeddings for
+ sentences = ['This is an example sentence', 'Each sentence is converted']
+
+ # Load model from HuggingFace Hub
+ tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
+ model = AutoModel.from_pretrained('{MODEL_NAME}')
+
+ # Tokenize sentences
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+ # Compute token embeddings
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+
+ # Perform pooling. In this case, mean pooling.
+ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+ print("Sentence embeddings:")
+ print(sentence_embeddings)
+ ```
+
+
## Evaluation Results

<!--- Describe how your model was evaluated -->
@@ -48,14 +86,14 @@ Our model was evaluated on the task of Semantic Textual Similarity using the [Se

| | [BETO STS](https://huggingface.co/espejelomar/sentece-embeddings-BETO) | BERTIN STS (this model) | Relative improvement |
|-------------------:|---------:|-----------:|---------------------:|
- | cosine_pearson | 0.609803 | 0.669326 | +9.76 |
- | cosine_spearman | 0.528776 | 0.596159 | +12.74 |
- | euclidean_pearson | 0.590613 | 0.665561 | +12.69 |
- | euclidean_spearman | 0.526529 | 0.600940 | +14.13 |
- | manhattan_pearson | 0.589108 | 0.665463 | +12.96 |
- | manhattan_spearman | 0.525910 | 0.600947 | +14.27 |
- | dot_pearson | 0.544078 | 0.600923 | +10.45 |
- | dot_spearman | 0.460427 | 0.517005 | +12.29 |
+ | cosine_pearson | 0.609803 | 0.683188 | +12.03 |
+ | cosine_spearman | 0.528776 | 0.615916 | +16.48 |
+ | euclidean_pearson | 0.590613 | 0.672601 | +13.88 |
+ | euclidean_spearman | 0.526529 | 0.611539 | +16.15 |
+ | manhattan_pearson | 0.589108 | 0.672040 | +14.08 |
+ | manhattan_spearman | 0.525910 | 0.610517 | +16.09 |
+ | dot_pearson | 0.544078 | 0.600517 | +10.37 |
+ | dot_spearman | 0.460427 | 0.521260 | +13.21 |


## Training
@@ -72,7 +110,8 @@ The whole dataset used is available [here](https://huggingface.co/datasets/hacka

**DataLoader**:

- `sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader` of length 1127 with parameters:
+ `sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader`
+ of length 1818 with parameters:
```
{'batch_size': 64}
```
@@ -87,7 +126,7 @@ The whole dataset used is available [here](https://huggingface.co/datasets/hacka
Parameters of the fit()-Method:
```
{
- "epochs": 20,
+ "epochs": 10,
"evaluation_steps": 0,
"evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
"max_grad_norm": 1,
@@ -97,7 +136,7 @@ Parameters of the fit()-Method:
},
"scheduler": "WarmupLinear",
"steps_per_epoch": null,
- "warmup_steps": 1127,
+ "warmup_steps": 909,
"weight_decay": 0.01
}
```
@@ -108,7 +147,6 @@ Parameters of the fit()-Method:
SentenceTransformer(
  (0): Transformer({'max_seq_length': 514, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
- (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
```

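The updated training section reports a `NoDuplicatesDataLoader` of length 1818 with batch size 64, 10 epochs, and 909 warmup steps (half of the 1,818 batches in one epoch). A minimal sketch of wiring up a run with these parameters is below; the diff does not state the loss or the starting checkpoint, so `MultipleNegativesRankingLoss` (the loss `NoDuplicatesDataLoader` is normally paired with), the BERTIN base model id, and the dummy `train_examples` are assumptions, not the authors' exact setup.

```python
from sentence_transformers import InputExample, SentenceTransformer, losses
from sentence_transformers.datasets import NoDuplicatesDataLoader

# Assumed starting point: the BERTIN Spanish RoBERTa checkpoint
# (loading a plain HF model yields Transformer + mean Pooling, matching the card).
model = SentenceTransformer("bertin-project/bertin-roberta-base-spanish")

# Placeholder pairs; the real run uses the Spanish NLI premise/hypothesis pairs.
train_examples = [
    InputExample(texts=[f"premisa de ejemplo {i}", f"hipótesis de ejemplo {i}"])
    for i in range(256)
]

# NoDuplicatesDataLoader keeps duplicate texts out of a batch, as required by
# in-batch-negative losses; batch_size matches the card ({'batch_size': 64}).
train_dataloader = NoDuplicatesDataLoader(train_examples, batch_size=64)

# Assumed loss, not stated in the diff.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=10,          # changed from 20 in this commit
    warmup_steps=909,   # changed from 1127
    scheduler="WarmupLinear",
    weight_decay=0.01,
    max_grad_norm=1,
)
```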
eval/similarity_evaluation_sts-test_results.csv CHANGED
@@ -1,21 +1,11 @@
  epoch,steps,cosine_pearson,cosine_spearman,euclidean_pearson,euclidean_spearman,manhattan_pearson,manhattan_spearman,dot_pearson,dot_spearman
- 0,-1,0.6619363474232212,0.5885900850993088,0.6601369870108922,0.5898887432719473,0.6588104710011553,0.5895314311753609,0.5830982449199276,0.4937547304641535
- 1,-1,0.6693262164948957,0.5961588063998108,0.6655605051331788,0.6009401780214281,0.6654625082134044,0.60094695325875,0.6009226777841519,0.5170048024637112
- 2,-1,0.6547420445754193,0.5823884531525437,0.6511385841619973,0.5855875470652291,0.6522056613011148,0.5865922991358415,0.5843887822582672,0.5008089314360363
- 3,-1,0.6552498219421751,0.584669481102323,0.64666647770094,0.5882148051301598,0.6481504965342848,0.5881976468771515,0.5997006203644883,0.519217293851411
- 4,-1,0.6528668557506568,0.582745952876213,0.6410731501019628,0.5856126736791128,0.6424055098070853,0.5859782098059078,0.5916670098025287,0.5158791756140418
- 5,-1,0.6483945219301912,0.5788003785077136,0.6343322715136153,0.5786700439633394,0.6364103314423305,0.579525783424424,0.5912828063040729,0.5199335798952477
- 6,-1,0.6423989989672444,0.5699500862683221,0.6313857693866886,0.5746688473110814,0.6328955724455424,0.5744953472610018,0.5803600675604295,0.5036125587291159
- 7,-1,0.6462629722681043,0.5770818260673343,0.6318230435253588,0.5775325284896901,0.6325422525209058,0.5764058505549855,0.586762886345868,0.5173493168898005
- 8,-1,0.639790660325868,0.5676685783645897,0.6294617784838941,0.5698867228853173,0.6299734551587954,0.5695742381451001,0.5880059591595673,0.5146391367378975
- 9,-1,0.6450089783532716,0.5758663314471489,0.6333562814425514,0.5766438502962163,0.6340741475326621,0.575110984810785,0.5879731498917842,0.5192021383415104
- 10,-1,0.6434909937737626,0.5713701447625351,0.6301188859529719,0.5709692410446885,0.6309719329436375,0.5701395230401529,0.593913567963774,0.522557073939444
- 11,-1,0.641203878405462,0.5722014251907718,0.6284168875038928,0.5737909498411451,0.6295168797303964,0.5728132601653629,0.5893572348665002,0.5218607585112776
- 12,-1,0.6405665479784053,0.5712144563426479,0.6258392075727873,0.5693129298830195,0.6262440363392721,0.5679223727890534,0.593952756495054,0.5268886237775188
- 13,-1,0.6390052346365416,0.5686395678794071,0.6258537618625887,0.5685859625426081,0.6265438374367317,0.5677389726542497,0.591956305872708,0.5218520657539587
- 14,-1,0.6401240726804178,0.5711650411421381,0.6278602450688386,0.5727693520022645,0.628050553113738,0.5709335183573409,0.5937276661244524,0.5234451981826964
- 15,-1,0.6398403358896347,0.5692425497972115,0.6246306232307527,0.5691193313826032,0.6255512511477327,0.5683736149577787,0.5940274286246308,0.5237160798409092
- 16,-1,0.640328214937794,0.5708227567207858,0.6255762617684392,0.5716483840159948,0.6265171469104598,0.569976860529018,0.5945084491171609,0.5247411860311914
- 17,-1,0.6404406282410006,0.5712850352823815,0.6251102831494417,0.5715652062596898,0.6257154822798084,0.5695532590559501,0.5939059512747178,0.525733381896788
- 18,-1,0.64141106615211,0.572578991980065,0.6261621835757434,0.5725016418579003,0.6268679024101312,0.5700271563683411,0.5965506884620715,0.5278869557071051
- 19,-1,0.6406208759751268,0.5720221725807018,0.625890984121176,0.5726780638465656,0.6263716719579958,0.5694595420415887,0.5959571243848296,0.5275360779553968
+ 0,-1,0.6831884913062921,0.6159162222541099,0.6726005233636806,0.6115392058863335,0.6720401096771059,0.6105173097665644,0.6005167896896939,0.5212600492097655
+ 1,-1,0.6706171111332979,0.6008531510212776,0.6565912032452935,0.5949169636344843,0.6555142909342582,0.5935398433843475,0.5765151466955727,0.49637768476198035
+ 2,-1,0.6763825624896551,0.6087882606796842,0.6627392144068636,0.6053590389366899,0.6612759395162868,0.6030838801547247,0.5826990236692152,0.5088888493638298
+ 3,-1,0.66260616452593,0.5913823777186296,0.6469213245153994,0.5891702556310773,0.6449471942861446,0.5872578064093931,0.5818409585899842,0.5052892808258618
+ 4,-1,0.6566925461921814,0.5871384798501856,0.6379456634562074,0.5819500400390282,0.6356299181697714,0.5793092883148608,0.5725533633222645,0.5005210619710372
+ 5,-1,0.6560126958746472,0.584645192515697,0.6375859060277993,0.5799601798248812,0.6358427415811263,0.578232849404072,0.5777523875165609,0.5017760148916008
+ 6,-1,0.6503433461367746,0.578081436343585,0.6326739453456565,0.5758382504320848,0.6308846572628577,0.5745397200941126,0.571361965152683,0.49444579046714365
+ 7,-1,0.6511867735121081,0.5769374865250576,0.6323147897935092,0.5744373103224324,0.6309669803317294,0.573106665075477,0.57342064744336,0.4975609366385161
+ 8,-1,0.6506119610377241,0.5781030546060674,0.6326539782626099,0.5757848865607669,0.6310415147465013,0.5743098307522757,0.5723862516745356,0.49789660206491654
+ 9,-1,0.6488271901388144,0.5782767677139244,0.6287620409812228,0.5742694918130841,0.6272343282453402,0.5729337473833224,0.5685335534384852,0.4968351056062509
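The replaced results file covers the 10 epochs of the new run, and the epoch-0 row carries the scores quoted in the updated README table. A small sketch for picking the best epoch out of this CSV (pandas is assumed here; it is not a dependency of the repo):

```python
import pandas as pd

# Per-epoch STS test scores written by the EmbeddingSimilarityEvaluator.
df = pd.read_csv("eval/similarity_evaluation_sts-test_results.csv")

# Rank epochs by cosine Spearman correlation; in this commit epoch 0 is best
# (cosine_spearman ≈ 0.6159, matching the README table).
best = df.loc[df["cosine_spearman"].idxmax()]
print(best[["epoch", "cosine_pearson", "cosine_spearman"]])
```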
loss_digest.json CHANGED
The diff for this file is too large to render. See raw diff
modules.json CHANGED
@@ -10,11 +10,5 @@
"name": "1",
"path": "1_Pooling",
"type": "sentence_transformers.models.Pooling"
- },
- {
- "idx": 2,
- "name": "2",
- "path": "2_Dense",
- "type": "sentence_transformers.models.Dense"
}
]
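With the Dense entry gone from modules.json, a freshly loaded model should expose only the Transformer and Pooling modules and emit 768-dimensional vectors. A quick sanity-check sketch, assuming the card's '{MODEL_NAME}' placeholder is replaced with the actual repo id:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("{MODEL_NAME}")  # substitute the real repo id

# Only two modules should remain: Transformer (RoBERTa) and Pooling (mean).
for name, module in model.named_children():
    print(name, type(module).__name__)

# Without the 768 -> 512 Dense head, embeddings are 768-dimensional.
print(model.get_sentence_embedding_dimension())  # expected: 768
```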
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:aa94d27f78e0ea7d33cdcb67c9b9cf8959fa314dace803031713b5c976f761e2
+ oid sha256:eaff1c454271166e40db8096964f269f9b5de9fad5e056c455e5de9be3404ba9
size 498664817