LeoChiuu committed
Commit 4c40e6c
1 Parent(s): 498c38f

Add new SentenceTransformer model.

Files changed (3):
1. README.md +41 -109
2. config.json +1 -1
3. model.safetensors +1 -1
README.md CHANGED
@@ -6,42 +6,43 @@ tags:
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
- - dataset_size:38688
- - loss:ContrastiveLoss
+ - dataset_size:75
+ - loss:CosineSimilarityLoss
  base_model: sentence-transformers/all-MiniLM-L6-v2
  datasets: []
  widget:
- - source_sentence: There is a heavy cost for this service provided in conjunction
-     with NOAA and SARSAT.
+ - source_sentence: This store featured in the SavaCentre TV adverts in 1983.
  sentences:
- - No significant changes have been made to the roadway except for its legal definition.
- - Some academics have questioned the ethics of these payments.
- - There is no charge for this service provided in conjunction with NOAA and SARSAT.
- - source_sentence: You're not thin.
+ - I love the Scream movies and all horror movies and this one ranks way up there.
+ - Development of synchronous toothed-belts was halted by the Gilmer company prior
+     to 1940.
+ - This store was not featured in the SavaCentre TV promotions in 1983.
+ - source_sentence: In 2014, Nextgen earns KLAS Top Performance Honors for Ambulatory
+     RCM Services.
  sentences:
- - This process is called low-dimensional embedded in machine learning.
- - You're thin.
- - Jean Prouvost was the founder of Marie Claire.
- - source_sentence: The lead man is charisma-free.
+ - These strategies employ reporter transposon s and in vitro expression technology
+     (IVET).
+ - In 2014, Nextgen fails to achieve KLAS Top Performance Honors for Ambulatory RCM
+     Services.
+ - The film's sole bright spot was Jonah Hill (who will look almost unrecognizable
+     to fans of the recent Superbad due to the amount of weight he lost in the interim).
+ - source_sentence: E105 has never been implicated in atopic asthma.
  sentences:
- - Fossil egg s are rare, but one oogenus, Polyclonoolithus, was discovered in the
-     Hekou Group.
- - The roof is shingled, and topped by a small belfry.
- - The lead man doesn't have charisma.
- - source_sentence: Willis has criticized the rules adopted by the RNC, particularly
-     Rules 12, 16, and 40.
+ - E105 has been implicated in non-atopic asthma.
+ - The species is named in honor of the divorce of Sara Anderson and Malcolm Slaney.
+ - Each annex to a filed document is not required to have page numbering.
+ - source_sentence: Additionally, a church at San Lazaro in Orange Walk District escaped
+     all damage.
  sentences:
- - Willis has fully accepted the rules adopted by the RNC, particularly Rules 12,
-     16, and 40.
- - I can't stop reading.
- - This force acts on water independently of the wind stress.
- - source_sentence: The publication was named after Sir James Joynton Smith.
+ - Kuwait has a reputation for being the central music influence of the GCC countries.
+ - Early settlers may have introduced it 4,000 years ago.
+ - Additionally, a church at San Lazaro in Orange Walk District suffered severe damage.
+ - source_sentence: The content in Australia is lower than in other reports.
  sentences:
- - Detailed specific information on the ongoing validation activities is being made
-     available in related publications.
- - On November 25, 2012, Tom O'Brien was reinstated.
- - The publication took its name from its founder and chief financer Sir James Joynton
-     Smith.
+ - Other reports also show a content lower than 0.1% in Australia.
+ - Commercial DNP is unable to be utilized as an antiseptic or as a non-selective
+     bioaccumulating pesticide.
+ - Installation of Halon systems is mandated by the European Union.
  pipeline_tag: sentence-similarity
  ---

@@ -95,9 +96,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("LeoChiuu/all-MiniLM-L6-v2-negations")
  # Run inference
  sentences = [
-     'The publication was named after Sir James Joynton Smith.',
-     'The publication took its name from its founder and chief financer Sir James Joynton Smith.',
-     "On November 25, 2012, Tom O'Brien was reinstated.",
+     'The content in Australia is lower than in other reports.',
+     'Other reports also show a content lower than 0.1% in Australia.',
+     'Installation of Halon systems is mandated by the European Union.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
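The snippet in this hunk stops at printing the embedding shape. As a minimal sketch of going one step further, assuming the published model id resolves: Sentence Transformers 3.x (the version recorded under Framework Versions below) exposes a `similarity()` method on the model, which scores the encoded sentences against one another with cosine similarity by default.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("LeoChiuu/all-MiniLM-L6-v2-negations")
sentences = [
    "The content in Australia is lower than in other reports.",
    "Other reports also show a content lower than 0.1% in Australia.",
    "Installation of Halon systems is mandated by the European Union.",
]
# encode() yields one 384-dimensional vector per sentence for this MiniLM base
embeddings = model.encode(sentences)
# similarity() returns the pairwise cosine-similarity matrix (3x3 here)
scores = model.similarity(embeddings, embeddings)
print(scores)
```

If the negation fine-tune behaves as intended, the first two sentences should score higher against each other than either does against the unrelated third.
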
@@ -152,25 +153,23 @@ You can finetune this model on your own dataset.
  #### Unnamed Dataset

- * Size: 38,688 training samples
+ * Size: 75 training samples
  * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
    |         | sentence_0 | sentence_1 | label |
    |:--------|:-----------|:-----------|:------|
    | type    | string     | string     | int   |
-   | details | <ul><li>min: 5 tokens</li><li>mean: 15.94 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 15.96 tokens</li><li>max: 44 tokens</li></ul> | <ul><li>0: ~48.50%</li><li>1: ~51.50%</li></ul> |
+   | details | <ul><li>min: 9 tokens</li><li>mean: 16.36 tokens</li><li>max: 39 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 16.55 tokens</li><li>max: 43 tokens</li></ul> | <ul><li>0: ~61.33%</li><li>1: ~38.67%</li></ul> |
  * Samples:
-   | sentence_0 | sentence_1 | label |
-   |:-----------|:-----------|:------|
-   | <code>No, that is impossible.</code> | <code>No, that is not possible.</code> | <code>0</code> |
-   | <code>The building did indeed serve as a hof, according to the bone finds.</code> | <code>The bone finds thus indicate the building did indeed serve as a hof.</code> | <code>0</code> |
-   | <code>The building became a pet shop.</code> | <code>The building became a hospital.</code> | <code>1</code> |
+   | sentence_0 | sentence_1 | label |
+   |:-----------|:-----------|:------|
+   | <code>It wasn't an inexpensive piece, but I would still have expected better quality.</code> | <code>It was an inexpensive piece, but I would still have expected better quality.</code> | <code>0</code> |
+   | <code>My name is noncrucial.</code> | <code>My name is important.</code> | <code>0</code> |
+   | <code>Hawthorne mostly wrote against his own religious belief.</code> | <code>Hawthorne wrote against his beliefs.</code> | <code>1</code> |
- * Loss: [<code>ContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#contrastiveloss) with these parameters:
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
    ```json
    {
-       "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
-       "margin": 0.5,
-       "size_average": true
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    ```
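For context on the loss change in this hunk (ContrastiveLoss with a 0.5 margin replaced by CosineSimilarityLoss wrapping MSE), here is a minimal fine-tuning sketch over a `(sentence_0, sentence_1, label)` dataset of the shape described above, using the Sentence Transformers 3.x trainer API. This is a sketch, not the author's training script: the two rows are copied from the samples table, and treating the integer labels as float targets is an assumption, since `CosineSimilarityLoss` regresses the pairwise cosine similarity onto a float label.

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Two placeholder rows in the card's (sentence_0, sentence_1, label) layout;
# labels are cast to float (an assumption) because CosineSimilarityLoss
# compares cosine(embedding_0, embedding_1) to the label via MSE.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "My name is noncrucial.",
        "Hawthorne mostly wrote against his own religious belief.",
    ],
    "sentence_1": [
        "My name is important.",
        "Hawthorne wrote against his beliefs.",
    ],
    "label": [0.0, 1.0],
})

loss = losses.CosineSimilarityLoss(model)
trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```

With the removed ContrastiveLoss, the same data layout applies, but the label is an integer similar/dissimilar flag and the margin (0.5 here) controls how far apart dissimilar pairs are pushed.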
 
@@ -293,59 +292,6 @@ You can finetune this model on your own dataset.

  </details>

- ### Training Logs
- | Epoch  | Step  | Training Loss |
- |:------:|:-----:|:-------------:|
- | 0.2068 | 500   | 0.0353        |
- | 0.4136 | 1000  | 0.0307        |
- | 0.6203 | 1500  | 0.0234        |
- | 0.8271 | 2000  | 0.0187        |
- | 1.0339 | 2500  | 0.0152        |
- | 1.2407 | 3000  | 0.0134        |
- | 1.4475 | 3500  | 0.0123        |
- | 1.6543 | 4000  | 0.0111        |
- | 1.8610 | 4500  | 0.0107        |
- | 2.0678 | 5000  | 0.0097        |
- | 2.2746 | 5500  | 0.0096        |
- | 2.4814 | 6000  | 0.0091        |
- | 2.6882 | 6500  | 0.0087        |
- | 2.8950 | 7000  | 0.0086        |
- | 3.1017 | 7500  | 0.0075        |
- | 3.3085 | 8000  | 0.008         |
- | 3.5153 | 8500  | 0.0074        |
- | 3.7221 | 9000  | 0.007         |
- | 3.9289 | 9500  | 0.007         |
- | 4.1356 | 10000 | 0.0063        |
- | 4.3424 | 10500 | 0.0068        |
- | 4.5492 | 11000 | 0.0061        |
- | 4.7560 | 11500 | 0.0059        |
- | 4.9628 | 12000 | 0.0056        |
- | 5.1696 | 12500 | 0.0052        |
- | 5.3763 | 13000 | 0.0055        |
- | 5.5831 | 13500 | 0.0051        |
- | 5.7899 | 14000 | 0.005         |
- | 5.9967 | 14500 | 0.0047        |
- | 6.2035 | 15000 | 0.0046        |
- | 6.4103 | 15500 | 0.0047        |
- | 6.6170 | 16000 | 0.0044        |
- | 6.8238 | 16500 | 0.0044        |
- | 7.0306 | 17000 | 0.0041        |
- | 7.2374 | 17500 | 0.004         |
- | 7.4442 | 18000 | 0.0044        |
- | 7.6510 | 18500 | 0.0039        |
- | 7.8577 | 19000 | 0.0038        |
- | 8.0645 | 19500 | 0.0038        |
- | 8.2713 | 20000 | 0.0037        |
- | 8.4781 | 20500 | 0.0039        |
- | 8.6849 | 21000 | 0.0037        |
- | 8.8916 | 21500 | 0.0036        |
- | 9.0984 | 22000 | 0.0034        |
- | 9.3052 | 22500 | 0.0036        |
- | 9.5120 | 23000 | 0.0035        |
- | 9.7188 | 23500 | 0.0034        |
- | 9.9256 | 24000 | 0.0035        |
-
-
  ### Framework Versions
  - Python: 3.11.9
  - Sentence Transformers: 3.0.1
@@ -372,20 +318,6 @@ You can finetune this model on your own dataset.
  }
  ```

- #### ContrastiveLoss
- ```bibtex
- @inproceedings{hadsell2006dimensionality,
-     author={Hadsell, R. and Chopra, S. and LeCun, Y.},
-     booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
-     title={Dimensionality Reduction by Learning an Invariant Mapping},
-     year={2006},
-     volume={2},
-     number={},
-     pages={1735-1742},
-     doi={10.1109/CVPR.2006.100}
- }
- ```
-
  <!--
  ## Glossary

config.json CHANGED
@@ -1,5 +1,5 @@
  {
-   "_name_or_path": "LeoChiuu/all-MiniLM-L6-v2-negations",
+   "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
    "architectures": [
      "BertModel"
    ],
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6b68ced173371fb910f0fa0d901c4c1cf752167e673dd1c7a014d80b80d7410a
+ oid sha256:2b3f93fc93c0fbdf4be9f9217841543915515a6610212538520a608457a9d4a7
  size 90864192