Sentence Similarity
sentence-transformers
Safetensors
English
modernbert
biencoder
text-classification
sentence-pair-classification
semantic-similarity
semantic-search
retrieval
reranking
Generated from Trainer
dataset_size:1451941
loss:MultipleNegativesRankingLoss
Eval Results
text-embeddings-inference
Add new SentenceTransformer model
- README.md +44 -86
- model.safetensors +1 -1
README.md
CHANGED
@@ -12,54 +12,9 @@ tags:
|
|
12 |
- retrieval
|
13 |
- reranking
|
14 |
- generated_from_trainer
|
15 |
-
- dataset_size:
|
16 |
-
- loss:
|
17 |
base_model: Alibaba-NLP/gte-modernbert-base
|
18 |
-
widget:
|
19 |
-
- source_sentence: In 2015 Adolf Hitler appeared in the kickstarter short movie ``
|
20 |
-
Kung Fury `` as Taccone ( A.K.A .
|
21 |
-
sentences:
|
22 |
-
- In 2015 , Adolf Hitler appeared in the Kickstarter - short film `` Kung Fury ``
|
23 |
-
as Taccone ( A.K.A .
|
24 |
-
- In 1795 , the only white residents were Dr. John Laidley and two brothers with
|
25 |
-
the surname Ainslie .
|
26 |
-
- The 125th University Match was played in March 2014 at the Rye Golf Club , Oxford
|
27 |
-
, East Sussex won the game 8.5 - 6.5 .
|
28 |
-
- source_sentence: From 1973 to 1974 , Aubrey toured with the Cambridge Theatre Company
|
29 |
-
as Diggory in `` She Stoops to Conquer `` and again as Aguecheek .
|
30 |
-
sentences:
|
31 |
-
- Oxide can be reduced to metallic samarium at higher temperatures by heating with
|
32 |
-
a reducing agent such as hydrogen or carbon monoxide .
|
33 |
-
- From 1973 to 1974 Aguecheek toured with the Cambridge Theatre Company as Diggory
|
34 |
-
in `` You Stoops to Conquer `` and again as Aubrey .
|
35 |
-
- The medals were presented by Barry Maister , IOC member , New Zealand and Sarah
|
36 |
-
Webb Gosling , Vice President of World Sailing .
|
37 |
-
- source_sentence: There is no official wall on the border , although there are sections
|
38 |
-
of fence near populated areas and continuous border crossings .
|
39 |
-
sentences:
|
40 |
-
- The 2014 -- 15 Boston Bruins season was the 91st season for the National Hockey
|
41 |
-
League franchise that was established on November 1 , 1924 .
|
42 |
-
- He was trained by the Inghams and owned by John Hawkes .
|
43 |
-
- There is no continuous wall on the border , although there are fence sections
|
44 |
-
near populated areas and official border crossings .
|
45 |
-
- source_sentence: Capital . `` The French established similar hill stations in Indochina
|
46 |
-
, such as Dalat built in 1921 .
|
47 |
-
sentences:
|
48 |
-
- Lubuk China is a small town in Alor Gajah District , Melaka , Malaysia . It is
|
49 |
-
situated near the border with Negeri Sembilan .
|
50 |
-
- The French established similar hill stations in Indochina , such as Dalat , built
|
51 |
-
in 1921 .
|
52 |
-
- John Potts ( or Pott ) was a doctor and colonial governor of Virginia in the Jamestown
|
53 |
-
settlement at Virginia Colony in the early 17th century .
|
54 |
-
- source_sentence: The band pursued `` signals `` in January 2012 in three weeks ,
|
55 |
-
and drums were recorded in a day and a half .
|
56 |
-
sentences:
|
57 |
-
- It was repaired at the beginning of the 20th century and is listed as closed in
|
58 |
-
our records .
|
59 |
-
- The band tracked `` Signals `` in three weeks in January 2012 . Drums were recorded
|
60 |
-
in a day and a half .
|
61 |
-
- Contributors include actor Anton LaVey , Satanist Christopher Lee , serial killer
|
62 |
-
expert Clive Barker , author Karen Greenlee , and necrophile Robert Ressler .
|
63 |
datasets:
|
64 |
- redis/langcache-sentencepairs-v1
|
65 |
pipeline_tag: sentence-similarity
|
@@ -151,9 +106,9 @@ from sentence_transformers import SentenceTransformer
|
|
151 |
model = SentenceTransformer("redis/langcache-embed-v3")
|
152 |
# Run inference
|
153 |
sentences = [
|
154 |
-
'The
|
155 |
-
'
|
156 |
-
'
|
157 |
]
|
158 |
embeddings = model.encode(sentences)
|
159 |
print(embeddings.shape)
|
@@ -162,9 +117,9 @@ print(embeddings.shape)
|
|
162 |
# Get the similarity scores for the embeddings
|
163 |
similarities = model.similarity(embeddings, embeddings)
|
164 |
print(similarities)
|
165 |
-
# tensor([[0.
|
166 |
-
# [0.
|
167 |
-
# [0.
|
168 |
```
|
169 |
|
170 |
<!--
|
@@ -228,24 +183,25 @@ You can finetune this model on your own dataset.
|
|
228 |
#### LangCache Sentence Pairs (all)
|
229 |
|
230 |
* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
|
231 |
-
* Size:
|
232 |
-
* Columns: <code>
|
233 |
* Approximate statistics based on the first 1000 samples:
|
234 |
-
| |
|
235 |
-
|
236 |
-
| type |
|
237 |
-
| details | <ul><li>min:
|
238 |
* Samples:
|
239 |
-
|
|
240 |
-
|
241 |
-
| <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats
|
242 |
-
| <code>
|
243 |
-
| <code>
|
244 |
-
* Loss: [<code>
|
245 |
```json
|
246 |
{
|
247 |
"scale": 20.0,
|
248 |
-
"similarity_fct": "
|
249 |
}
|
250 |
```
|
251 |
|
@@ -254,24 +210,25 @@ You can finetune this model on your own dataset.
|
|
254 |
#### LangCache Sentence Pairs (all)
|
255 |
|
256 |
* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
|
257 |
-
* Size:
|
258 |
-
* Columns: <code>
|
259 |
* Approximate statistics based on the first 1000 samples:
|
260 |
-
| |
|
261 |
-
|
262 |
-
| type |
|
263 |
-
| details | <ul><li>min:
|
264 |
* Samples:
|
265 |
-
|
|
266 |
-
|
267 |
-
| <code>The newer Punts are still very much in existence today and race in the same fleets as the older boats
|
268 |
-
| <code>
|
269 |
-
| <code>
|
270 |
-
* Loss: [<code>
|
271 |
```json
|
272 |
{
|
273 |
"scale": 20.0,
|
274 |
-
"similarity_fct": "
|
275 |
}
|
276 |
```
|
277 |
|
@@ -307,14 +264,15 @@ You can finetune this model on your own dataset.
|
|
307 |
}
|
308 |
```
|
309 |
|
310 |
-
####
|
311 |
```bibtex
|
312 |
-
@
|
313 |
-
title={
|
314 |
-
author={
|
315 |
-
year={
|
316 |
-
|
317 |
-
|
318 |
}
|
319 |
```
|
320 |
|
|
|
12 |
- retrieval
|
13 |
- reranking
|
14 |
- generated_from_trainer
|
15 |
+
- dataset_size:1451941
|
16 |
+
- loss:MultipleNegativesRankingLoss
|
17 |
base_model: Alibaba-NLP/gte-modernbert-base
|
18 |
datasets:
|
19 |
- redis/langcache-sentencepairs-v1
|
20 |
pipeline_tag: sentence-similarity
|
|
|
106 |
model = SentenceTransformer("redis/langcache-embed-v3")
|
107 |
# Run inference
|
108 |
sentences = [
|
109 |
+
'The weather is lovely today.',
|
110 |
+
"It's so sunny outside!",
|
111 |
+
'He drove to the stadium.',
|
112 |
]
|
113 |
embeddings = model.encode(sentences)
|
114 |
print(embeddings.shape)
|
|
|
117 |
# Get the similarity scores for the embeddings
|
118 |
similarities = model.similarity(embeddings, embeddings)
|
119 |
print(similarities)
|
120 |
+
# tensor([[0.9922, 0.7891, 0.4629],
|
121 |
+
# [0.7891, 1.0000, 0.5117],
|
122 |
+
# [0.4629, 0.5117, 1.0000]], dtype=torch.bfloat16)
|
123 |
```
|
124 |
|
125 |
<!--
|
|
|
183 |
#### LangCache Sentence Pairs (all)
|
184 |
|
185 |
* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
|
186 |
+
* Size: 109,885 training samples
|
187 |
+
* Columns: <code>texts</code>
|
188 |
* Approximate statistics based on the first 1000 samples:
|
189 |
+
| | texts |
|
190 |
+
|:--------|:--------------------------------------------------------------------------------------|
|
191 |
+
| type | list |
|
192 |
+
| details | <ul><li>min: 3 elements</li><li>mean: 3.50 elements</li><li>max: 4 elements</li></ul> |
|
193 |
* Samples:
|
194 |
+
| texts |
|
195 |
+
|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
196 |
+
| <code>['The newer Punts are still very much in existence today and race in the same fleets as the older boats .', 'The newer punts are still very much in existence today and run in the same fleets as the older boats .', 'how can I get financial freedom as soon as possible?']</code> |
|
197 |
+
| <code>['The newer punts are still very much in existence today and run in the same fleets as the older boats .', 'The newer Punts are still very much in existence today and race in the same fleets as the older boats .', 'The older Punts are still very much in existence today and race in the same fleets as the newer boats .']</code> |
|
198 |
+
| <code>['Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .', 'Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .', 'Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .']</code> |
|
199 |
+
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
200 |
```json
|
201 |
{
|
202 |
"scale": 20.0,
|
203 |
+
"similarity_fct": "cos_sim",
|
204 |
+
"gather_across_devices": false
|
205 |
}
|
206 |
```
|
207 |
|
|
|
210 |
#### LangCache Sentence Pairs (all)
|
211 |
|
212 |
* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v1)
|
213 |
+
* Size: 109,885 evaluation samples
|
214 |
+
* Columns: <code>texts</code>
|
215 |
* Approximate statistics based on the first 1000 samples:
|
216 |
+
| | texts |
|
217 |
+
|:--------|:--------------------------------------------------------------------------------------|
|
218 |
+
| type | list |
|
219 |
+
| details | <ul><li>min: 3 elements</li><li>mean: 3.50 elements</li><li>max: 4 elements</li></ul> |
|
220 |
* Samples:
|
221 |
+
| texts |
|
222 |
+
|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
223 |
+
| <code>['The newer Punts are still very much in existence today and race in the same fleets as the older boats .', 'The newer punts are still very much in existence today and run in the same fleets as the older boats .', 'how can I get financial freedom as soon as possible?']</code> |
|
224 |
+
| <code>['The newer punts are still very much in existence today and run in the same fleets as the older boats .', 'The newer Punts are still very much in existence today and race in the same fleets as the older boats .', 'The older Punts are still very much in existence today and race in the same fleets as the newer boats .']</code> |
|
225 |
+
| <code>['Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .', 'Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .', 'Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .']</code> |
|
226 |
+
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
|
227 |
```json
|
228 |
{
|
229 |
"scale": 20.0,
|
230 |
+
"similarity_fct": "cos_sim",
|
231 |
+
"gather_across_devices": false
|
232 |
}
|
233 |
```
|
234 |
|
|
|
264 |
}
|
265 |
```
|
266 |
|
267 |
+
#### MultipleNegativesRankingLoss
|
268 |
```bibtex
|
269 |
+
@misc{henderson2017efficient,
|
270 |
+
title={Efficient Natural Language Response Suggestion for Smart Reply},
|
271 |
+
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
|
272 |
+
year={2017},
|
273 |
+
eprint={1705.00652},
|
274 |
+
archivePrefix={arXiv},
|
275 |
+
primaryClass={cs.CL}
|
276 |
}
|
277 |
```
|
278 |
|
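The README changes above name `MultipleNegativesRankingLoss` with `"scale": 20.0` and `"similarity_fct": "cos_sim"`. As a rough illustration of what those two parameters mean, here is a minimal pure-Python sketch of the in-batch-negatives idea: each anchor is scored against every candidate by scaled cosine similarity, and the loss is the cross-entropy of picking the candidate at the same index. The helper names (`cos_sim`, `mnr_loss`) and the toy 2-D vectors are assumptions for illustration only. This is not the sentence-transformers implementation, and it leaves out the explicit hard negatives that the 3 to 4 element `texts` lists suggest.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i should rank positive i first.

    Every other positive in the batch serves as a negative for anchor i.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every candidate in the batch.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the matching candidate.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical values, not produced by the model).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # positive i is close to anchor i
swapped = [aligned[1], aligned[0]]   # positives deliberately mismatched

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

With correctly paired inputs the loss is close to zero, and it grows sharply when the pairs are mismatched. A larger `scale` sharpens the softmax, so ranking mistakes are penalized more heavily.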
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:
|
3 |
size 298041696
|
|
|
1 |
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
|
3 |
size 298041696
|
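The `model.safetensors` diff above changes only the `oid` line of a Git LFS pointer file; the `size` stays at 298041696 bytes. A minimal sketch of reading such a pointer and checking a downloaded blob against it might look like the following. The function names `parse_lfs_pointer` and `blob_matches_pointer` are hypothetical helpers, not part of any Git LFS or Hugging Face API, and the parser assumes the three-line `key value` layout shown in the diff.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash-algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def blob_matches_pointer(data, pointer):
    """Check that raw bytes match the size and SHA-256 digest in a pointer."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError("this sketch only handles sha256 oids")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer text copied from the new side of the diff.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:95d02211c4cca89113f9f3e93ed91f5176bf50170faa2cb835f7bfea15bb9dd2
size 298041696
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```

For a file of this size, hashing in chunks from disk would be preferable to loading all bytes into memory; the in-memory check here only keeps the sketch short.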