metadata
language:
- en
license: apache-2.0
tags:
- biencoder
- sentence-transformers
- text-classification
- sentence-pair-classification
- semantic-similarity
- semantic-search
- retrieval
- reranking
- generated_from_trainer
- dataset_size:13667
- loss:ArcFaceInBatchLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: >-
It was mobilized in December 2014 from elements of the dissolved 51st
Mechanized Brigade and newly formed units .
sentences:
- >-
This North-South route falls entirely in the Belgian territory and runs
together with the Belgian roads N31 and A17 .
- >-
It was mobilized in December 2014 from elements of the disbanded 51st
Mechanized Brigade and newly formed units .
- All windows are double wood , hung up with a single light .
- source_sentence: It is located at Ellison Bay , in the town of Liberty Grove , Wisconsin .
sentences:
- >-
It is located in Ellison Bay , in the town of Liberty Grove , Wisconsin
.
- >-
It is located in Liberty Grove , Wisconsin , in the town of Ellison Bay
.
- >-
The Hadejia River ( Hausa : `` kogin Haɗeja `` ) is a river in northern
Nigeria and is a tributary of the Yobe River ( Komadugu Yobe ) .
- source_sentence: >-
Both long and short vowels can be nasalized ( differentiation between ``
acces `` and `` Ä cces `` below ) , but long nasal vowels are more common
.
sentences:
- >-
Both long and short vowels can be nasalized ( the distinction between ``
acces `` and `` ącces `` below ) , but long nasal vowels are more common
.
- >-
Wilson was a member of the Senate from 1844 to 1846 and 1850 to 1852 .
From 1851 to 1852 he was the Massachusetts State Senate 's President .
- >-
Both long vowels can be nasalized ( the distinction between `` acces ``
and `` ącces `` below ) , but long and short nasal vowels are more
common .
- source_sentence: >-
At that time , on June 22 , 1754 , Edward Bentham married Bentham
Elizabeth Bates ( d . 1790 ) from Hampshire in the nearby county of Alton
.
sentences:
- >-
The Department of Criminal Justice developed the first certificate
program in forensic science in North Carolina and sponsors a summer
comparative studies program based in Europe .
- >-
At that time , on June 22 , 1754 , Edward Bentham married Bentham
Elizabeth Bates ( d . 1790 ) from Hampshire in the nearby county of
Alton .
- >-
It was at this time , on 22 June 1754 , that Edward Bentham married
Elizabeth Bates ( d 1790 ) from Alton in the nearby county of Hampshire
.
- source_sentence: >-
In 1973 Michels ' apos broke ; Barcelona the world transfer record to
bring Cruyff to Catalonia .
sentences:
- >-
In 1973 , Cruyff 'Barcelona broke the world transfer record to bring
Michels to Catalonia .
- >-
Amalric then marched to Cairo , where Shawar offered Amalric two million
pieces of gold .
- >-
In 1973 Michels ' apos broke ; Barcelona the world transfer record to
bring Cruyff to Catalonia .
datasets:
- redis/langcache-sentencepairs-v2
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_precision@1
- cosine_recall@1
- cosine_ndcg@10
- cosine_mrr@1
- cosine_map@100
- cosine_auc_precision_cache_hit_ratio
- cosine_auc_similarity_distribution
model-index:
- name: Redis fine-tuned BiEncoder model for semantic caching on LangCache
results:
- task:
type: custom-information-retrieval
name: Custom Information Retrieval
dataset:
name: test
type: test
metrics:
- type: cosine_accuracy@1
value: 0.5767756724811061
name: Cosine Accuracy@1
- type: cosine_precision@1
value: 0.5767756724811061
name: Cosine Precision@1
- type: cosine_recall@1
value: 0.5587801563902068
name: Cosine Recall@1
- type: cosine_ndcg@10
value: 0.765320607860921
name: Cosine Ndcg@10
- type: cosine_mrr@1
value: 0.5767756724811061
name: Cosine Mrr@1
- type: cosine_map@100
value: 0.7130569949974509
name: Cosine Map@100
- type: cosine_auc_precision_cache_hit_ratio
value: 0.33372951540341317
name: Cosine Auc Precision Cache Hit Ratio
- type: cosine_auc_similarity_distribution
value: 0.1529248551010913
name: Cosine Auc Similarity Distribution
Redis fine-tuned BiEncoder model for semantic caching on LangCache
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the LangCache Sentence Pairs (all) dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for sentence pair similarity.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 100 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: LangCache Sentence Pairs (all)
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 100, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
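The last two modules above (mean-token pooling followed by `Normalize()`) can be illustrated with a minimal pure-Python sketch. The function names `mean_pool` and `l2_normalize` and the toy 2-dimensional vectors are hypothetical and chosen only for illustration; the real model operates on 384-dimensional token embeddings produced by the transformer.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # avoid division by zero for an all-padding input
    return [t / count for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Toy example: three token vectors, the last one is padding.
tokens = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(sentence_vector)  # approximately [0.6, 0.8]
```

Because the output is unit-length, cosine similarity between two sentence vectors reduces to a plain dot product.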
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("redis/langcache-embed-experimental")
# Run inference
sentences = [
"In 1973 Michels ' apos broke ; Barcelona the world transfer record to bring Cruyff to Catalonia .",
"In 1973 Michels ' apos broke ; Barcelona the world transfer record to bring Cruyff to Catalonia .",
"In 1973 , Cruyff 'Barcelona broke the world transfer record to bring Michels to Catalonia .",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 1.0000, 0.9219],
# [1.0000, 1.0000, 0.9219],
# [0.9219, 0.9219, 1.0078]], dtype=torch.bfloat16)
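Since the model is intended for semantic caching, the similarity scores above would typically be compared against a threshold to decide whether a cached response can be reused. The following is a minimal sketch of that idea, not the LangCache implementation: the `lookup` helper, the cache layout, and the threshold values are all hypothetical, and the toy 2-dimensional unit vectors stand in for real `model.encode(...)` outputs.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit-length."""
    return sum(x * y for x, y in zip(a, b))

def lookup(query_vec, cache, threshold=0.9):
    """Return the response of the most similar cached entry, or None on a cache miss.

    `cache` is a list of (embedding, response) pairs. The threshold is an
    illustrative value, not one recommended by the model card.
    """
    best_score, best_response = -1.0, None
    for key_vec, response in cache:
        score = dot(query_vec, key_vec)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None

# Toy cache with unit-length embeddings (stand-ins for model.encode outputs).
cache = [
    ([1.0, 0.0], "cached answer A"),
    ([0.0, 1.0], "cached answer B"),
]
hit = lookup([0.6, 0.8], cache, threshold=0.7)   # best score 0.8 clears 0.7
miss = lookup([0.6, 0.8], cache, threshold=0.9)  # best score 0.8 is below 0.9
print(hit, miss)
```

A higher threshold trades cache hit ratio for precision, which is the trade-off the `cosine_auc_precision_cache_hit_ratio` metric name suggests is being measured.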
Evaluation
Metrics
Custom Information Retrieval
- Dataset: test
- Evaluated with ir_evaluator.CustomInformationRetrievalEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.5768 |
| cosine_precision@1 | 0.5768 |
| cosine_recall@1 | 0.5588 |
| cosine_ndcg@10 | 0.7653 |
| cosine_mrr@1 | 0.5768 |
| cosine_map@100 | 0.7131 |
| cosine_auc_precision_cache_hit_ratio | 0.3337 |
| cosine_auc_similarity_distribution | 0.1529 |
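The card does not define how `CustomInformationRetrievalEvaluator` computes these values. As a rough guide, the sketch below uses the standard binary-relevance definitions of accuracy@k, MRR@k, and NDCG@k; the function names and the toy ranking are hypothetical, and the custom evaluator may differ in details.

```python
import math

def accuracy_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0

def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant result within the top k, else 0.0."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal discounted gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked_ids[:k], start=1)
        if doc in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Toy query: the single relevant document is ranked second.
ranked = ["d3", "d1", "d7"]
relevant = {"d1"}
acc1 = accuracy_at_k(ranked, relevant, 1)
mrr10 = mrr_at_k(ranked, relevant, 10)
ndcg10 = ndcg_at_k(ranked, relevant, 10)
print(acc1, mrr10, round(ndcg10, 4))
```

Under these definitions MRR@1 can only be 0 or 1 per query, so it coincides with accuracy@1, which is consistent with the identical values reported in the table.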
Training Details
Training Dataset
LangCache Sentence Pairs (all)
- Dataset: LangCache Sentence Pairs (all)
- Size: 6,780 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 8 tokens, mean: 26.28 tokens, max: 47 tokens | min: 8 tokens, mean: 26.27 tokens, max: 47 tokens | min: 8 tokens, mean: 26.25 tokens, max: 47 tokens |

- Samples:

| anchor | positive | negative |
|---|---|---|
| The newer Punts are still very much in existence today and race in the same fleets as the older boats . | The newer punts are still very much in existence today and run in the same fleets as the older boats . | This marine species occurs in the eastern Indian Ocean and before the Maldives and New Caledonia . |
| The newer punts are still very much in existence today and run in the same fleets as the older boats . | The newer Punts are still very much in existence today and race in the same fleets as the older boats . | Both young people burn with love really , for both , but without being able to say it to himself , admitting him always . |
| Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada . | Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada . | Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada . |

- Loss: losses.ArcFaceInBatchLoss with these parameters:
{ "scale": 20.0, "similarity_fct": "cos_sim", "gather_across_devices": false }
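The card gives only the name and parameters of the loss, so the following is a hedged sketch of what an ArcFace-style in-batch loss commonly looks like, not the actual `ArcFaceInBatchLoss` implementation. It assumes an additive angular margin on the true (anchor, positive) pair, uses the other positives in the batch as negatives, and applies cross-entropy over scaled cosine similarities. The `margin=0.1` default is purely illustrative (no margin is listed in the card), and the explicit `negative` column is not used here.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def arcface_in_batch_loss(anchors, positives, scale=20.0, margin=0.1):
    """Mean cross-entropy over the batch with an angular margin on the true pair.

    `scale` matches the value listed in the card; `margin` is an assumption.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = []
        for j, candidate in enumerate(positives):
            cosine = max(-1.0, min(1.0, cos_sim(anchor, candidate)))
            if i == j:
                # Additive angular margin: make the true pair harder to satisfy.
                cosine = math.cos(math.acos(cosine) + margin)
            logits.append(scale * cosine)
        # Numerically stable log-sum-exp, then cross-entropy with target index i.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two (anchor, positive) pairs in 2 dimensions.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.8, 0.6], [0.6, 0.8]]
without_margin = arcface_in_batch_loss(anchors, positives, margin=0.0)
with_margin = arcface_in_batch_loss(anchors, positives, margin=0.1)
print(without_margin, with_margin)
```

With `margin=0.0` this reduces to a plain scaled in-batch softmax loss; a positive margin lowers the true pair's logit and therefore increases the loss for the same embeddings, pushing the model to separate paraphrases from non-paraphrases more strongly.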
Evaluation Dataset
LangCache Sentence Pairs (all)
- Dataset: LangCache Sentence Pairs (all)
- Size: 6,780 evaluation samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 8 tokens, mean: 26.28 tokens, max: 47 tokens | min: 8 tokens, mean: 26.27 tokens, max: 47 tokens | min: 8 tokens, mean: 26.25 tokens, max: 47 tokens |

- Samples:

| anchor | positive | negative |
|---|---|---|
| The newer Punts are still very much in existence today and race in the same fleets as the older boats . | The newer punts are still very much in existence today and run in the same fleets as the older boats . | This marine species occurs in the eastern Indian Ocean and before the Maldives and New Caledonia . |
| The newer punts are still very much in existence today and run in the same fleets as the older boats . | The newer Punts are still very much in existence today and race in the same fleets as the older boats . | Both young people burn with love really , for both , but without being able to say it to himself , admitting him always . |
| Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada . | Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada . | Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada . |

- Loss: losses.ArcFaceInBatchLoss with these parameters:
{ "scale": 20.0, "similarity_fct": "cos_sim", "gather_across_devices": false }
Training Logs
| Epoch | Step | test_cosine_ndcg@10 |
|---|---|---|
| -1 | -1 | 0.7653 |
Framework Versions
- Python: 3.12.3
- Sentence Transformers: 5.1.0
- Transformers: 4.56.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}