---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: apache-2.0
---

# bert-base-nli-cls-token

**⚠️ This model is deprecated. Do not use it: it produces low-quality sentence embeddings. Recommended sentence embedding models are listed here: [SBERT.net - Pretrained Models](https://www.sbert.net/docs/pretrained_models.html)**

This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.

## Usage (Sentence-Transformers)

Using this model is straightforward once [sentence-transformers](https://www.SBERT.net) is installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# Sentences to embed
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the model and compute one 768-dimensional embedding per sentence
model = SentenceTransformer('sentence-transformers/bert-base-nli-cls-token')
embeddings = model.encode(sentences)
print(embeddings)
```

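For tasks such as semantic search, embeddings are typically compared with cosine similarity. The following is a minimal sketch in plain Python, assuming the embeddings have been converted to lists (for example with `embeddings.tolist()`). The short 3-dimensional vectors and the helper names `cosine_similarity` and `rank_by_similarity` are made up for illustration and are not part of sentence-transformers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return corpus indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, doc) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for real 768-dimensional embeddings (illustrative values only)
query_embedding = [1.0, 0.0, 0.0]
corpus_embeddings = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]

ranking = rank_by_similarity(query_embedding, corpus_embeddings)
print(ranking)
```

With real embeddings, the library's own `sentence_transformers.util.cos_sim` is likely the more convenient choice; the hand-written version above is only meant to show what is being computed.
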
## Usage (HuggingFace Transformers)
Without [sentence-transformers](https://www.SBERT.net), you can use the model as follows: first pass the input through the transformer model, then apply the appropriate pooling operation (here, CLS-token pooling) on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch


def cls_pooling(model_output, attention_mask):
    # model_output[0] holds the token embeddings; take the first ([CLS]) token of each sentence
    return model_output[0][:, 0]


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/bert-base-nli-cls-token')
model = AutoModel.from_pretrained('sentence-transformers/bert-base-nli-cls-token')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, CLS pooling.
sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```

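To make the pooling step concrete: `model_output[0]` has shape `[batch, sequence_length, hidden_size]`, and `[:, 0]` selects the first token's vector for every sentence. Below is a plain-Python sketch of the same indexing on nested lists, with tiny made-up values in place of real 768-dimensional outputs; `cls_pooling_lists` is a hypothetical helper used only for illustration.

```python
def cls_pooling_lists(token_embeddings):
    """Pick the first-token vector of each sentence.

    token_embeddings: nested lists shaped [batch][sequence_length][hidden_size].
    Returns nested lists shaped [batch][hidden_size].
    """
    return [sentence_tokens[0] for sentence_tokens in token_embeddings]


# Two "sentences", three token positions each, hidden size 2 (illustrative values)
token_embeddings = [
    [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]],  # last position stands in for padding
    [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],
]

pooled = cls_pooling_lists(token_embeddings)
print(pooled)
```

Because only position 0 is read, padding at the end of a sequence does not change the pooled vector, which is why the `attention_mask` argument goes unused in CLS pooling.
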
## Evaluation Results

For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/bert-base-nli-cls-token)

## Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
```

## Citing & Authors

This model was trained by [sentence-transformers](https://www.sbert.net/).

If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "http://arxiv.org/abs/1908.10084",
}
```