julien-c HF staff committed on
Commit
df705f5
1 Parent(s): b6e24d9

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/sentence-transformers/bert-base-nli-cls-token/README.md

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ language: en
+ tags:
+ - exbert
+ license: apache-2.0
+ datasets:
+ - snli
+ - multi_nli
+ ---
+
+ # BERT base model (uncased) for Sentence Embeddings
+ This is the `bert-base-nli-cls-token` model from the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) repository. The sentence-transformers repository lets you train and use Transformer models for generating sentence and text embeddings.
+ The model is described in the paper [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).
+
+ ## Usage (HuggingFace Models Repository)
+
+ You can use the model directly from the model repository to compute sentence embeddings. The output vector of the [CLS] token of each input serves as the sentence embedding:
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+
+ # Sentences we want sentence embeddings for
+ sentences = ['This framework generates embeddings for each input sentence',
+              'Sentences are passed as a list of string.',
+              'The quick brown fox jumps over the lazy dog.']
+
+ # Load AutoModel from huggingface model repository
+ tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-cls-token")
+ model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-cls-token")
+
+ # Tokenize sentences
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
+
+ # Compute token embeddings
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+     sentence_embeddings = model_output[0][:, 0]  # Take the first token ([CLS]) from each sentence
+
+ print("Sentence embeddings:")
+ print(sentence_embeddings)
+ ```
+
+ ## Usage (Sentence-Transformers)
+ Using this model becomes more convenient when you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:
+ ```
+ pip install -U sentence-transformers
+ ```
+
+ Then you can use the model like this:
+ ```python
+ from sentence_transformers import SentenceTransformer
+ model = SentenceTransformer('bert-base-nli-cls-token')
+ sentences = ['This framework generates embeddings for each input sentence',
+              'Sentences are passed as a list of string.',
+              'The quick brown fox jumps over the lazy dog.']
+ sentence_embeddings = model.encode(sentences)
+
+ print("Sentence embeddings:")
+ print(sentence_embeddings)
+ ```
+
+
+ ## Citing & Authors
+ If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
+ ```
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "http://arxiv.org/abs/1908.10084",
+ }
+ ```
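
The CLS pooling step in the README's first usage example (`model_output[0][:, 0]`) can be illustrated without downloading the model. The sketch below is not part of the migrated model card: it uses plain nested Python lists as a hypothetical stand-in for the `(batch, tokens, hidden)` tensor in `model_output[0]`, with a hidden size of 4 instead of the 768 dimensions of BERT base, and the helper name `cls_pool` is an assumption made for illustration.

```python
# Hypothetical stand-in for model_output[0]: 2 sentences, 3 tokens each,
# hidden size 4. Position 0 of every sentence is the [CLS] token.
last_hidden_state = [
    [[0.1, 0.2, 0.3, 0.4], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
    [[0.5, 0.6, 0.7, 0.8], [3.0, 3.0, 3.0, 3.0], [4.0, 4.0, 4.0, 4.0]],
]


def cls_pool(hidden_states):
    """Keep only the first token vector ([CLS]) of each sentence."""
    return [tokens[0] for tokens in hidden_states]


# Shape goes from (batch, tokens, hidden) to (batch, hidden).
sentence_embeddings = cls_pool(last_hidden_state)
print(sentence_embeddings)
```

On a PyTorch tensor the same selection is written as `[:, 0]`, which is what the README example does. Padding tokens added by `padding=True` sit at the end of each sequence, so they do not change which position holds the [CLS] vector.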