This model is a fine-tuned version of the pre-trained [Contriever](https://huggingface.co/facebook/contriever) model, following the approach described in [Towards Unsupervised Dense Information Retrieval with Contrastive Learning](https://arxiv.org/abs/2112.09118). The associated GitHub repository is available at https://github.com/facebookresearch/contriever.

## Usage (HuggingFace Transformers)

Using the model directly with HuggingFace Transformers requires adding a mean pooling operation over the token embeddings to obtain a sentence embedding.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('facebook/contriever-msmarco')
model = AutoModel.from_pretrained('facebook/contriever-msmarco')

sentences = [
    "Where was Marie Curie born?",
    "Maria Sklodowska, later known as Marie Curie, was born on November 7, 1867.",
    "Born in Paris on 15 May 1859, Pierre Curie was the son of Eugène Curie, a doctor of French Catholic origin from Alsace."
]

# Apply tokenizer
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
outputs = model(**inputs)

# Mean pooling: average the token embeddings, ignoring padded positions
def mean_pooling(token_embeddings, mask):
    token_embeddings = token_embeddings.masked_fill(~mask[..., None].bool(), 0.)
    sentence_embeddings = token_embeddings.sum(dim=1) / mask.sum(dim=1)[..., None]
    return sentence_embeddings

embeddings = mean_pooling(outputs[0], inputs['attention_mask'])
```
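
The resulting sentence embeddings can be compared directly to rank passages against a query. Below is a minimal sketch continuing from the snippet above; it assumes the first sentence is the query and the remaining sentences are candidate passages, and the variable names (`query_embedding`, `passage_embeddings`, `scores`) are illustrative rather than part of the original card:

```python
# Rank the passages against the query by dot-product similarity
# (assumes `embeddings` from the snippet above: row 0 is the query,
# the remaining rows are candidate passages).
query_embedding = embeddings[0]
passage_embeddings = embeddings[1:]

scores = passage_embeddings @ query_embedding  # one score per passage
print(scores)
```

A higher score indicates a more relevant passage, so the sentence about Marie Curie's birth should score above the one about Pierre Curie.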