osanseviero (HF staff) committed
Commit
48148d2
1 Parent(s): 17673ab
Files changed (2)
  1. README.md +76 -0
  2. pytorch_model.bin +1 -1
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ tags:
+ - sentence-transformers
+ ---
+
+ # TODO: Name of Model
+
+ TODO: Description
+
+ ## Model Description
+ TODO: Add relevant content
+
+ (0) Base Transformer Type: RobertaModel
+ (1) Pooling: mean
+
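+ The architecture above can also be composed explicitly from sentence-transformers modules. A minimal sketch, assuming a recent sentence-transformers version with the standard `models.Transformer` and `models.Pooling` classes; `'roberta-base'` stands in for the actual base checkpoint, which is still TODO here:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, models
+
+ # 'roberta-base' is a placeholder for the real base checkpoint (an assumption)
+ word_embedding_model = models.Transformer('roberta-base')
+ pooling_model = models.Pooling(
+     word_embedding_model.get_word_embedding_dimension(),
+     pooling_mode='mean',
+ )
+ model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
+ ```
+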
+ ## Usage (Sentence-Transformers)
+
+ This model is easiest to use with [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:
+
+ ```pip install -U sentence-transformers```
+
+ Then you can use the model like this:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ sentences = ["This is an example sentence"]
+
+ model = SentenceTransformer(TODO)
+ embeddings = model.encode(sentences)
+ print(embeddings)
+ ```
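+
+ `model.encode` returns one embedding per input sentence. A common follow-up is comparing sentences by cosine similarity; a minimal sketch continuing from the snippet above, using `sentence_transformers.util.cos_sim`:
+
+ ```python
+ from sentence_transformers import util
+
+ # Encode two sentences and compare them (reusing `model` from above)
+ emb = model.encode(["This is an example sentence",
+                     "Each sentence is converted to a vector"])
+ print(util.cos_sim(emb[0], emb[1]))  # 1x1 cosine-similarity matrix
+ ```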
+
+
+ ## Usage (HuggingFace Transformers)
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+ # Pooling function (you can swap in your own).
+ # Max pooling: take the max value over tokens for every dimension,
+ # masking out padding tokens first.
+ def max_pooling(model_output, attention_mask):
+     token_embeddings = model_output[0]  # first element of model_output contains all token embeddings
+     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+     token_embeddings[input_mask_expanded == 0] = -1e9  # set padding tokens to a large negative value
+     max_over_time = torch.max(token_embeddings, 1)[0]
+     return max_over_time
+
+ # Sentences we want sentence embeddings for
+ sentences = ['This is an example sentence']
+
+ # Load model from the HuggingFace Hub
+ tokenizer = AutoTokenizer.from_pretrained(TODO)
+ model = AutoModel.from_pretrained(TODO)
+
+ # Tokenize sentences
+ encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
+
+ # Compute token embeddings
+ with torch.no_grad():
+     model_output = model(**encoded_input)
+
+ # Perform pooling. In this case, max pooling.
+ sentence_embeddings = max_pooling(model_output, encoded_input['attention_mask'])
+
+ print("Sentence embeddings:")
+ print(sentence_embeddings)
+ ```
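+
+ Note that the Model Description above lists mean pooling while this template example uses max pooling. A mean-pooling variant, sketched here to mirror `max_pooling` and continue from the snippet above, averages token embeddings under the attention mask:
+
+ ```python
+ # Mean pooling: average the token embeddings, using the attention mask
+ # to exclude padding tokens. A drop-in replacement for max_pooling above.
+ def mean_pooling(model_output, attention_mask):
+     token_embeddings = model_output[0]
+     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+     sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
+     sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+     return sum_embeddings / sum_mask
+
+ # sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+ ```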
+
+ ## TODO: Training Procedure
+
+ ## TODO: Evaluation Results
+
+ ## TODO: Citing & Authors
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8facdf6e3cd67cce41c67e57acce8defdbcf81cd340d1474ebd4b612bdbb56a9
+ oid sha256:02be9d4693ea9f8c4ed5beeefd366ade0c27ccf6775c4b6f4ccd1628041b1262
  size 328519167