---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- causal-lm
license:
- cc-by-sa-4.0
---

# TODO: Name of Model

TODO: Description

## Model Description
TODO: Add relevant content

(0) Base Transformer Type: RobertaModel

(1) Pooling: mean


## Usage (Sentence-Transformers)

Using this model is easiest when you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence"]

model = SentenceTransformer(TODO)
embeddings = model.encode(sentences)
print(embeddings)
```
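
Since the card is tagged `sentence-similarity`, a typical next step is to compare the embeddings with cosine similarity. The following is a minimal sketch, not part of the original card: the helper name `cosine_similarity_matrix` is hypothetical, and toy vectors stand in for real embeddings so the block runs without downloading a model. It assumes the embeddings are available as a `torch` tensor (for example by passing `convert_to_tensor=True` to `model.encode`).

```python
import torch
import torch.nn.functional as F


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity between all rows of `embeddings`."""
    # L2-normalize each row so the dot product equals the cosine similarity.
    normalized = F.normalize(embeddings, p=2, dim=1)
    return normalized @ normalized.T


# Toy 2-dimensional vectors standing in for sentence embeddings (assumption).
toy_embeddings = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
scores = cosine_similarity_matrix(toy_embeddings)
print(scores)
```

Entry `scores[i][j]` is the similarity between sentence `i` and sentence `j`; values close to 1 indicate embeddings pointing in nearly the same direction.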


## Usage (HuggingFace Transformers)

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Pooling function applied to the token embeddings; swap in your own if you want a different strategy.
# Max pooling: take the max value over time (the token dimension) for every embedding dimension.
def max_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    token_embeddings[input_mask_expanded == 0] = -1e9  # Set padding tokens to large negative value
    max_over_time = torch.max(token_embeddings, 1)[0]
    return max_over_time

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained(TODO)
model = AutoModel.from_pretrained(TODO)

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, max pooling.
sentence_embeddings = max_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```
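
The Model Description lists mean pooling, while the example above uses max pooling. If the model was indeed trained with mean pooling, a pooling function along the following lines may reproduce the sentence-transformers output more closely. This is a minimal sketch under that assumption: the function name `mean_pooling` and the toy tensors are illustrative, with the tensors standing in for `model_output[0]` and `encoded_input['attention_mask']`.

```python
import torch


def mean_pooling(token_embeddings, attention_mask):
    """Average token embeddings over the sequence, ignoring padding tokens."""
    # Expand the mask from (batch, seq_len) to (batch, seq_len, hidden).
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Sum only the embeddings of real (non-padding) tokens.
    summed = torch.sum(token_embeddings * mask, dim=1)
    # Number of real tokens per sentence; clamp to avoid division by zero.
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts


# Toy input: one sentence, three token positions, the last one is padding.
token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])
pooled = mean_pooling(token_embeddings, attention_mask)
print(pooled)
```

In the example above, this would be called as `mean_pooling(model_output[0], encoded_input['attention_mask'])` in place of `max_pooling`.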



## TODO: Training Procedure

## TODO: Evaluation Results

## TODO: Citing & Authors