---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- causal-lm
license:
- cc-by-sa-4.0
---
# TODO: Name of Model
TODO: Description
## Model Description
TODO: Add relevant content
- (0) Base Transformer Type: RobertaModel
- (1) Pooling: mean
## Usage (Sentence-Transformers)
Using this model is easiest with [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:
```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence"]
model = SentenceTransformer(TODO)
embeddings = model.encode(sentences)
print(embeddings)
```
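Since the model is tagged for sentence similarity, a typical next step is to compare embeddings. The following is a minimal sketch, assuming the embeddings are a 2-D array with one row per sentence (the default NumPy output of `model.encode`). The helper name `cosine_similarity_matrix` and the toy vectors are hypothetical and stand in for real embeddings.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity between rows of `embeddings`."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    # L2-normalize each row; clip the norm to avoid division by zero
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.clip(norms, 1e-12, None)
    # Dot products of unit vectors are cosine similarities
    return normalized @ normalized.T

# Toy vectors standing in for the output of model.encode(sentences)
toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
similarities = cosine_similarity_matrix(toy_embeddings)
print(similarities)
```

Entry `[i, j]` of the result is the similarity between sentence `i` and sentence `j`; values close to 1 indicate embeddings that point in nearly the same direction.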
## Usage (HuggingFace Transformers)
```python
from transformers import AutoTokenizer, AutoModel
import torch
# Optional: define your own pooling function.
# Max pooling: take the max value over time (the token axis) for every dimension.
def max_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    token_embeddings[input_mask_expanded == 0] = -1e9  # Set padding tokens to a large negative value
    max_over_time = torch.max(token_embeddings, 1)[0]
    return max_over_time
# Sentences we want sentence embeddings for
sentences = ['This is an example sentence']
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained(TODO)
model = AutoModel.from_pretrained(TODO)
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
# Perform pooling. In this case, max pooling.
sentence_embeddings = max_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
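The Model Description lists mean pooling, while the snippet above applies max pooling, so the two usage paths may not produce the same embeddings. The following is a minimal sketch of mean pooling, assuming the same `model_output` layout (token embeddings as the first element) and the same `attention_mask` as above. The function name `mean_pooling` and the toy tensors are hypothetical, used only to illustrate the computation without loading a model.

```python
import torch

def mean_pooling(model_output, attention_mask):
    # First element of model_output holds the token embeddings: (batch, seq_len, hidden)
    token_embeddings = model_output[0]
    # Expand the mask to the embedding shape so padding positions contribute zero
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    # Number of real (non-padding) tokens per sentence; clamp avoids division by zero
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts

# Toy input: one sentence, three token positions, the last one is padding
toy_output = (torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]),)
toy_mask = torch.tensor([[1, 1, 0]])
toy_embedding = mean_pooling(toy_output, toy_mask)
print(toy_embedding)
```

In the snippet above, this would replace the `max_pooling` call, for example `mean_pooling(model_output, encoded_input['attention_mask'])`. Unlike `max_pooling`, this sketch does not modify the token embeddings in place.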
## TODO: Training Procedure
## TODO: Evaluation Results
## TODO: Citing & Authors