---
language: 
  - en
pipeline_tag: sentence-similarity
tags:
- Pytorch
- Sentence Transformers
- Transformers
license: "apache-2.0"
---

# Twitter4SSE

This model maps texts to 768-dimensional dense embeddings that encode semantic similarity. 
It was trained with Multiple Negatives Ranking Loss (MNRL) on a Twitter dataset. 
It was initialized from [BERTweet](https://huggingface.co/vinai/bertweet-base) and trained with [Sentence-transformers](https://www.sbert.net/). 

## Usage

The model is easiest to use with the sentence-transformers library:

```
pip install -U sentence-transformers
```

```
from sentence_transformers import SentenceTransformer
sentences = ["This is the first tweet", "This is the second tweet"]

model = SentenceTransformer('digio/Twitter4SSE')
embeddings = model.encode(sentences)
print(embeddings)
```
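
Since the embeddings encode semantic similarity, a natural next step is to compare two of them. The sketch below assumes cosine similarity as the comparison measure (a common choice for Sentence Transformers models, not something this card specifies). The helper `cosine_similarity` is a hypothetical name introduced for illustration; it uses only the standard library and accepts any sequence of floats, including the rows returned by `model.encode`.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(dot / (norm_u * norm_v))


# Usage with the embeddings computed above (requires the model weights):
# score = cosine_similarity(embeddings[0], embeddings[1])
# print(score)
```

Scores lie between -1 and 1, and higher values indicate that the model considers the two texts more similar.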


To use the model without the sentence-transformers library, refer to [this repository](https://huggingface.co/sentence-transformers) for detailed instructions on using Sentence Transformers models on Hugging Face. 

## Citing & Authors

The official paper, [Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings](https://arxiv.org/abs/2110.02030), was presented at EMNLP 2021. 

```
@inproceedings{di-giovanni-brambilla-2021-exploiting,
    title = "Exploiting {T}witter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings",
    author = "Di Giovanni, Marco  and
      Brambilla, Marco",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.780",
    pages = "9902--9910",
}
```

The official code is available on [GitHub](https://github.com/marco-digio/Twitter4SSE).