---
license: mit
language:
- en
- ar
- fr
- de
- pt
- it
- es
- zh
- ja
- ko
pipeline_tag: feature-extraction
tags:
- sentiment-analysis
- text-classification
- generic
- sentiment-classification
- multilingual
---

## Model

Base version of e5-multilingual, fine-tuned on an annotated subset of mC4 (multilingual C4). This model provides generic embeddings for sentiment analysis. The embeddings can be used out of the box or fine-tuned on specific datasets.

Blog post: https://www.numind.ai/blog/creating-task-specific-foundation-models-with-gpt-4
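Since the embeddings are usable out of the box, a common pattern is to train only a lightweight classification head on top of them. The sketch below is a minimal, hypothetical illustration: random vectors stand in for the real sentence embeddings, and the hidden size of 768 is an assumption about the base model.

```python
import torch

hidden_size = 768  # assumption: hidden size of the base e5 model
num_classes = 2    # e.g. positive / negative

# Placeholder batch of sentence embeddings (stand-ins for the
# mean-pooled vectors produced in the Usage section below)
emb_batch = torch.randn(8, hidden_size)

# A single linear layer is often sufficient on top of good generic embeddings
classifier = torch.nn.Linear(hidden_size, num_classes)
logits = classifier(emb_batch)
probs = torch.softmax(logits, dim=-1)
print(probs.shape)  # torch.Size([8, 2])
```

In practice you would freeze the encoder, compute embeddings for your labeled data once, and train only this head, which is fast even on CPU.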

## Usage

Below is an example of how to encode a text and obtain its embedding.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model = AutoModel.from_pretrained("Numind/e5-multilingual-sentiment_analysis")
tokenizer = AutoTokenizer.from_pretrained("Numind/e5-multilingual-sentiment_analysis")
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)

size = 256
text = "This movie is amazing"

# Tokenize and return PyTorch tensors directly (shape: [1, size])
encoding = tokenizer(
    text,
    truncation=True,
    padding="max_length",
    max_length=size,
    return_tensors="pt",
)

# Forward pass without gradient tracking; keep the last hidden state
with torch.no_grad():
    output = model(
        encoding.input_ids.to(device),
        output_hidden_states=True,
    )
emb = output.hidden_states[-1].cpu()

# Mean-pool the token embeddings into a single sentence embedding
embText = torch.mean(emb, axis=1)
```
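Because the model outputs generic embeddings, two texts can also be compared directly, for example with cosine similarity. This is a minimal sketch; random vectors stand in here for two `embText` vectors computed as above, and 768 is an assumed hidden size.

```python
import torch
import torch.nn.functional as F

# Placeholder sentence embeddings (stand-ins for two embText vectors)
emb_a = torch.randn(1, 768)
emb_b = torch.randn(1, 768)

# Cosine similarity in [-1, 1]; values near 1 indicate similar content
sim = F.cosine_similarity(emb_a, emb_b, dim=-1)
print(sim.shape)  # torch.Size([1])
```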