---
language:
- en
license: apache-2.0
base_model:
- FacebookAI/roberta-base
pipeline_tag: token-classification
library_name: transformers
---

# Training
This model performs token classification to extract aspect terms from text and to predict the sentiment polarity expressed toward each extracted aspect term.
Aspect terms are the span(s) of the input text on which a sentiment is being expressed.
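
As an illustration of the tagging scheme (a hypothetical hand-labelled example; the full label inventory appears in the Use section below), each token receives an `O` tag or a `B-`/`I-` tag whose suffix encodes the polarity (`pos`, `neg`, `neu`, or `con` for conflicting sentiment):

```python
# Hypothetical hand-labelled example of the BIO-with-polarity scheme:
# B-* opens an aspect span, I-* continues it, O marks non-aspect tokens.
tokens = ["The", "food", "was", "great", "but", "the", "wait", "staff", "was", "rude"]
tags   = ["O",   "B-pos", "O",  "O",     "O",   "O",   "B-neg", "I-neg", "O",  "O"]
```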

## Datasets
This model has been trained on the following datasets:

1. Aspect Based Sentiment Analysis SemEval Shared Tasks ([2014](https://aclanthology.org/S14-2004/), [2015](https://aclanthology.org/S15-2082/), [2016](https://aclanthology.org/S16-1002/))
2. Multi-Aspect Multi-Sentiment [MAMS](https://aclanthology.org/D19-1654/)

# Use

* Making token-level inferences with Auto classes
```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "gauneg/roberta-base-absa-ate-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# the sequence of labels used during training
labels = {"B-neu": 1, "I-neu": 2, "O": 0, "B-neg": 4, "B-con": 5, "I-pos": 6, "B-pos": 7, "I-con": 8, "I-neg": 9, "X": -100}
id2lab = {idx: lab for lab, idx in labels.items()}
lab2id = {lab: idx for lab, idx in labels.items()}

# making one prediction at a time (inputs should be padded/batched and truncated for efficiency)
text_input = "Been here a few times and food has always been good but service really suffers when it gets crowded."
tok_inputs = tokenizer(text_input, return_tensors="pt")

with torch.no_grad():  # no gradients are needed for inference
    y_pred = model(**tok_inputs)  # predict the logits

# select the most likely label for each token from the logits
y_pred_fin = y_pred.logits.argmax(dim=-1)[0]
decoded_pred = [id2lab[logx.item()] for logx in y_pred_fin]

# pair each input token with its predicted label, skipping the leading <s>
# and trailing </s> special tokens
tok_level_pred = list(zip(tokenizer.convert_ids_to_tokens(tok_inputs["input_ids"][0]), decoded_pred))[1:-1]

```

* contents of the `tok_level_pred` variable:

```bash
[('Be', 'O'),
 ('en', 'O'),
 ('Ġhere', 'O'),
 ('Ġa', 'O'),
 ('Ġfew', 'O'),
 ('Ġtimes', 'O'),
 ('Ġand', 'O'),
 ('Ġfood', 'B-pos'),
 ('Ġhas', 'O'),
 ('Ġalways', 'O'),
 ('Ġbeen', 'O'),
 ('Ġgood', 'O'),
 ('Ġbut', 'O'),
 ('Ġservice', 'B-neg'),
 ('Ġreally', 'O'),
 ('Ġsuffers', 'O'),
 ('Ġwhen', 'O'),
 ('Ġit', 'O'),
 ('Ġgets', 'O'),
 ('Ġcrowded', 'O'),
 ('.', 'O')]
```
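
The flat tag sequence can be collapsed into (aspect term, polarity) pairs by merging consecutive `B-*`/`I-*` tokens. A minimal sketch (the `merge_spans` helper is not part of the model card; it assumes the `tok_level_pred` list produced above):

```python
def merge_spans(token_label_pairs):
    """Group consecutive B-*/I-* tokens into (aspect, polarity) pairs."""
    spans, current_tokens, current_pol = [], [], None
    for token, label in token_label_pairs:
        word = token.replace("Ġ", " ")  # RoBERTa marks word starts with 'Ġ'
        if label.startswith("B-"):
            if current_tokens:  # close any span already in progress
                spans.append(("".join(current_tokens).strip(), current_pol))
            current_tokens, current_pol = [word], label[2:]
        elif label.startswith("I-") and current_tokens:
            current_tokens.append(word)
        else:  # an 'O' tag ends the current span
            if current_tokens:
                spans.append(("".join(current_tokens).strip(), current_pol))
            current_tokens, current_pol = [], None
    if current_tokens:
        spans.append(("".join(current_tokens).strip(), current_pol))
    return spans

print(merge_spans(tok_level_pred))
# [('food', 'pos'), ('service', 'neg')]
```

This is roughly what the pipeline's `aggregation_strategy='simple'` does automatically, as shown next.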

# OR

* Using the pipeline directly for end-to-end inference:
```python
from transformers import pipeline

ate_sent_pipeline = pipeline(
    task="ner",
    aggregation_strategy="simple",
    model="gauneg/roberta-base-absa-ate-sentiment",
)

text_input = "Been here a few times and food has always been good but service really suffers when it gets crowded."
ate_sent_pipeline(text_input)
```
* pipeline output:
```bash
[{'entity_group': 'pos',
  'score': 0.8447307,
  'word': ' food',
  'start': 26,
  'end': 30},
 {'entity_group': 'neg',
  'score': 0.81927896,
  'word': ' service',
  'start': 56,
  'end': 63}]
```
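
When scoring many reviews, the pipeline also accepts a list of texts (a usage sketch; the `reviews` list is illustrative, and `batch_size` should be tuned to the available hardware):

```python
reviews = [
    "Been here a few times and food has always been good but service really suffers when it gets crowded.",
    "The staff was friendly but the wait was far too long.",
]
# passing a list returns one list of entity dicts per input text
results = ate_sent_pipeline(reviews, batch_size=2)
for text, entities in zip(reviews, results):
    print([(ent["word"].strip(), ent["entity_group"]) for ent in entities])
```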