Updated README.md with easier instructions to classify text
README.md
CHANGED
@@ -12,42 +12,42 @@ datasets:
 
 # IndoBERT-Lite Large Model (phase2 - uncased) Finetuned on IndoNLU SmSA dataset
 
-Finetuned the IndoBERT-Lite Large Model (phase2 - uncased) model following the procedures stated in the paper [IndoNLU: Benchmark and Resources for Evaluating Indonesian
+Finetuned the IndoBERT-Lite Large Model (phase2 - uncased) model on the IndoNLU SmSA dataset following the procedures stated in the paper [IndoNLU: Benchmark and Resources for Evaluating Indonesian
 Natural Language Understanding](https://arxiv.org/pdf/2009.05387.pdf).
 
-Finetuning hyperparameters
+**Finetuning hyperparameters:**
 - learning rate: 2e-5
 - batch size: 16
 - no. of epochs: 5
 - max sequence length: 512
 - random seed: 42
 
+**Classes:**
+- 0: positive
+- 1: neutral
+- 2: negative
+
+Validation accuracy: 0.94
+Validation F1: 0.91
+Validation Recall: 0.91
+Validation Precision: 0.93
+
 ## How to use
 
 ### Load model and tokenizer
 
 ```python
-from transformers import
-
-
-
-tokenizer = BertTokenizer.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
-model = AutoModel.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
-
+from transformers import pipeline
+classifier = pipeline("text-classification",
+                      model='tyqiangz/indobert-lite-large-p2-smsa', return_all_scores=True)
 text = "Penyakit koronavirus 2019"
-
-
-
-subwords = tokenizer.encode(text, add_special_tokens=True)
-subwords = torch.LongTensor(subwords).view(1, -1).to(model.device)
-
-logits = model(subwords)[0]
-label = torch.topk(logits, k=1, dim=-1)[1].squeeze().item()
-
-print(index_to_word[label])
+prediction = classifier(text)
+prediction
 
 """
 Output:
-'
+[[{'label': 'positive', 'score': 0.0006000096909701824},
+ {'label': 'neutral', 'score': 0.01223431620746851},
+ {'label': 'negative', 'score': 0.987165629863739}]]
 """
 ```
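The updated README returns scores for all three classes, so a reader will likely want to reduce that to a single predicted class. The following is a minimal sketch of one way to do that, not part of the commit. It reuses the output printed in the README rather than calling the model, and it assumes the class ids listed under "Classes". The names `LABEL_TO_ID` and `top_prediction` are hypothetical, introduced only for illustration.

```python
# Pipeline output copied verbatim from the README example
# (a nested list: one inner list of label/score dicts per input text).
prediction = [[{'label': 'positive', 'score': 0.0006000096909701824},
               {'label': 'neutral', 'score': 0.01223431620746851},
               {'label': 'negative', 'score': 0.987165629863739}]]

# Class ids as listed under "Classes" in the README.
# LABEL_TO_ID is a hypothetical name used for this sketch only.
LABEL_TO_ID = {"positive": 0, "neutral": 1, "negative": 2}


def top_prediction(prediction):
    """Hypothetical helper: return (label, score) of the best class for the first input."""
    scores_for_first_text = prediction[0]
    best = max(scores_for_first_text, key=lambda entry: entry["score"])
    return best["label"], best["score"]


label, score = top_prediction(prediction)
class_id = LABEL_TO_ID[label]
print(f"{label} (class {class_id}), score={score:.3f}")
```

The `prediction[0]` indexing is tied to the nested output shape shown in the README. Newer `transformers` releases deprecate `return_all_scores` in favor of `top_k=None`, which may return a differently shaped result, so the helper would need adjusting in that case.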