Commit 40e087c by tyqiangz
1 Parent(s): 69824c8

Updated README.md with easier instructions to classify text

Files changed (1)
  1. README.md +20 -20
README.md CHANGED
@@ -12,42 +12,42 @@ datasets:
  # IndoBERT-Lite Large Model (phase2 - uncased) Finetuned on IndoNLU SmSA dataset

- Finetuned the IndoBERT-Lite Large Model (phase2 - uncased) model following the procedures stated in the paper [IndoNLU: Benchmark and Resources for Evaluating Indonesian
  Natural Language Understanding](https://arxiv.org/pdf/2009.05387.pdf).

- Finetuning hyperparameters:
  - learning rate: 2e-5
  - batch size: 16
  - no. of epochs: 5
  - max sequence length: 512
  - random seed: 42

  ## How to use

  ### Load model and tokenizer

  ```python
- from transformers import BertTokenizer, AutoModelForSequenceClassification
- import torch
- import torch.nn.functional as F
-
- tokenizer = BertTokenizer.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
- model = AutoModel.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
-
  text = "Penyakit koronavirus 2019"
-
- index_to_word = {0: 'positive', 1: 'neutral', 2: 'negative'}
-
- subwords = tokenizer.encode(text, add_special_tokens=True)
- subwords = torch.LongTensor(subwords).view(1, -1).to(model.device)
-
- logits = model(subwords)[0]
- label = torch.topk(logits, k=1, dim=-1)[1].squeeze().item()
-
- print(index_to_word[label])

  """
  Output:
- 'negative'
  """
  ```
 
  # IndoBERT-Lite Large Model (phase2 - uncased) Finetuned on IndoNLU SmSA dataset

+ Finetuned the IndoBERT-Lite Large Model (phase2 - uncased) model on the IndoNLU SmSA dataset following the procedures stated in the paper [IndoNLU: Benchmark and Resources for Evaluating Indonesian
  Natural Language Understanding](https://arxiv.org/pdf/2009.05387.pdf).

+ **Finetuning hyperparameters:**
  - learning rate: 2e-5
  - batch size: 16
  - no. of epochs: 5
  - max sequence length: 512
  - random seed: 42

+ **Classes:**
+ - 0: positive
+ - 1: neutral
+ - 2: negative
+
+ Validation accuracy: 0.94
+ Validation F1: 0.91
+ Validation Recall: 0.91
+ Validation Precision: 0.93
+
  ## How to use

  ### Load model and tokenizer

  ```python
+ from transformers import pipeline
+ classifier = pipeline("text-classification",
+ model='tyqiangz/indobert-lite-large-p2-smsa', return_all_scores=True)
  text = "Penyakit koronavirus 2019"
+ prediction = classifier(text)
+ prediction

  """
  Output:
+ [[{'label': 'positive', 'score': 0.0006000096909701824},
+ {'label': 'neutral', 'score': 0.01223431620746851},
+ {'label': 'negative', 'score': 0.987165629863739}]]
  """
  ```
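
The pipeline version returns a score for every class, where the removed snippet printed a single label. A minimal sketch of how a reader might reduce that nested output back to one label follows. The helper name `top_label` is hypothetical and not part of the commit, and the `prediction` value is copied from the README output above so that the example runs without downloading the model.

```python
from typing import Dict, List, Union

# One entry of the pipeline output: {"label": ..., "score": ...}
Score = Dict[str, Union[str, float]]


def top_label(prediction: List[List[Score]]) -> str:
    """Return the highest-scoring label for the first input text.

    Hypothetical helper: assumes the nested list-of-lists shape shown in
    the README output (one inner list of class scores per input text).
    """
    scores = prediction[0]
    best = max(scores, key=lambda item: item["score"])
    return str(best["label"])


# Output copied from the README example above, not a fresh model run.
prediction = [[
    {"label": "positive", "score": 0.0006000096909701824},
    {"label": "neutral", "score": 0.01223431620746851},
    {"label": "negative", "score": 0.987165629863739},
]]

print(top_label(prediction))
```

For the example text this yields the same label, `negative`, that the earlier README version printed.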