tyqiangz committed
Commit
69824c8
1 Parent(s): cfe845b

Updated README.md with instructions to classify text, and specified class labels in config.json

Files changed (2)
  1. README.md +32 -3
  2. config.json +6 -6
README.md CHANGED
@@ -5,20 +5,49 @@ tags:
  - indobenchmark
  - indonlu
  license: mit
- inference: false
+ inference: true
  datasets:
  - Indo4B
- - IndoNLU (SmSA)
  ---

  # IndoBERT-Lite Large Model (phase2 - uncased) Finetuned on IndoNLU SmSA dataset

+ Finetuned the IndoBERT-Lite Large Model (phase2 - uncased) following the procedure described in the paper [IndoNLU: Benchmark and Resources for Evaluating Indonesian
+ Natural Language Understanding](https://arxiv.org/pdf/2009.05387.pdf).
+
+ Finetuning hyperparameters:
+ - learning rate: 2e-5
+ - batch size: 16
+ - no. of epochs: 5
+ - max sequence length: 512
+ - random seed: 42
+
  ## How to use

  ### Load model and tokenizer

  ```python
- from transformers import BertTokenizer, AutoModel
+ from transformers import BertTokenizer, AutoModelForSequenceClassification
+ import torch
+ import torch.nn.functional as F
+
  tokenizer = BertTokenizer.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
- model = AutoModel.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
+ model = AutoModelForSequenceClassification.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
+
+ text = "Penyakit koronavirus 2019"
+
+ index_to_word = {0: 'positive', 1: 'neutral', 2: 'negative'}
+
+ subwords = tokenizer.encode(text, add_special_tokens=True)
+ subwords = torch.LongTensor(subwords).view(1, -1).to(model.device)
+
+ logits = model(subwords)[0]
+ label = torch.topk(logits, k=1, dim=-1)[1].squeeze().item()
+
+ print(index_to_word[label])
+
+ """
+ Output:
+ 'negative'
+ """
  ```
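The hyperparameter list added in the README above is enough to sketch a reproduction of the finetuning run. Below is a minimal sketch using the `transformers` `Trainer` API; the base checkpoint name `indobenchmark/indobert-lite-large-p2`, the dataset identifier `load_dataset("indonlu", "smsa")`, and its `text`/`label` columns are assumptions, not something this commit specifies.

```python
# Hedged reproduction sketch of the finetuning recipe listed in the README.
# Assumptions: the base checkpoint is "indobenchmark/indobert-lite-large-p2"
# and the SmSA split is available as load_dataset("indonlu", "smsa") with
# "text"/"label" columns; anything beyond the listed hyperparameters is
# left at Trainer defaults.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("indonlu", "smsa")  # assumed dataset id
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-lite-large-p2")

def tokenize(batch):
    # max sequence length: 512 (from the hyperparameter list)
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "indobenchmark/indobert-lite-large-p2",
    num_labels=3,  # positive / neutral / negative
)

args = TrainingArguments(
    output_dir="indobert-lite-large-p2-smsa",
    learning_rate=2e-5,              # learning rate: 2e-5
    per_device_train_batch_size=16,  # batch size: 16
    num_train_epochs=5,              # no. of epochs: 5
    seed=42,                         # random seed: 42
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding during batching
)
trainer.train()
```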
config.json CHANGED
@@ -15,17 +15,17 @@
  "hidden_dropout_prob": 0,
  "hidden_size": 1024,
  "id2label": {
- "0": "LABEL_0",
- "1": "LABEL_1",
- "2": "LABEL_2"
+ "0": "positive",
+ "1": "neutral",
+ "2": "negative"
  },
  "initializer_range": 0.02,
  "inner_group_num": 1,
  "intermediate_size": 4096,
  "label2id": {
- "LABEL_0": 0,
- "LABEL_1": 1,
- "LABEL_2": 2
+ "positive": 0,
+ "neutral": 1,
+ "negative": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,