tyqiangz committed
Commit
69824c8
1 Parent(s): cfe845b

Updated README.md with instructions to classify text, and specified class labels in config.json

Files changed (2)
  1. README.md +32 -3
  2. config.json +6 -6
README.md CHANGED
@@ -5,20 +5,49 @@ tags:
  - indobenchmark
  - indonlu
  license: mit
- inference: false
+ inference: true
  datasets:
  - Indo4B
- - IndoNLU (SmSA)
  ---

  # IndoBERT-Lite Large Model (phase2 - uncased) Finetuned on IndoNLU SmSA dataset

+ Finetuned the IndoBERT-Lite Large Model (phase2 - uncased) following the procedure described in the paper [IndoNLU: Benchmark and Resources for Evaluating Indonesian
+ Natural Language Understanding](https://arxiv.org/pdf/2009.05387.pdf).
+
+ Finetuning hyperparameters:
+ - learning rate: 2e-5
+ - batch size: 16
+ - no. of epochs: 5
+ - max sequence length: 512
+ - random seed: 42
+
  ## How to use

  ### Load model and tokenizer

  ```python
- from transformers import BertTokenizer, AutoModel
+ from transformers import BertTokenizer, AutoModelForSequenceClassification
+ import torch
+ import torch.nn.functional as F
+
  tokenizer = BertTokenizer.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
- model = AutoModel.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
+ model = AutoModelForSequenceClassification.from_pretrained("tyqiangz/indobert-lite-large-p2-smsa")
+
+ text = "Penyakit koronavirus 2019"
+
+ index_to_word = {0: 'positive', 1: 'neutral', 2: 'negative'}
+
+ subwords = tokenizer.encode(text, add_special_tokens=True)
+ subwords = torch.LongTensor(subwords).view(1, -1).to(model.device)
+
+ logits = model(subwords)[0]
+ label = torch.topk(logits, k=1, dim=-1)[1].squeeze().item()
+
+ print(index_to_word[label])
+
+ """
+ Output:
+ 'negative'
+ """
  ```
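The hyperparameter list added in the README above is enough to sketch a reproduction of the finetuning run. Below is a minimal sketch using the `transformers` `Trainer` API; the base checkpoint name `indobenchmark/indobert-lite-large-p2`, the dataset identifier `load_dataset("indonlu", "smsa")`, and its `text`/`label` columns are assumptions, not something this commit specifies.

```python
# Hedged reproduction sketch of the finetuning recipe listed in the README.
# Assumptions: the base checkpoint is "indobenchmark/indobert-lite-large-p2"
# and the SmSA split is available as load_dataset("indonlu", "smsa") with
# "text"/"label" columns; anything beyond the listed hyperparameters is
# left at Trainer defaults.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("indonlu", "smsa")  # assumed dataset id
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-lite-large-p2")

def tokenize(batch):
    # max sequence length: 512 (from the hyperparameter list)
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "indobenchmark/indobert-lite-large-p2",
    num_labels=3,  # positive / neutral / negative
)

args = TrainingArguments(
    output_dir="indobert-lite-large-p2-smsa",
    learning_rate=2e-5,              # learning rate: 2e-5
    per_device_train_batch_size=16,  # batch size: 16
    num_train_epochs=5,              # no. of epochs: 5
    seed=42,                         # random seed: 42
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding during batching
)
trainer.train()
```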
config.json CHANGED
@@ -15,17 +15,17 @@
  "hidden_dropout_prob": 0,
  "hidden_size": 1024,
  "id2label": {
- "0": "LABEL_0",
- "1": "LABEL_1",
- "2": "LABEL_2"
+ "0": "positive",
+ "1": "neutral",
+ "2": "negative"
  },
  "initializer_range": 0.02,
  "inner_group_num": 1,
  "intermediate_size": 4096,
  "label2id": {
- "LABEL_0": 0,
- "LABEL_1": 1,
- "LABEL_2": 2
+ "positive": 0,
+ "neutral": 1,
+ "negative": 2
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,