esakrissa committed on
Commit 82c97f5 (1 parent: c5b8b41)

Update README.md

Files changed (1)
  1. README.md +62 -5
README.md CHANGED
@@ -5,6 +5,10 @@ tags:
5
  model-index:
6
  - name: indobert-squad-trained
7
  results: []
 
 
 
 
8
  ---
9
 
10
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -16,19 +20,30 @@ This model is a fine-tuned version of [indolem/indobert-base-uncased](https://hu
16
  It achieves the following results on the evaluation set:
17
  - Loss: 1.8025
18
 
19
- ## Model description
 
 
 
 
 
20
 
21
- More information needed
22
 
23
- ## Intended uses & limitations
24
 
25
- More information needed
26
 
27
  ## Training and evaluation data
28
 
29
- More information needed
 
 
 
 
 
 
30
 
31
  ## Training procedure
 
32
 
33
  ### Training hyperparameters
34
 
@@ -49,6 +64,48 @@ The following hyperparameters were used during training:
49
  | 1.1716 | 2.0 | 16404 | 1.8555 |
50
  | 1.2909 | 3.0 | 24606 | 1.8025 |
51
 
52
 
53
  ### Framework versions
54
 
 
5
  model-index:
6
  - name: indobert-squad-trained
7
  results: []
8
+ widget:
9
+ - text: Di daerah mana Ubud berada?
10
+ context: Ubud adalah sebuah desa adat sekaligus menjadi destinasi wisata di daerah kabupaten Gianyar, pulau Bali, Indonesia. Ubud terutama terkenal diantara para wisatawan mancanegara karena terletak di antara sawah dan hutan yang berjurang-jurang yang membuat pemandangan alam sangat indah. Selain itu, Ubud dikenal karena seni dan budaya yang berkembang sangat pesat dan maju.
11
+
12
  ---
13
 
14
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
20
  It achieves the following results on the evaluation set:
21
  - Loss: 1.8025
22
 
23
+ ## IndoBERT
24
+
25
+ [IndoBERT](https://huggingface.co/indolem/indobert-base-uncased) is the Indonesian version of the BERT model. It was trained on over 220M words, aggregated from three main sources:
26
+ - Indonesian Wikipedia (74M words)
27
+ - news articles from Kompas, Tempo (Tala et al., 2003), and Liputan6 (55M words in total)
28
+ - an Indonesian Web Corpus (Medved and Suchomel, 2017) (90M words).
29
 
30
+ IndoBERT was trained for 2.4M steps (180 epochs), reaching a final perplexity of 3.97 on the development set (similar to English BERT-base).
31
 
32
+ This IndoBERT was evaluated on IndoLEM, an Indonesian benchmark comprising seven tasks for the Indonesian language, spanning morpho-syntax, semantics, and discourse.
33
 
 
34
 
35
  ## Training and evaluation data
36
 
37
+ SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
38
+
39
+ | Dataset | Split | # samples |
40
+ | -------- | ----- | --------- |
41
+ | SQuAD2.0 | train | 130k |
42
+ | SQuAD2.0 | eval | 12.3k |
43
+
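The abstention requirement described above is often handled by comparing the model's "no answer" score with the score of its best answer span. The following is a minimal sketch of that common heuristic, not a description of how this model was trained or evaluated; the function name `select_answer`, its parameters, and the default threshold are all hypothetical.

```python
from typing import Optional


def select_answer(
    best_span_text: str,
    best_span_score: float,
    null_score: float,
    threshold: float = 0.0,  # hypothetical default; usually tuned on a dev set
) -> Optional[str]:
    """Return the best span, or None to abstain from answering."""
    # Abstain when the "no answer" score beats the best span by more
    # than the threshold.
    if null_score - best_span_score > threshold:
        return None
    return best_span_text


# The span wins, so it is returned.
print(select_answer("Gianyar", best_span_score=5.0, null_score=1.0))
# The null score wins by a wide margin, so the function abstains.
print(select_answer("Gianyar", best_span_score=1.0, null_score=5.0))
```

A higher threshold makes the system answer more often; a lower one makes it abstain more readily.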
44
 
45
  ## Training procedure
46
+ The model was trained on a Tesla T4 GPU with 12 GB of RAM.
47
 
48
  ### Training hyperparameters
49
 
 
64
  | 1.1716 | 2.0 | 16404 | 1.8555 |
65
  | 1.2909 | 3.0 | 24606 | 1.8025 |
66
 
67
+ | Metric | Value |
68
+ | ------ | --------- |
69
+ | **EM** | **52.17** |
70
+ | **F1** | **69.22** |
71
+
72
+
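For readers unfamiliar with the two metrics in the table above, the sketch below shows how SQuAD-style exact match (EM) and token-level F1 are typically computed for a single prediction. It is an illustrative approximation, not the script that produced the reported numbers: the helper names are hypothetical, and the normalization here omits the English article removal that the official SQuAD script performs.

```python
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty (e.g. an unanswerable question),
    # F1 is 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("1920-an", "1920-an"))
print(f1_score("kabupaten Gianyar", "Gianyar"))
```

Dataset-level scores are usually obtained by averaging these per-question values (taking the maximum over multiple reference answers where available) and multiplying by 100.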
73
+ ## Pipeline
74
+ ```py
75
+ from transformers import pipeline
76
+
77
+ qa_pipeline = pipeline(
78
+ "question-answering",
79
+ model="esakrissa/indobert-squad-trained",
80
+ tokenizer="esakrissa/indobert-squad-trained"
81
+ )
82
+
83
+ qa_pipeline({
84
+ 'context': """Ubud adalah sebuah desa adat sekaligus menjadi destinasi wisata di daerah kabupaten Gianyar, pulau Bali, Indonesia.
85
+
86
+ Ubud terutama terkenal diantara para wisatawan mancanegara karena terletak di antara sawah dan hutan yang berjurang-jurang yang membuat pemandangan alam sangat indah. Selain itu, Ubud dikenal karena seni dan budaya yang berkembang sangat pesat dan maju. Denyut nadi kehidupan masyarakat Ubud tidak bisa dilepaskan dari kesenian. Di sini banyak pula terdapat galeri-galeri seni, serta arena pertunjukan musik dan tari yang digelar setiap malam secara bergantian di segala penjuru desa.
87
+
88
+ Sudah sejak tahun 1920-an, Ubud terkenal di antara wisatawan barat. Kala itu pelukis Jerman; Walter Spies dan pelukis Belanda; Rudolf Bonnet menetap di sana. Mereka dibantu oleh Tjokorda Gde Agung Sukawati, dari Puri Agung Ubud. Sekarang karya mereka bisa dilihat di Museum Puri Lukisan, Ubud.""",
89
+ 'question': "Sejak kapan ubud terkenal di antara wisatawan barat?"
90
+ })
91
+ ```
92
+ *output:*
93
+ ```py
94
+ {
95
+ 'answer': '1920-an',
96
+ 'start': 644,
97
+ 'end': 651,
98
+ 'score': 0.8787403106689453,
99
+ }
100
+ ```
101
+
102
+ [GitHub](https://github.com/esakrissa/question-answering)
103
+
104
+ ## Demo
105
+
106
+
107
+ ### Reference
108
+ <a id="1">[1]</a> Fajri Koto, Afshin Rahimi, Jey Han Lau, and Timothy Baldwin. 2020. IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP. In Proceedings of the 28th International Conference on Computational Linguistics (COLING).
109
 
110
  ### Framework versions
111