ilos-vigil committed
Commit c57096b
1 Parent(s): 3f0d9f8

Update README.md

Files changed (1)
  1. README.md +11 -6
README.md CHANGED
@@ -2,12 +2,13 @@
  language: id
  license: mit
  datasets:
- - oscar
- - wikipedia
- - id_newspapers_2018
+ - oscar
+ - wikipedia
+ - id_newspapers_2018
  widget:
- - text: "Saya [MASK] makan nasi goreng."
- - text: "Kucing itu sedang bermain dengan [MASK]."
+ - text: Saya [MASK] makan nasi goreng.
+ - text: Kucing itu sedang bermain dengan [MASK].
+ pipeline_tag: fill-mask
  ---

  # Indonesian small BigBird model
@@ -16,6 +17,10 @@ widget:

  Source code to create this model is available at [https://github.com/ilos-vigil/bigbird-small-indonesian](https://github.com/ilos-vigil/bigbird-small-indonesian).

+ ## Downstream Task
+
+ * NLI/ZSC: [ilos-vigil/bigbird-small-indonesian-nli](https://huggingface.co/ilos-vigil/bigbird-small-indonesian-nli)
+
  ## Model Description

  This **cased** model has been pretrained with Masked LM objective. It has ~30M parameters and was pretrained with 8 epoch/51474 steps with 2.078 eval loss (7.988 perplexity). Architecture of this model is shown in the configuration snippet below. The tokenizer was trained with whole dataset with 30K vocabulary size.
@@ -159,4 +164,4 @@ The model achieve the following result during training evaluation.
  | 5 | 32187 | 2.097 | 8.141 |
  | 6 | 38616 | 2.087 | 8.061 |
  | 7 | 45045 | 2.081 | 8.012 |
- | 8 | 51474 | 2.078 | 7.988 |
+ | 8 | 51474 | 2.078 | 7.988 |
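
For context, a minimal sketch of exercising the `pipeline_tag: fill-mask` setting and the widget sentences from the front matter above via the `transformers` pipeline API; the Hub model id `ilos-vigil/bigbird-small-indonesian` is an assumption inferred from the linked source repository, not stated in the diff itself.

```python
# Sketch only, not part of this commit. Assumes the checkpoint is published on
# the Hugging Face Hub as "ilos-vigil/bigbird-small-indonesian" (inferred from
# the linked GitHub repository).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="ilos-vigil/bigbird-small-indonesian")

# The two widget sentences declared in the README front matter.
for text in [
    "Saya [MASK] makan nasi goreng.",
    "Kucing itu sedang bermain dengan [MASK].",
]:
    for prediction in fill_mask(text, top_k=3):
        print(prediction["sequence"], round(prediction["score"], 4))
```

The new "Downstream Task" section points to an NLI checkpoint; under the same assumptions, it could be driven through the zero-shot-classification pipeline, for example:

```python
# Sketch only: zero-shot classification with the linked downstream checkpoint.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="ilos-vigil/bigbird-small-indonesian-nli",
)
result = classifier(
    "Saya suka makan nasi goreng.",            # illustrative input text
    candidate_labels=["makanan", "olahraga"],  # hypothetical labels for illustration
)
print(result["labels"], result["scores"])
```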