ilos-vigil committed
Commit c57096b
1 Parent(s): 3f0d9f8

Update README.md

Files changed (1)
  1. README.md +11 -6
README.md CHANGED
@@ -2,12 +2,13 @@
  language: id
  license: mit
  datasets:
- - oscar
- - wikipedia
- - id_newspapers_2018
+ - oscar
+ - wikipedia
+ - id_newspapers_2018
  widget:
- - text: "Saya [MASK] makan nasi goreng."
- - text: "Kucing itu sedang bermain dengan [MASK]."
+ - text: Saya [MASK] makan nasi goreng.
+ - text: Kucing itu sedang bermain dengan [MASK].
+ pipeline_tag: fill-mask
  ---

  # Indonesian small BigBird model
@@ -16,6 +17,10 @@ widget:

  Source code to create this model is available at [https://github.com/ilos-vigil/bigbird-small-indonesian](https://github.com/ilos-vigil/bigbird-small-indonesian).

+ ## Downstream Task
+
+ * NLI/ZSC: [ilos-vigil/bigbird-small-indonesian-nli](https://huggingface.co/ilos-vigil/bigbird-small-indonesian-nli)
+
  ## Model Description

  This **cased** model has been pretrained with Masked LM objective. It has ~30M parameters and was pretrained with 8 epoch/51474 steps with 2.078 eval loss (7.988 perplexity). Architecture of this model is shown in the configuration snippet below. The tokenizer was trained with whole dataset with 30K vocabulary size.
@@ -159,4 +164,4 @@ The model achieve the following result during training evaluation.
  | 5 | 32187 | 2.097 | 8.141 |
  | 6 | 38616 | 2.087 | 8.061 |
  | 7 | 45045 | 2.081 | 8.012 |
- | 8 | 51474 | 2.078 | 7.988 |
+ | 8 | 51474 | 2.078 | 7.988 |
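
For context, a minimal sketch of exercising the `pipeline_tag: fill-mask` setting and the widget sentences from the front matter above via the `transformers` pipeline API; the Hub model id `ilos-vigil/bigbird-small-indonesian` is an assumption inferred from the linked source repository, not stated in the diff itself.

```python
# Sketch only, not part of this commit. Assumes the checkpoint is published on
# the Hugging Face Hub as "ilos-vigil/bigbird-small-indonesian" (inferred from
# the linked GitHub repository).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="ilos-vigil/bigbird-small-indonesian")

# The two widget sentences declared in the README front matter.
for text in [
    "Saya [MASK] makan nasi goreng.",
    "Kucing itu sedang bermain dengan [MASK].",
]:
    for prediction in fill_mask(text, top_k=3):
        print(prediction["sequence"], round(prediction["score"], 4))
```

The new "Downstream Task" section points to an NLI checkpoint; under the same assumptions, it could be driven through the zero-shot-classification pipeline, for example:

```python
# Sketch only: zero-shot classification with the linked downstream checkpoint.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="ilos-vigil/bigbird-small-indonesian-nli",
)
result = classifier(
    "Saya suka makan nasi goreng.",            # illustrative input text
    candidate_labels=["makanan", "olahraga"],  # hypothetical labels for illustration
)
print(result["labels"], result["scores"])
```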