Initial models

Files changed (4)

README.md ADDED

+---
+inference: false
+language: id
+---
+# IndoConvBERT Base Model
+IndoConvBERT is a ConvBERT model pretrained on Indo4B.
+## Pretraining details
+We follow a different training procedure: instead of a two-phase approach that pretrains the model for 90% of the steps with a sequence length of 128 and the remaining 10% with a sequence length of 512, we pretrain the model with a sequence length of 512 for 1M steps on a v3-8 TPU.
+The current version of the model is trained on Indo4B and a small Twitter dump.
+## Acknowledgement
+Big thanks to TFRC (TensorFlow Research Cloud) for providing free TPU access.
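
To make the pretraining schedule in the README concrete, here is a rough sketch comparing how many token positions each sequence in a batch covers under the two schedules. It assumes both schedules run for the same 1M steps with the same batch size; the README only states the step count for the single-phase run, so the two-phase total is an assumption made for the comparison, and `token_positions` is an illustrative helper, not part of any training code.

```python
def token_positions(steps: int, phases: list[tuple[float, int]]) -> float:
    """Token positions seen per batch element over a whole run.

    `phases` is a list of (fraction_of_steps, sequence_length) pairs.
    """
    return sum(steps * fraction * seq_len for fraction, seq_len in phases)


TOTAL_STEPS = 1_000_000  # from the README; assumed equal for both schedules

# Two-phase schedule described in the README: 90% at length 128, 10% at length 512.
two_phase = token_positions(TOTAL_STEPS, [(0.9, 128), (0.1, 512)])

# Single-phase schedule used for this model: every step at length 512.
single_phase = token_positions(TOTAL_STEPS, [(1.0, 512)])

ratio = single_phase / two_phase
print(f"single-phase covers about {ratio:.2f}x the token positions")
```

Under these assumptions the single-phase schedule processes roughly three times as many token positions per sequence, which is the extra compute the README's procedure trades for seeing long contexts throughout training.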

config.json ADDED

+{
+  "_name_or_path": "IndoConvBERT-base/",
+  "architectures": [
+    "ConvBertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "conv_kernel_size": 9,
+  "embedding_size": 768,
+  "eos_token_id": 2,
+  "head_ratio": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "convbert",
+  "num_attention_heads": 12,
+  "num_groups": 1,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "transformers_version": "4.4.0.dev0",
+  "type_vocab_size": 2,
+  "vocab_size": 30522
+}
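
A few quantities implied by this config can be checked with the standard library alone. The sketch below copies a subset of the fields verbatim and derives the embedding table sizes and the feed-forward expansion factor. The treatment of `head_ratio` (dividing the number of self-attention heads, floored, with a minimum of one) reflects my understanding of the Hugging Face ConvBERT implementation and should be read as an assumption rather than a statement about this checkpoint.

```python
import json

# Subset of the config.json fields shown above; values copied verbatim.
CONFIG_JSON = """
{
  "embedding_size": 768,
  "head_ratio": 2,
  "hidden_size": 768,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "vocab_size": 30522
}
"""
cfg = json.loads(CONFIG_JSON)

# Assumption: head_ratio reduces the number of self-attention heads,
# with the remaining capacity handled by the convolution branch.
effective_heads = max(1, cfg["num_attention_heads"] // cfg["head_ratio"])

# Sizes of the two largest embedding tables (rows x embedding_size).
token_embedding_params = cfg["vocab_size"] * cfg["embedding_size"]
position_embedding_params = cfg["max_position_embeddings"] * cfg["embedding_size"]

# Width of the feed-forward layer relative to the hidden size.
ffn_expansion = cfg["intermediate_size"] // cfg["hidden_size"]

print(effective_heads, token_embedding_params, position_embedding_params, ffn_expansion)
```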

pytorch_model.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:c626393b3a07718ab46ef8f426c46e7d6bb529fcdfc5a5195b040066985c2d33
+size 422837461

tf_model.h5 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:742b00c619f5bacde6cfa70815035644c54f0c15e8f93a52c929ac13574d8425
+size 423072408
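
The two weight files are committed as Git LFS pointers, which record only a spec version, a SHA-256 object id, and a byte size. A minimal sketch of how such a pointer could be parsed and used to check a downloaded file follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the parameter estimate assumes float32 weights and ignores serialization overhead, so it is only a ballpark figure.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash the pointer records."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer for pytorch_model.bin, copied from the diff above without the "+" markers.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c626393b3a07718ab46ef8f426c46e7d6bb529fcdfc5a5195b040066985c2d33
size 422837461
"""
ptr = parse_lfs_pointer(POINTER)

# Ballpark parameter count, assuming 4 bytes per weight (float32).
approx_params_millions = ptr["size"] / 4 / 1e6
print(f"about {approx_params_millions:.1f}M parameters")
```

After downloading the real file, `matches_pointer("pytorch_model.bin", ptr)` would confirm that the local copy is the object this commit refers to.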