release: v0.1.0

Files changed (7)

README.md ADDED

+---
+language: da
+license: cc-by-4.0
+---
+# Danish ConvBERT small (cased)
+A [ConvBERT](https://arxiv.org/abs/2008.02496) model pretrained on a custom Danish corpus (~17.5 GB).
+For details on the data sources and training procedure, along with benchmarks on downstream tasks, see https://github.com/sarnikowski/danish_transformers
+## Usage
+```python
+from transformers import ConvBertTokenizer, ConvBertModel
+tokenizer = ConvBertTokenizer.from_pretrained("sarnikowski/convbert-small-da-cased")
+model = ConvBertModel.from_pretrained("sarnikowski/convbert-small-da-cased")
+```
+## Questions?
+If you have any questions, feel free to open an issue on the [danish_transformers](https://github.com/sarnikowski/danish_transformers) repository or send an email to p.sarnikowski@gmail.com.

config.json ADDED

+{
+  "_name_or_path": ".",
+  "architectures": [
+    "ConvBertForPreTraining"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "conv_kernel_size": 9,
+  "directionality": "bidi",
+  "embedding_size": 128,
+  "eos_token_id": 2,
+  "head_ratio": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 256,
+  "initializer_range": 0.02,
+  "intermediate_size": 1024,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "convbert",
+  "num_attention_heads": 4,
+  "num_groups": 1,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "transformers_version": "4.4.0.dev0",
+  "type_vocab_size": 2,
+  "vocab_size": 28995
+}
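
Review note: `embedding_size` (128) is smaller than `hidden_size` (256), which suggests a factorized embedding whose output is projected up to the hidden width before the encoder layers. The sketch below only does the arithmetic for the three embedding tables from the values in this config. The helper name `embedding_param_counts` is hypothetical, and the count deliberately ignores LayerNorm, the projection, and all encoder weights, so it is not a total parameter count for the model.

```python
import json

# Subset of the config.json added in this commit (values copied verbatim).
CONFIG_JSON = """
{
  "embedding_size": 128,
  "hidden_size": 256,
  "max_position_embeddings": 512,
  "type_vocab_size": 2,
  "vocab_size": 28995
}
"""


def embedding_param_counts(config: dict) -> dict:
    """Count weights in the word, position, and token-type embedding tables.

    Hypothetical helper for illustration: LayerNorm, the embedding-to-hidden
    projection, and the encoder layers are intentionally left out.
    """
    width = config["embedding_size"]
    counts = {
        "word": config["vocab_size"] * width,
        "position": config["max_position_embeddings"] * width,
        "token_type": config["type_vocab_size"] * width,
    }
    counts["total"] = sum(counts.values())
    return counts


config = json.loads(CONFIG_JSON)
counts = embedding_param_counts(config)

# When the two widths differ, a projection between them is required.
needs_projection = config["embedding_size"] != config["hidden_size"]

print(counts)
print("projection needed:", needs_projection)
```

Keeping the embedding width at 128 means the vocabulary table holds about 3.7M weights, half of what a 256-wide table would need.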

pytorch_model.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:0ee5811a108a8f098e44faf91bd96f6e70351d5e4967210257fbd2a98ce6850f
+size 52183983
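
Review note: the three lines above are a Git LFS pointer, not the weights themselves. Each line is a `key value` pair, with the object ID given as `algorithm:digest` and the size in bytes. A minimal sketch of reading such a pointer follows, assuming the leading `+` diff markers have already been stripped. `LfsPointer` and `parse_lfs_pointer` are hypothetical names for illustration, not part of any Git LFS tooling.

```python
from typing import NamedTuple


class LfsPointer(NamedTuple):
    version: str
    hash_algo: str
    digest: str
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


# Pointer content for pytorch_model.bin, copied from this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0ee5811a108a8f098e44faf91bd96f6e70351d5e4967210257fbd2a98ce6850f
size 52183983
"""

pointer = parse_lfs_pointer(POINTER)
size_mib = pointer.size / 2**20
print(f"{pointer.hash_algo} {pointer.digest[:12]}... {size_mib:.1f} MiB")
```

After downloading the real file, its SHA-256 digest and byte size can be compared against these two fields to confirm the download is intact.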

special_tokens_map.json ADDED

@@ -0,0 +1 @@
+{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
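
Review note: these are the usual BERT-style special tokens. The sketch below shows how such tokens conventionally frame a single sequence or a sequence pair, assuming this tokenizer follows the BERT layout (`[CLS] A [SEP] B [SEP]`). The functions `frame_pair` and `pad` are hypothetical illustrations that operate on already-split tokens; they are not the tokenizer's actual implementation and do no WordPiece splitting.

```python
import json

# Content of special_tokens_map.json from this commit.
SPECIAL_TOKENS = json.loads(
    '{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", '
    '"cls_token": "[CLS]", "mask_token": "[MASK]"}'
)


def frame_pair(tokens_a, tokens_b=None, special=SPECIAL_TOKENS):
    """Wrap one or two token lists in [CLS]/[SEP] (assumed BERT convention).

    Returns the framed tokens and matching segment ids
    (0 for the first sequence, 1 for the second).
    """
    cls, sep = special["cls_token"], special["sep_token"]
    tokens = [cls, *tokens_a, sep]
    segment_ids = [0] * len(tokens)
    if tokens_b is not None:
        tokens += [*tokens_b, sep]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


def pad(tokens, length, special=SPECIAL_TOKENS):
    """Right-pad a token list with [PAD] up to the requested length."""
    return tokens + [special["pad_token"]] * max(0, length - len(tokens))


tokens, segment_ids = frame_pair(["Hej"], ["verden"])
print(tokens)
print(segment_ids)
print(pad(tokens, 8))
```

The two segment ids line up with `"type_vocab_size": 2` in config.json.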

tf_model.h5 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:f55327b06240e4635212dde7ffb214acd782417fd8b81dbb5d8cfa453bc8f074
+size 52129664

tokenizer_config.json ADDED

@@ -0,0 +1 @@
+{"do_lower_case": false}

vocab.txt ADDED

The diff for this file is too large to render. See raw diff