sarnikowski committed on
Commit
f3862af
1 Parent(s): 7add585

release: v0.1.0

README.md ADDED
@@ -0,0 +1,22 @@
+ ---
+ language: da
+ license: cc-by-4.0
+ ---
+
+ # Danish ELECTRA small (cased)
+
+ An [ELECTRA](https://arxiv.org/abs/2003.10555) model pretrained on a custom Danish corpus (~17.5 GB).
+ For details on data sources and the training procedure, along with benchmarks on downstream tasks, see: https://github.com/sarnikowski/danish_transformers/tree/main/electra
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")
+ model = AutoModel.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")
+ ```
+
+ ## Questions?
+
+ If you have any questions, feel free to open an issue in the [danish_transformers](https://github.com/sarnikowski/danish_transformers) repository, or send an email to p.sarnikowski@gmail.com
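Building on the usage snippet in the README above, a minimal sketch of pulling token-level representations out of the model; the Danish example sentence and the shape check are illustrative, not part of the released files:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")
model = AutoModel.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")

# Tokenize a Danish sentence and run it through the encoder.
inputs = tokenizer("København er Danmarks hovedstad.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Token-level hidden states, shaped (batch, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)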
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+ "architectures": [
+ "ElectraForMaskedLM"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "embedding_size": 128,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 64,
+ "initializer_range": 0.02,
+ "intermediate_size": 256,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "electra",
+ "num_attention_heads": 1,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "summary_activation": "gelu",
+ "summary_last_dropout": 0.1,
+ "summary_type": "first",
+ "summary_use_proj": true,
+ "transformers_version": "4.2.2",
+ "type_vocab_size": 2,
+ "vocab_size": 28995
+ }
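The hyperparameters listed in config.json can also be inspected programmatically. A minimal sketch using transformers' `ElectraConfig`; the printed fields simply mirror the file above:

```python
from transformers import ElectraConfig

config = ElectraConfig.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")

# These mirror config.json above: 12 hidden layers, hidden size 64, 28995-token vocab.
print(config.num_hidden_layers, config.hidden_size, config.vocab_size)
```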
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af3c9ca7bda3b207e738eb817dbae0998bc276b91080b28268716ad615374e96
+ size 17742238
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:979f1fe898ca17db278a27c873720fbe482632a30075ac199da498c15fae3cef
+ size 33084756
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": false}
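Since `do_lower_case` is false, the tokenizer is cased. A quick sketch to confirm casing is preserved; the exact subword split depends on vocab.txt:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")

# With do_lower_case=false the input is not lowercased, so "København" keeps its capital K.
print(tokenizer.tokenize("København er ikke københavn."))
```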
vocab.txt ADDED
The diff for this file is too large to render.