sarnikowski committed on
Commit
f3862af
1 Parent(s): 7add585

release: v0.1.0

README.md ADDED
@@ -0,0 +1,22 @@
+ ---
+ language: da
+ license: cc-by-4.0
+ ---
+
+ # Danish ELECTRA small (cased)
+
+ An [ELECTRA](https://arxiv.org/abs/2003.10555) model pretrained on a custom Danish corpus (~17.5 GB).
+ For details on data sources and the training procedure, along with benchmarks on downstream tasks, see: https://github.com/sarnikowski/danish_transformers/tree/main/electra
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")
+ model = AutoModel.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")
+ ```
+
+ ## Questions?
+
+ If you have any questions, feel free to open an issue in the [danish_transformers](https://github.com/sarnikowski/danish_transformers) repository, or send an email to p.sarnikowski@gmail.com
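Building on the usage snippet in the README above, a minimal sketch of pulling token-level representations out of the model; the Danish example sentence and the shape check are illustrative, not part of the released files:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")
model = AutoModel.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")

# Tokenize a Danish sentence and run it through the encoder.
inputs = tokenizer("København er Danmarks hovedstad.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Token-level hidden states, shaped (batch, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)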
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+ "architectures": [
+ "ElectraForMaskedLM"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "embedding_size": 128,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 64,
+ "initializer_range": 0.02,
+ "intermediate_size": 256,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "electra",
+ "num_attention_heads": 1,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "summary_activation": "gelu",
+ "summary_last_dropout": 0.1,
+ "summary_type": "first",
+ "summary_use_proj": true,
+ "transformers_version": "4.2.2",
+ "type_vocab_size": 2,
+ "vocab_size": 28995
+ }
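The hyperparameters listed in config.json can also be inspected programmatically. A minimal sketch using transformers' `ElectraConfig`; the printed fields simply mirror the file above:

```python
from transformers import ElectraConfig

config = ElectraConfig.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")

# These mirror config.json above: 12 hidden layers, hidden size 64, 28995-token vocab.
print(config.num_hidden_layers, config.hidden_size, config.vocab_size)
```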
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af3c9ca7bda3b207e738eb817dbae0998bc276b91080b28268716ad615374e96
+ size 17742238
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tf_model.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:979f1fe898ca17db278a27c873720fbe482632a30075ac199da498c15fae3cef
+ size 33084756
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"do_lower_case": false}
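Since `do_lower_case` is false, the tokenizer is cased. A quick sketch to confirm casing is preserved; the exact subword split depends on vocab.txt:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sarnikowski/electra-small-generator-da-256-cased")

# With do_lower_case=false the input is not lowercased, so "København" keeps its capital K.
print(tokenizer.tokenize("København er ikke københavn."))
```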
vocab.txt ADDED
The diff for this file is too large to render.