akutuzov committed on
Commit 6b4e281
1 Parent(s): 9963332

Original NorBERT commit

Files changed (6)
  1. README.md +16 -0
  2. config.json +16 -0
  3. pytorch_model.bin +3 -0
  4. tokenizer.json +0 -0
  5. tokenizer_config.json +3 -0
  6. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,16 @@
+ ---
+ language: no
+ ---
+
+ ## Quickstart
+
+ **Release 1.0** (January 13, 2021)
+
+ Download the models here:
+
+ * Cased Norwegian BERT Base: [215.zip](http://vectors.nlpl.eu/repository/20/215.zip)
+
+ More about NorBERT: http://norlm.nlpl.eu/
+
+ The model was trained by the [Language Technology Group](https://www.mn.uio.no/ifi/english/research/groups/ltg/) at the University of Oslo.
+ The computations were performed on resources provided by UNINETT Sigma2 - the National Infrastructure for High Performance Computing and Data Storage in Norway.
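The README points to a zip archive rather than describing a loading procedure, so the download step can be sketched with the Python standard library alone. This is a minimal sketch, not the maintainers' method: the helper name `download_and_extract` and the destination directory are hypothetical, and the contents of `215.zip` are not described in this commit, so the function simply returns whatever member names the archive holds.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the README added in this commit.
MODEL_URL = "http://vectors.nlpl.eu/repository/20/215.zip"


def download_and_extract(url: str, dest_dir: str) -> list[str]:
    """Download a zip archive and unpack it into dest_dir (hypothetical helper).

    Returns the member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Name the local copy after the last path component of the URL.
    archive_path = dest / url.rsplit("/", 1)[-1]

    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(url) as response, archive_path.open("wb") as out:
        shutil.copyfileobj(response, out)

    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Example call (performs a network request; "norbert" is an assumed directory):
# names = download_and_extract(MODEL_URL, "norbert")
```

The download is streamed because model archives of this kind tend to be hundreds of megabytes.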
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "architectures": [
+     "BertForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "max_position_embeddings": 512,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "type_vocab_size": 2,
+   "vocab_size": 32922
+ }
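The values in config.json are enough to estimate the size of the encoder. The sketch below assumes the standard BERT layer layout (biased linear projections for query, key, value and attention output, a two-layer feed-forward block, and a LayerNorm after each sub-block) and counts only the embeddings and the encoder stack. It leaves out any pooler or task heads, since the commit does not show which of those the checkpoint stores. The function name `estimate_bert_parameters` is hypothetical.

```python
import json

# Hyperparameters copied from the config.json added in this commit.
CONFIG_JSON = """
{
  "hidden_size": 768,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 32922
}
"""


def estimate_bert_parameters(cfg: dict) -> dict:
    """Estimate parameter counts for embeddings and encoder (assumed layout)."""
    hidden = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    layer_norm = 2 * hidden  # one weight and one bias vector

    # Word, position and token-type embedding tables, then one LayerNorm.
    table_rows = (
        cfg["vocab_size"]
        + cfg["max_position_embeddings"]
        + cfg["type_vocab_size"]
    )
    embeddings = table_rows * hidden + layer_norm

    # Four hidden-by-hidden projections with biases, then one LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Up-projection, down-projection, then one LayerNorm.
    feed_forward = (hidden * inter + inter) + (inter * hidden + hidden) + layer_norm

    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    return {
        "embeddings": embeddings,
        "encoder": encoder,
        "total": embeddings + encoder,
    }


counts = estimate_bert_parameters(json.loads(CONFIG_JSON))
print(counts)
```

Under these assumptions the total is about 110.7 million parameters, or roughly 443 MB at 4 bytes per float32 weight. That is close to the 447,894,300-byte size recorded for pytorch_model.bin. The difference would be consistent with additional heads plus serialization overhead, though the commit does not confirm this.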
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d97a2135f74ed0ade9df9b70e1e0fb76961a83121e998c44c9efbf5bfa28e6fa
+ size 447894300
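The three lines added for pytorch_model.bin are a Git LFS pointer rather than the weights themselves: each line is a key followed by a space and a value, giving the spec version, the content hash and the size in bytes. A minimal sketch of using that pointer to check a downloaded copy of the file might look like the following. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path in the commented call is an assumption.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the pytorch_model.bin diff in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d97a2135f74ed0ade9df9b70e1e0fb76961a83121e998c44c9efbf5bfa28e6fa
size 447894300
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash and size fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at path has the pointer's size and digest."""
    file_path = Path(path)
    # Cheap check first: a size mismatch rules the file out without hashing.
    if file_path.stat().st_size != pointer["size"]:
        return False

    hasher = hashlib.new(pointer["hash_algo"])
    with file_path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
# Example call, assuming the weights were saved locally under this name:
# ok = matches_pointer("pytorch_model.bin", pointer)
```

Hashing in chunks keeps memory use flat for a file of this size.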
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "do_lower_case": false
+ }
vocab.txt ADDED
Binary file (218 kB).