initial commit

Files changed (8)

README.md ADDED

+---
+language:
+- da
+tags:
+- ner
+- bert
+- pytorch
+- transformers
+license: CC BY-SA 4.0
+datasets:
+- DaNE
+metrics:
+- f1
+widget:
+- text: "Jens Peter Hansen kommer fra Danmark"
+---
+# BERT fine-tuned for Named Entity Recognition in Danish
+The model tags tokens (in Danish sentences) with named entity tags (BIO format) [PER, ORG, LOC, MISC].
+The pretrained language model used for fine-tuning is the [Danish BERT](https://github.com/certainlyio/nordic_bert) by BotXO.
+See the [DaNLP documentation](https://danlp-alexandra.readthedocs.io/en/latest/docs/tasks/ner.html#bert) for more details.
+Here is how to load the model and its tokenizer:
+```python
+from transformers import BertTokenizer, BertForTokenClassification
+model = BertForTokenClassification.from_pretrained("DaNLP/da-bert-ner")
+tokenizer = BertTokenizer.from_pretrained("DaNLP/da-bert-ner")
+```
+## Training Data
+The model has been trained on the [DaNE](https://danlp-alexandra.readthedocs.io/en/latest/docs/datasets.html#dane) dataset.
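
Note on the README: it says the model emits BIO tags over PER, ORG, LOC and MISC, but does not show how to turn per-token tags into entities. A minimal sketch of that grouping step follows. The helper name `bio_to_spans` is hypothetical, and the tags in the example are hand-written for illustration, not actual model output.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (entity_type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span that is currently open, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            # Continuation of the open span.
            current_tokens.append(token)
        elif tag.startswith("B-") or tag.startswith("I-"):
            # B- always opens a new span; a stray I- with no matching
            # open span is treated leniently as a new span as well.
            flush()
            current_type, current_tokens = tag[2:], [token]
        else:
            # "O" closes whatever span is open.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Widget sentence from the model card, with illustrative tags.
tokens = ["Jens", "Peter", "Hansen", "kommer", "fra", "Danmark"]
tags = ["B-PER", "I-PER", "I-PER", "O", "O", "B-LOC"]
entities = bio_to_spans(tokens, tags)
```

In real use the tags would come from the model's predicted label ids, mapped through `id2label` in config.json and aligned from word pieces back to words first.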

added_tokens.json ADDED

@@ -0,0 +1 @@
+{}

config.json ADDED

+{
+  "_name_or_path": ".",
+  "architectures": [
+    "BertForTokenClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "directionality": "bidi",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "O",
+    "1": "B-MISC",
+    "2": "I-MISC",
+    "3": "B-PER",
+    "4": "I-PER",
+    "5": "B-ORG",
+    "6": "I-ORG",
+    "7": "B-LOC",
+    "8": "I-LOC"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "B-LOC": 7,
+    "B-MISC": 1,
+    "B-ORG": 5,
+    "B-PER": 3,
+    "I-LOC": 8,
+    "I-MISC": 2,
+    "I-ORG": 6,
+    "I-PER": 4,
+    "O": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "output_past": true,
+  "pad_token_id": 0,
+  "pooler_fc_size": 768,
+  "pooler_num_attention_heads": 12,
+  "pooler_num_fc_layers": 3,
+  "pooler_size_per_head": 128,
+  "pooler_type": "first_token_transform",
+  "type_vocab_size": 2,
+  "vocab_size": 32000
+}
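
Note on config.json: the values above are enough for a rough sanity check of the checkpoint size. The sketch below counts the weights of a BERT encoder with a token-classification head from these fields. It assumes the standard BERT layout (biases on every linear layer, two LayerNorms per block), no pooler, and a 9-way classifier taken from the nine `id2label` entries. The function and key names are illustrative only.

```python
# Values copied from config.json; num_labels is the number of id2label entries.
config = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "num_hidden_layers": 12,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "vocab_size": 32000,
    "num_labels": 9,
}


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


def count_parameters(cfg: dict) -> int:
    h, i = cfg["hidden_size"], cfg["intermediate_size"]
    # Word, position and segment embeddings, followed by one LayerNorm.
    rows = cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    embeddings = rows * h + layer_norm(h)
    # Q, K, V and output projections, then the feed-forward pair,
    # each sub-block followed by a LayerNorm.
    per_layer = (
        4 * linear(h, h) + layer_norm(h)
        + linear(h, i) + linear(i, h) + layer_norm(h)
    )
    classifier = linear(h, cfg["num_labels"])
    return embeddings + cfg["num_hidden_layers"] * per_layer + classifier


n_params = count_parameters(config)
approx_bytes = n_params * 4  # assumes float32 weights
```

Under these assumptions the count comes to roughly 110 million parameters, or about 440 MB at 4 bytes each, which is in line with the 440220471-byte size listed for pytorch_model.bin. The small remainder would be serialization overhead and any buffers stored alongside the weights.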

pytorch_model.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d5168a7eaef0e4fca20692c5f0aee87d248ee91074724861e2fb7dda4a141f3
+size 440220471
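
Note on the weight files: the diff shows Git LFS pointer files, not the binaries themselves. Each pointer is a short text file of `key value` lines. A minimal sketch of reading one, using the pointer above with the leading `+` diff markers removed; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:8d5168a7eaef0e4fca20692c5f0aee87d248ee91074724861e2fb7dda4a141f3
size 440220471
"""
info = parse_lfs_pointer(pointer)
```

The digest and size can then be compared against a downloaded file, for example with `hashlib.sha256` and `os.path.getsize`, to confirm the download is complete.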

special_tokens_map.json ADDED

@@ -0,0 +1 @@
+{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}

tf_model.h5 ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:1b9ad48a206924a20c79d0263bcd1dfaf292be83216d8c7423b146a10eadadc4
+size 442767148

tokenizer_config.json ADDED

@@ -0,0 +1 @@
+{"do_lower_case": true, "init_inputs": []}
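
Note on tokenizer_config.json: `"do_lower_case": true` means input text is lowercased before WordPiece splitting, so capitalization is not available to the model as a cue for names. In the original BERT basic tokenizer, lowercasing is also paired with stripping combining accents. Whether that second step applies to this model depends on the tokenizer class and library version, so the sketch below is an assumption about that behaviour and not a statement about this repository. `bert_style_normalize` is a hypothetical name.

```python
import unicodedata


def bert_style_normalize(text: str) -> str:
    """Lowercase, then drop combining marks after NFD decomposition."""
    text = text.lower()
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" covers combining marks such as the ring in "å".
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


plain = bert_style_normalize("Jens Peter Hansen")
ring = bert_style_normalize("Århus")     # "å" decomposes into "a" + ring
slash = bert_style_normalize("Østerbro")  # "ø" has no decomposition
```

For Danish this matters because "å" loses its ring under this normalization while "ø" and "æ" pass through unchanged, so it is worth checking the tokenizer's actual output on a few such words before relying on it.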

vocab.txt ADDED

The diff for this file is too large to render. See raw diff