Model description
mbert-base-cased-NER-NL-legislation-refs is a fine-tuned BERT model trained to recognize legislation references (entity type REF) in Dutch case law.
Specifically, it is a bert-base-multilingual-cased model that was fine-tuned on the mbert-base-cased-NER-NL-legislation-refs-data dataset.
Training procedure
Dataset
This model was fine-tuned on the mbert-base-cased-NER-NL-legislation-refs-data dataset, which consists of 512-token-long examples that each contain one or more legislation references. These examples were created from a weakly labelled corpus of Dutch case law scraped from Linked Data Overheid. The corpus was pre-tokenized and labelled with spaCy (using biluo_tags_from_offsets) and then further tokenized with the bert-base-multilingual-cased tokenizer, loaded via Hugging Face's AutoTokenizer.from_pretrained().
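The labelling step can be illustrated with a small pure-Python sketch of how character-offset annotations are mapped to token-level BILOU tags. This is a simplified stand-in for spaCy's biluo_tags_from_offsets, not its actual implementation, and the example sentence, token offsets, and entity span are invented for illustration:

```python
def biluo_tags(token_spans, entity_spans, label="REF"):
    """Assign BILOU tags to tokens given character-offset entity spans.

    token_spans:  list of (start, end) character offsets, one per token
    entity_spans: list of (start, end) character offsets, one per entity
    """
    tags = ["O"] * len(token_spans)
    for ent_start, ent_end in entity_spans:
        # Indices of tokens that fall entirely inside this entity span.
        inside = [i for i, (s, e) in enumerate(token_spans)
                  if s >= ent_start and e <= ent_end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"       # Unit-length entity
        else:
            tags[inside[0]] = f"B-{label}"       # Begin
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"           # Inside
            tags[inside[-1]] = f"L-{label}"      # Last
    return tags


# "Zie artikel 6:162 BW voor onrechtmatige daad." with per-token offsets;
# the entity span (4, 20) covers "artikel 6:162 BW".
tokens = [(0, 3), (4, 11), (12, 17), (18, 20), (21, 25),
          (26, 39), (40, 44), (44, 45)]
entities = [(4, 20)]
print(biluo_tags(tokens, entities))
# ['O', 'B-REF', 'I-REF', 'L-REF', 'O', 'O', 'O', 'O']
```

The L- tag on the closing token of each multi-token reference is what the aggregation caveat below hinges on.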
Results
| Model | Precision | Recall | F1-score |
|---|---|---|---|
| mBERT | 0.891 | 0.919 | 0.905 |
Using Hugging Face's hosted inference API widget, this model can be quickly tested on the provided examples. Note that the widget incorrectly presents the last token of a legislation reference as a separate entity, due to the workings of its 'simple' aggregation_strategy: while this model was fine-tuned on training data labelled in accordance with the BILOU scheme, the hosted inference API groups entities by merging adjacent B- and I- tags with the same label, thereby omitting the L- tags.
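The grouping issue can be illustrated with a small pure-Python sketch (invented token tags, not the widget's actual code): an aggregation that only chains B-/I- tags leaves the final L- token as its own entity, whereas a BILOU-aware merge keeps the reference whole.

```python
def merge_simple(tagged):
    """Chain only B-/I- tagged tokens into a group, roughly mimicking
    the behaviour described above for the 'simple' aggregation_strategy."""
    groups, current = [], []
    for token, tag in tagged:
        prefix, _, _ = tag.partition("-")
        if prefix == "B" or (prefix == "I" and current):
            current.append(token)
        else:
            if current:
                groups.append(" ".join(current))
                current = []
            if prefix != "O":
                groups.append(token)  # L-/U- tokens become their own group
    if current:
        groups.append(" ".join(current))
    return groups


def merge_biluo(tagged):
    """BILOU-aware merge: B- opens a span, I- continues it, L- closes it,
    and U- is a complete single-token entity."""
    groups, current = [], []
    for token, tag in tagged:
        prefix, _, _ = tag.partition("-")
        if prefix == "B":
            current = [token]
        elif prefix == "I":
            current.append(token)
        elif prefix == "L":
            current.append(token)
            groups.append(" ".join(current))
            current = []
        elif prefix == "U":
            groups.append(token)
    return groups


tagged = [("Zie", "O"), ("artikel", "B-REF"), ("6:162", "I-REF"),
          ("BW", "L-REF"), ("voor", "O")]
print(merge_simple(tagged))  # ['artikel 6:162', 'BW'] — reference split in two
print(merge_biluo(tagged))   # ['artikel 6:162 BW']    — reference kept whole
```

Post-processing the model's raw token predictions with a BILOU-aware merge like the second function avoids the split entity shown by the widget.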
Limitations and biases
More information needed
BibTeX entry and citation info
More information needed