
RobBERT-2023-dutch-base-abb: Model fine-tuned on Flemish Local Decisions for ABB

RobBERT-2023-dutch-base-abb is a fine-tuned version of DTAI-KULeuven/robbert-2023-dutch-base, trained specifically on data from Lokaal Beslist, which contains over 2 million agenda points. This fine-tuning improves performance on tasks related to Flemish administrative decisions, providing better contextual understanding and more accurate predictions in this domain.

The aim of RobBERT-2023-dutch-base-abb is to create a robust NLP tool for Flemish administrative texts. Fine-tuning on this extensive dataset improves its capabilities in classification, named entity recognition (NER), and other language processing tasks relevant to administrative and governmental contexts. It serves as a valuable resource for researchers and data analysts, and as a foundation for further specialized models to efficiently handle and analyze administrative data.

How to use

RobBERT-2023 and RobBERT both use the RoBERTa architecture and pre-training procedure, but with a Dutch tokenizer and Dutch training data. RoBERTa is the robustly optimized English BERT model, making it even more powerful than the original BERT model. Because of this shared architecture, RobBERT can easily be fine-tuned and used for inference with the same code used to fine-tune RoBERTa models, and with most code written for BERT models, e.g. as provided by the Hugging Face Transformers library.
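As a minimal sketch of that workflow, the snippet below fine-tunes the model on a toy sequence classification task with the Transformers Trainer. The example texts, labels, number of classes, and output directory are hypothetical placeholders, and it assumes the datasets library is installed alongside Transformers.

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "svercoutere/robbert-2023-dutch-base-abb"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels and the labelled examples below are hypothetical placeholders
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny toy dataset of (hypothetical) Dutch agenda-point snippets with class ids
train_data = Dataset.from_dict({
    "text": ["Goedkeuring van het meerjarenplan 2024.",
             "Aanstelling van een jeugdconsulent."],
    "label": [0, 1],
})
train_data = train_data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="robbert-abb-classifier",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_data,
)
trainer.train()

For a real task, the toy dataset would be replaced by labelled agenda points and the usual evaluation split and metrics added; the loading pattern stays the same.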

By default, RobBERT-2023-dutch-base-abb keeps the masked language modelling head that was used during training. This head can be used zero-shot to fill in masked tokens in sentences. The snippet below loads the tokenizer and the model with this head:

from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the model with its masked language modelling head
tokenizer = AutoTokenizer.from_pretrained("svercoutere/robbert-2023-dutch-base-abb")
model = AutoModelForMaskedLM.from_pretrained("svercoutere/robbert-2023-dutch-base-abb")
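For a quick check of this mask-filling behaviour, the fill-mask pipeline can be used directly; the Dutch sentence below is a hypothetical example.

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="svercoutere/robbert-2023-dutch-base-abb")
# Hypothetical example sentence; <mask> marks the token to predict
for prediction in fill_mask("De gemeenteraad keurt het <mask> goed."):
    print(prediction["token_str"], prediction["score"])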