Danish LegalBERT (derivative of Maltehb/danish-bert-botxo)

This model is a derivative of Maltehb/danish-bert-botxo adapted to legal text. It was further pre-trained on a combination of the Danish portion of the MultiEURLEX dataset (Chalkidis et al., 2021), which contains EU legislation, and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus (Derczynski et al., 2021), which contain legal proceedings. Results on the evaluation set:

Loss: not reported

Model description

This is a BERT model (Devlin et al., 2018) pre-trained on Danish legal corpora. It follows the base configuration: 12 Transformer layers, each with 768 hidden units and 12 attention heads.
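As a rough check on what the base configuration implies, here is a back-of-the-envelope count of the parameters in the 12-layer encoder stack. It assumes the standard BERT-base feed-forward (intermediate) size of 3072, i.e. 4x the hidden size, which this card does not state. Embedding and pooler layers are left out because the vocabulary size is not given here. The helper names are illustrative only.

```python
# Parameter count for the encoder stack: 12 layers, 768 hidden units, 12 heads.
# Assumption (not stated in this card): intermediate size = 4 * hidden = 3072,
# as in the original BERT-base. Embeddings and pooler are excluded.

NUM_LAYERS = 12
HIDDEN = 768
NUM_HEADS = 12
INTERMEDIATE = 4 * HIDDEN  # assumed standard BERT-base value


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm_params(dim: int) -> int:
    """Scale (gamma) and shift (beta) vectors of a LayerNorm."""
    return 2 * dim


def encoder_layer_params(hidden: int, intermediate: int) -> int:
    # Self-attention: query, key, value and output projections, each hidden -> hidden.
    attention = 4 * linear_params(hidden, hidden)
    # Position-wise feed-forward block: hidden -> intermediate -> hidden.
    ffn = linear_params(hidden, intermediate) + linear_params(intermediate, hidden)
    # One LayerNorm after attention and one after the feed-forward block.
    norms = 2 * layer_norm_params(hidden)
    return attention + ffn + norms


head_dim = HIDDEN // NUM_HEADS
per_layer = encoder_layer_params(HIDDEN, INTERMEDIATE)
encoder_total = NUM_LAYERS * per_layer

print(f"head dim: {head_dim}")               # head dim: 64
print(f"params per layer: {per_layer:,}")    # params per layer: 7,087,872
print(f"encoder params: {encoder_total:,}")  # encoder params: 85,054,464
```

Each of the 12 heads therefore works on a 64-dimensional slice of the 768 hidden units.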

Intended uses & limitations

More information needed

Training and evaluation data

This model was pre-trained on a combination of the Danish portion of the MultiEURLEX dataset and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus.

Training procedure

The model was initially pre-trained for 500k steps with sequences of up to 128 tokens, and then pre-training continued for an additional 100k steps with sequences of up to 512 tokens.
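The two-phase schedule can be sketched as a lookup from a global step to the maximum sequence length in use. The phase lengths come from the text; counting steps globally and 0-based across both phases, and the helper name `max_seq_len_at`, are illustrative assumptions rather than details of the actual training code.

```python
from typing import List, Tuple

# (number of steps, maximum sequence length) for each phase, in order.
PHASES: List[Tuple[int, int]] = [(500_000, 128), (100_000, 512)]


def max_seq_len_at(step: int, phases: List[Tuple[int, int]] = PHASES) -> int:
    """Return the maximum sequence length at a global step (0-based).

    Hypothetical helper: walks the phases and returns the length of the
    first phase whose end boundary lies beyond `step`.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for num_steps, seq_len in phases:
        boundary += num_steps
        if step < boundary:
            return seq_len
    raise ValueError(f"step {step} is past the end of the schedule ({boundary} steps)")


total_steps = sum(n for n, _ in PHASES)
print(total_steps)              # 600000
print(max_seq_len_at(0))        # 128
print(max_seq_len_at(499_999))  # 128
print(max_seq_len_at(500_000))  # 512
```

Spending most steps on short sequences keeps pre-training cheaper, since self-attention cost grows quadratically with sequence length.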

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.00001
train_batch_size: 16
eval_batch_size: 16
seed: 42
distributed_type: tpu
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 256
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
training_steps: 100000
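The derived values in the list follow from the per-device settings: 16 x 8 devices x 2 accumulation steps gives the total training batch of 256, and 16 x 8 gives the evaluation batch of 128 (gradient accumulation does not apply to evaluation). Here is a minimal sketch of that arithmetic plus the warmup length and learning-rate curve. It assumes that "cosine" means linear warmup from 0 to the peak rate followed by a half-cosine decay to 0, which is how the Hugging Face Trainer's default cosine schedule behaves; the function `lr_at` is a hypothetical helper.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
PER_DEVICE_TRAIN_BATCH = 16
PER_DEVICE_EVAL_BATCH = 16
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 2
TRAINING_STEPS = 100_000
WARMUP_RATIO = 0.05

# Effective batch sizes; gradient accumulation only applies to training.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES

# Warmup length in optimizer steps, rounded to a whole step.
warmup_steps = round(WARMUP_RATIO * TRAINING_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape).

    Linear warmup from 0 to LEARNING_RATE, then half-cosine decay to 0
    at the final step.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TRAINING_STEPS - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch, total_eval_batch, warmup_steps)  # 256 128 5000
for step in (0, 2_500, 5_000, 52_500, 100_000):
    print(step, f"{lr_at(step):.2e}")
```

Under this assumption the rate peaks at 1e-05 after 5,000 steps and is back at half its peak halfway through the decay phase.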

Training results

Training Loss	Sequence Length	Step	Validation Loss
1.0030	128	50000	-
0.9593	128	100000	-

Model ID: coastalcph/danish-legal-bert-base