Danish LegalBERT (derivative of Maltehb/danish-bert-botxo)

This model is a derivative of Maltehb/danish-bert-botxo adapted to legal text. It has been pre-trained on a combination of the Danish portion of the MultiEURLEX dataset (Chalkidis et al., 2021), comprising EU legislation, and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus (Derczynski et al., 2021), comprising legal proceedings. It achieves the following results on the evaluation set:

  • Loss: -
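
As a masked language model, it can be used directly for fill-mask inference. A minimal sketch with the transformers pipeline, assuming the Hub ID coastalcph/danish-legal-bert-base (as listed on this page); the Danish prompt is only illustrative:

```python
from transformers import pipeline

# Hub ID taken from this card's page listing; the example sentence is illustrative.
fill_mask = pipeline("fill-mask", model="coastalcph/danish-legal-bert-base")
print(fill_mask("Denne lov træder i [MASK] den 1. januar 2022."))
```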

Model description

This is a BERT model (Devlin et al., 2018) pre-trained on Danish legal corpora. It follows the base configuration with 12 Transformer layers, each with 768 hidden units and 12 attention heads.
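
The configuration values above can be checked programmatically; a small sketch, again assuming the coastalcph/danish-legal-bert-base Hub ID:

```python
from transformers import AutoConfig

# Printed values should match the base configuration described above
# (12 layers, 768 hidden units, 12 attention heads).
config = AutoConfig.from_pretrained("coastalcph/danish-legal-bert-base")
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
```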

Intended uses & limitations

More information needed

Training and evaluation data

This model was pre-trained on a combination of the Danish part of the MultiEURLEX dataset and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus.
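
A sketch of assembling these corpora with the datasets library; the dataset identifiers and field names below are assumptions for illustration, not references from this card:

```python
from datasets import load_dataset

# Danish portion of MultiEURLEX; the "multi_eurlex" Hub ID and "da" config
# are assumptions, not stated on this card.
eurlex_da = load_dataset("multi_eurlex", "da", split="train")

# The two legal subsets of the Danish Gigaword Corpus would be selected by source;
# the Hub ID and field name below are hypothetical placeholders.
# dagw = load_dataset("danish-gigaword", split="train")
# legal_dagw = dagw.filter(lambda ex: ex["source"] in {"retsinformationdk", "retspraksis"})
```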

Training procedure

The model was initially pre-trained for 500k steps with sequences of up to 128 tokens, and then pre-training continued for an additional 100k steps with sequences of up to 512 tokens.
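
A sketch of how the two sequence-length stages translate into tokenization settings, assuming the tokenizer inherited from Maltehb/danish-bert-botxo; the actual pre-training script is not given here:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("coastalcph/danish-legal-bert-base")
text = "Denne lov træder i kraft den 1. januar 2022."

# First 500k steps: sequences truncated to at most 128 tokens.
stage1 = tokenizer(text, truncation=True, max_length=128)
# Final 100k steps: sequences of up to 512 tokens.
stage2 = tokenizer(text, truncation=True, max_length=512)
```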

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.00001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: tpu
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 256
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • training_steps: 100000
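
The effective batch sizes follow from the per-device settings above: 16 × 8 devices × 2 accumulation steps = 256 for training, and 16 × 8 = 128 for evaluation. A minimal TrainingArguments sketch of these values, assuming the Hugging Face Trainer; the actual training script is not specified on this card:

```python
from transformers import TrainingArguments

# Hyperparameters from the list above; argument names assume the HF Trainer API.
args = TrainingArguments(
    output_dir="danish-legal-bert-base",
    learning_rate=1e-5,
    per_device_train_batch_size=16,   # x 8 TPU cores x 2 accumulation steps = 256
    per_device_eval_batch_size=16,    # x 8 TPU cores = 128
    gradient_accumulation_steps=2,
    max_steps=100_000,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    seed=42,
)
```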

Training results

Training Loss | Sequence Length | Step   | Validation Loss
1.0030        | 128             | 50000  | -
0.9593        | 128             | 100000 | -