---
license: cc-by-nc-4.0
pipeline_tag: fill-mask
tags:
- legal
language:
- da
datasets:
- multi_eurlex
- DDSC/partial-danish-gigaword-no-twitter
model-index:
- name: coastalcph/danish-legal-lm-base
  results: []
---

# Danish Legal LM

This model was pre-trained on a combination of the Danish part of the MultiEURLEX dataset (Chalkidis et al., 2021), comprising EU legislation, and two subsets (`retsinformationdk`, `retspraksis`) of the Danish Gigaword Corpus (Derczynski et al., 2021), comprising legal proceedings.

It achieves the following results on the evaluation set:

- Loss: 0.7302 (up to 128 tokens)
- Loss: 0.7847 (up to 512 tokens)

## Model description

This is a RoBERTa (Liu et al., 2019) model pre-trained on Danish legal corpora. It follows a base configuration with 12 Transformer layers, each with 768 hidden units and 12 attention heads.

## Intended uses & limitations

The model can be used for masked-token prediction (fill-mask) on Danish legal text and is mainly intended as a base model for fine-tuning on downstream Danish legal NLP tasks (see the usage sketch under "How to use" below).

## Training and evaluation data

This model was pre-trained on a combination of the Danish part of the MultiEURLEX dataset and two subsets (`retsinformationdk`, `retspraksis`) of the Danish Gigaword Corpus.

## Training procedure

The model was initially pre-trained for 500k steps with sequences of up to 128 tokens, and pre-training then continued for an additional 100k steps with sequences of up to 512 tokens.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: tpu
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 256
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- training_steps: 500000 + 100000

### Training results

| Training Loss | Length | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 1.4648        | 128    | 50000  | 1.2920          |
| 1.2165        | 128    | 100000 | 1.0625          |
| 1.0952        | 128    | 150000 | 0.9611          |
| 1.0233        | 128    | 200000 | 0.8931          |
| 0.963         | 128    | 250000 | 0.8477          |
| 0.9122        | 128    | 300000 | 0.8168          |
| 0.8697        | 128    | 350000 | 0.7836          |
| 0.8397        | 128    | 400000 | 0.7560          |
| 0.8231        | 128    | 450000 | 0.7476          |
| 0.8207        | 128    | 500000 | 0.7243          |

| Training Loss | Length | Step    | Validation Loss |
|:-------------:|:------:|:-------:|:---------------:|
| 0.7045        | 512    | +50000  | 0.8318          |
| 0.6432        | 512    | +100000 | 0.7913          |

### Framework versions

- Transformers 4.18.0
- Pytorch 1.12.0+cu102
- Datasets 2.0.0
- Tokenizers 0.12.0
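
## How to use

The model is tagged for the fill-mask pipeline, so a minimal usage sketch with the Hugging Face `transformers` pipeline looks as follows. The model identifier is taken from the metadata above, while the Danish example sentence is purely illustrative:

```python
from transformers import pipeline

# Load the pre-trained Danish legal LM for masked-token prediction (fill-mask).
fill_mask = pipeline("fill-mask", model="coastalcph/danish-legal-lm-base")

# Illustrative Danish legal sentence; <mask> is the RoBERTa mask token.
# "Denne <mask> er udstedt i henhold til gældende lovgivning."
# ("This <mask> is issued in accordance with applicable legislation.")
for prediction in fill_mask("Denne <mask> er udstedt i henhold til gældende lovgivning."):
    print(prediction["token_str"], round(prediction["score"], 4))
```

The same checkpoint can also be loaded with `AutoTokenizer` and `AutoModelForMaskedLM` (or a task-specific head such as `AutoModelForSequenceClassification`) when fine-tuning on downstream tasks.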
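
For reference, the base configuration described under "Model description" corresponds roughly to the following `transformers` configuration. The vocabulary size and maximum position embeddings below are assumptions rather than values reported in this card, so take the actual values from the released checkpoint:

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Base-size RoBERTa, as described above: 12 layers, 768 hidden units, 12 attention heads.
config = RobertaConfig(
    vocab_size=50_265,            # assumed; read the real size from the checkpoint's tokenizer
    num_hidden_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=514,  # RoBERTa convention for inputs of up to 512 tokens
)
model = RobertaForMaskedLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```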
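
As a reproduction aid, the hyperparameters listed under "Training hyperparameters" map approximately onto the following `TrainingArguments` for the first (128-token) pre-training stage. This is a sketch under the Hugging Face `Trainer` API, not the original training script; dataset preparation, the masked-language-modelling data collator, and the TPU launch are omitted, and the output directory is an arbitrary placeholder:

```python
from transformers import TrainingArguments

# Effective train batch size: 16 per device x 8 TPU cores x 2 accumulation steps = 256.
training_args = TrainingArguments(
    output_dir="danish-legal-lm-base-128",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    max_steps=500_000,                      # first stage; +100,000 more steps at 512 tokens
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    tpu_num_cores=8,
)
```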