
XLM-RoBERTa (base) Middle High German Charter Masked Language Model

This model is a fine-tuned version of xlm-roberta-base on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters from the monasterium.net dataset.

Model description

Please refer to the XLM-RoBERTa (base-sized model) card or the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. for additional information on the underlying model.

Intended uses & limitations

This model can be used for masked-token prediction (fill-mask) tasks.
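A minimal usage sketch with the Transformers fill-mask pipeline; the example sentence is only an illustrative Middle High German-style phrase, not taken from the charter corpus:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
fill_mask = pipeline(
    "fill-mask",
    model="atzenhofer/xlm-roberta-base-mhg-charter-mlm",
)

# XLM-RoBERTa uses <mask> as its mask token.
# The sentence below is illustrative only, not drawn from the training data.
for prediction in fill_mask("wir geben disen <mask> ze urkunde"):
    print(prediction["token_str"], round(prediction["score"], 4))
```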

Training and evaluation data

The model was fine-tuned using the Middle High German Monasterium charters. It was trained on a Tesla V100-SXM2-16GB GPU.

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • num_train_epochs: 15
  • learning_rate: 2e-5
  • weight_decay: 0.01
  • train_batch_size: 16
  • eval_batch_size: 16
  • num_proc: 4
  • block_size: 256
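
A minimal fine-tuning sketch using the listed hyperparameters, assuming a standard Transformers Trainer setup for masked language modeling; the tokenized charter datasets are placeholders, and the 15% masking probability and per-epoch evaluation are assumptions not stated in this card:

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

# Standard MLM collator; the 15% masking probability is an assumption.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir="xlm-roberta-base-mhg-charter-mlm",
    num_train_epochs=15,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",  # assumption: matches the per-epoch results below
)

# `train_dataset` / `eval_dataset` stand in for the tokenized charter texts,
# grouped into blocks of 256 tokens (block_size) with num_proc=4 workers.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
)
trainer.train()
```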

Training results

Epoch   Training Loss   Validation Loss
1       2.423800        2.025645
2       1.876500        1.700380
3       1.702100        1.565900
4       1.582400        1.461868
5       1.506000        1.393849
6       1.407300        1.359359
7       1.385400        1.317869
8       1.336700        1.285630
9       1.301300        1.246812
10      1.273500        1.219290
11      1.245600        1.198312
12      1.225800        1.198695
13      1.214100        1.194895
14      1.209500        1.177452
15      1.200300        1.177396

Perplexity: 3.25
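
The reported perplexity is the exponential of the final validation loss, exp(1.177396) ≈ 3.25; a quick check:

```python
import math

# Perplexity of a (masked) language model is exp(mean cross-entropy loss).
print(round(math.exp(1.177396), 2))  # 3.25
```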

Updates

  • 2023-03-30: Upload

Citation

Please cite the following when using this model.

@misc{xlm-roberta-base-mhg-charter-mlm,
  title     = {xlm-roberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/xlm-roberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}