
DistilRoBERTa (base) Middle High German Charter Masked Language Model

This model is a fine-tuned version of distilroberta-base, trained on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters from the monasterium.net data set.

Model description

For additional information, please refer to this card together with the distilroberta (base-sized model) card and the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Sanh et al.

Intended uses & limitations

This model can be used for masked-token prediction, i.e., fill-mask tasks.
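As a minimal usage sketch (assuming the transformers library and the repository id from the citation below), masked tokens can be predicted with the fill-mask pipeline; the example phrase is purely illustrative and not taken from the corpus:

```python
# Minimal fill-mask sketch; the model id follows the repository URL in the
# citation below, and the example phrase is illustrative, not from the corpus.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="atzenhofer/distilroberta-base-mhg-charter-mlm",
)

# RoBERTa-style models use "<mask>" as the mask token.
for prediction in fill_mask("wir tun kunt an disem <mask>"):
    print(prediction["token_str"], round(prediction["score"], 4))
```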

Training and evaluation data

The model was fine-tuned using the Middle High German Monasterium charters. It was trained on an NVIDIA GeForce GTX 1660 Ti (6 GB) GPU.

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • num_train_epochs: 10
  • learning_rate: 2e-5
  • weight_decay: 0.01
  • train_batch_size: 8
  • eval_batch_size: 8
  • num_proc: 4
  • block_size: 256
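The training script itself is not part of this card; the following is a minimal sketch, assuming the transformers Trainer API, of how the listed hyperparameters could be wired together. Dataset loading, tokenization, and chunking into blocks of 256 tokens (with num_proc=4 worker processes) are assumed to happen beforehand and are not shown.

```python
# Sketch of a masked-language-modelling fine-tuning setup using the listed
# hyperparameters; the prepared datasets below are placeholders (assumed).
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

training_args = TrainingArguments(
    output_dir="distilroberta-base-mhg-charter-mlm",
    num_train_epochs=10,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",
)

# 15% masking is the library default; the exact value used is not documented here.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # tokenized, chunked charter texts (assumed)
    eval_dataset=eval_dataset,    # held-out validation split (assumed)
    data_collator=data_collator,
)
trainer.train()
```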

Training results

| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
|     1 |      2.537000 |        2.112094 |
|     2 |      2.053400 |        1.838937 |
|     3 |      1.900300 |        1.706654 |
|     4 |      1.766200 |        1.607970 |
|     5 |      1.669200 |        1.532340 |
|     6 |      1.619100 |        1.490333 |
|     7 |      1.571300 |        1.476035 |
|     8 |      1.543100 |        1.428958 |
|     9 |      1.517100 |        1.423216 |
|    10 |      1.508300 |        1.408235 |

Perplexity: 4.07
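For orientation, perplexity is typically reported as the exponential of the evaluation cross-entropy loss; applying that to the final validation loss gives a value close to the figure above (the small gap presumably stems from a separate evaluation pass, which is an assumption):

```python
# Perplexity as exp(cross-entropy loss): exp(1.408235) ≈ 4.09, close to the
# reported 4.07 (the exact evaluation behind 4.07 is not documented here).
import math

final_validation_loss = 1.408235
print(f"perplexity ≈ {math.exp(final_validation_loss):.2f}")
```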

Updates

  • 2023-03-30: Upload

Citation

Please cite as follows when using this model.

@misc{distilroberta-base-mhg-charter-mlm,
  title     = {distilroberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/distilroberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}