mT0-Definition-Ru XL

This model is a version of mT0 XL fine-tuned on the Russian part of CodWoE, a dataset of definitions and usage examples.

It generates definitions of Russian words in context. Its input is a usage example together with the instruction question "Что такое TARGET_WORD?" ("What is TARGET_WORD?"), where TARGET_WORD is the word to be defined.
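A minimal sketch of how such an input might be built and passed to the model with the Transformers library. Several details here are assumptions rather than facts stated in this card: the repository id `ltg/mt0-definition-ru-xl`, the choice to join the usage example and the question with a single space (example first), and the `max_new_tokens` value. The helper names `build_prompt` and `generate_definition` are hypothetical.

```python
MODEL_ID = "ltg/mt0-definition-ru-xl"  # assumed Hub repository id


def build_prompt(usage_example: str, target_word: str) -> str:
    """Join a usage example and the instruction question.

    Assumption: the example comes first and is separated from the
    question by one space; the card does not spell out the exact layout.
    """
    return f"{usage_example.strip()} Что такое {target_word}?"


def generate_definition(usage_example: str, target_word: str,
                        max_new_tokens: int = 60) -> str:
    """Generate a definition for target_word as used in usage_example."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(usage_example, target_word),
                       return_tensors="pt")
    # max_new_tokens is an illustrative limit, not a value from the card.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Building the prompt needs no model download.
print(build_prompt("Он сел на скамейку в парке.", "скамейка"))
# Он сел на скамейку в парке. Что такое скамейка?
```

Calling `generate_definition("Он сел на скамейку в парке.", "скамейка")` would download the XL-sized checkpoint, so it is left out of the runnable part of the sketch.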

Model description

See details in the paper "Enriching Word Usage Graphs with Cluster Definitions" (LREC-COLING 2024) by Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.

Intended uses & limitations

The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions. Generated definitions may contain biases and stereotypes inherited from the underlying language model.

Training and evaluation data

Russian subset of CodWoE (Mickus et al., SemEval 2022).

Training results

mT0-Definition-Ru XL achieves the following results on the CodWoE evaluation set:

  • Loss: 1.7996
  • ROUGE-1: 17.5576
  • ROUGE-2: 8.7614
  • ROUGE-L: 17.2533
  • ROUGE-Lsum: 17.3204
  • Gen Len: 21.6774

Training procedure

mT0-Definition-Ru XL was fine-tuned in sequence-to-sequence mode on examples of contextualized dictionary definitions.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20.0
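
The total batch sizes listed above follow from the per-device values, the number of devices, and the gradient accumulation steps. The sketch below checks that arithmetic and also illustrates a linear learning-rate decay. The decay function assumes no warmup (the card lists none), and `total_steps` is a hypothetical parameter, since the number of optimizer steps is not reported here.

```python
# Values copied from the hyperparameter list.
learning_rate = 5e-05
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 4

# Effective training batch: samples contributing to one optimizer step.
total_train_batch_size = (per_device_train_batch_size
                          * num_devices
                          * gradient_accumulation_steps)

# Evaluation uses no gradient accumulation.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 32


def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    """Linear decay from base_lr to 0, assuming zero warmup steps."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


# total_steps=100 is an illustrative value only.
print(linear_lr(0, 100))    # 5e-05
print(linear_lr(100, 100))  # 0.0
```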

Framework versions

  • Transformers 4.37.1
  • PyTorch 1.13.1+rocm5.2
  • Datasets 2.16.1
  • Tokenizers 0.15.1

Citation

@inproceedings{kutuzov-etal-2024-enriching-word,
    title = "Enriching Word Usage Graphs with Cluster Definitions",
    author = "Kutuzov, Andrey  and
      Fedorova, Mariia  and
      Schlechtweg, Dominik  and
      Arefyev, Nikolay",
    editor = "Calzolari, Nicoletta  and
      Kan, Min-Yen  and
      Hoste, Veronique  and
      Lenci, Alessandro  and
      Sakti, Sakriani  and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.546",
    pages = "6189--6198",
    abstract = "We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. They are generated from scratch by fine-tuned encoder-decoder language models. The conducted human evaluation has shown that these definitions match the existing clusters in WUGs better than the definitions chosen from WordNet by two baseline systems. At the same time, the method is straightforward to use and easy to extend to new languages. The resulting enriched datasets can be extremely helpful for moving on to explainable semantic change modeling.",
}