ltg/mt0-definition-en-xl

Text2Text Generation, Transformers, PyTorch, English, mt5, definition-modeling


Andrey Kutuzov committed on Mar 25, 2024

Commit 92b4692 (1 parent: ff0ec79)

Camera ready


Files changed (2)

README.md +26 -38
config.json +1 -1

README.md CHANGED

@@ -19,32 +19,44 @@ datasets:
 - marksverdhei/wordnet-definitions-en-2021
 ---
-# mt0-definition-en-xl
-This model is a version of [mt0-xl](https://huggingface.co/bigscience/mt0-xl) fine-tuned on English WordNet, CodWoE and Oxford.
-It achieves the following results on the evaluation set:
-- Loss: 1.7210
-- Rouge1: 41.5067
-- Rouge2: 23.7149
-- Rougel: 39.138
-- Rougelsum: 39.1647
-- Gen Len: 15.1578
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
@@ -61,35 +73,11 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 20.0
-### Training results
-| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
-|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
-| 2.1171        | 1.0   | 1370  | 1.8175          | 27.0261 | 8.6429  | 25.2826 | 25.2952   | 11.8798 |
-| 1.8186        | 2.0   | 2740  | 1.7112          | 29.1583 | 9.9747  | 27.3432 | 27.3647   | 11.7919 |
-| 1.643         | 3.0   | 4110  | 1.6442          | 30.9045 | 11.2256 | 28.7826 | 28.788    | 12.4125 |
-| 1.499         | 4.0   | 5480  | 1.5978          | 32.1126 | 12.6674 | 29.97   | 29.9843   | 12.3129 |
-| 1.3772        | 5.0   | 6850  | 1.5720          | 33.6113 | 13.8451 | 31.3468 | 31.3599   | 12.6887 |
-| 1.2742        | 6.0   | 8220  | 1.5564          | 34.4899 | 15.1005 | 32.3177 | 32.3291   | 12.2003 |
-| 1.1785        | 7.0   | 9590  | 1.5466          | 35.4729 | 16.2035 | 33.2166 | 33.2295   | 12.4487 |
-| 1.0941        | 8.0   | 10960 | 1.5571          | 36.4885 | 17.5396 | 34.2494 | 34.2759   | 12.7543 |
-| 1.0202        | 9.0   | 12330 | 1.5541          | 37.4019 | 18.5568 | 35.1341 | 35.1473   | 12.8603 |
-| 0.9552        | 10.0  | 13700 | 1.5642          | 38.127  | 19.4057 | 35.9008 | 35.9163   | 12.6987 |
-| 0.8963        | 11.0  | 15070 | 1.5772          | 38.5073 | 20.0584 | 36.3304 | 36.3399   | 12.7052 |
-| 0.8443        | 12.0  | 16440 | 1.5955          | 39.2323 | 20.9237 | 36.9863 | 37.0049   | 13.0395 |
-| 0.7982        | 13.0  | 17810 | 1.6089          | 39.7947 | 21.6422 | 37.5619 | 37.5815   | 13.1400 |
-| 0.7586        | 14.0  | 19180 | 1.6293          | 40.2922 | 22.2301 | 38.0755 | 38.0757   | 12.8589 |
-| 0.7234        | 15.0  | 20550 | 1.6493          | 40.6358 | 22.5355 | 38.3523 | 38.3659   | 13.1102 |
-| 0.6946        | 16.0  | 21920 | 1.6701          | 40.7708 | 22.906  | 38.5037 | 38.5174   | 13.1035 |
-| 0.6688        | 17.0  | 23290 | 1.6902          | 41.0847 | 23.1663 | 38.8126 | 38.8149   | 13.2951 |
-| 0.6484        | 18.0  | 24660 | 1.7005          | 41.2075 | 23.3967 | 38.9529 | 38.9545   | 13.2707 |
-| 0.6342        | 19.0  | 26030 | 1.7116          | 41.2454 | 23.5187 | 39.0203 | 39.0396   | 13.2173 |
-| 0.6234        | 20.0  | 27400 | 1.7210          | 41.3073 | 23.5691 | 39.0662 | 39.074    | 13.2558 |
 ### Framework versions
 - Transformers 4.30.2
 - Pytorch 1.13.1+rocm5.2
 - Datasets 2.12.0
 - Tokenizers 0.12.1

 - marksverdhei/wordnet-definitions-en-2021
 ---
+# mT0-Definition-En XL
+This model is a version of [mT0 XL](https://huggingface.co/bigscience/mt0-xl) fine-tuned on a dataset of English definitions and usage examples.
+It generates definitions of English words in context.
+Its input is the usage example and the instruction question "What is the definition of TARGET_WORD?"
 ## Model description
+See details in the paper `Enriching Word Usage Graphs with Cluster Definitions` (LREC-COLING'2024) by
+Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.
 ## Intended uses & limitations
+The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.
+Generated definitions can contain all sorts of biases and stereotypes, stemming from the underlying language model.
 ## Training and evaluation data
+Three datasets were used to fine-tune the model:
+- *WordNet* ([Ishiwatari et al., NAACL 2019](https://aclanthology.org/N19-1350/)), also [available on HF](https://huggingface.co/datasets/marksverdhei/wordnet-definitions-en-2021)
+- *Oxford dictionary or CHA* ([Gadetsky et al., ACL 2018](https://aclanthology.org/P18-2043/))
+- English subset of *CodWoE* ([Mickus et al., SemEval 2022](https://aclanthology.org/2022.semeval-1.1/))
+## Training results
+mT0-Definition-En XL achieves the following results on the concatenated validation sets from WordNet and the Oxford dictionary:
+- Loss: 1.7210
+- Rouge1: 41.5067
+- Rouge2: 23.7149
+- Rougel: 39.138
+- Rougelsum: 39.1647
+- Gen Len: 15.1578
 ## Training procedure
+mT0-Definition-En XL was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 20.0
 ### Framework versions
 - Transformers 4.30.2
 - Pytorch 1.13.1+rocm5.2
 - Datasets 2.12.0
 - Tokenizers 0.12.1
+## Citation
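A minimal usage sketch for the model described in the README diff, assuming the standard Transformers sequence-to-sequence API (`AutoTokenizer`, `AutoModelForSeq2SeqLM`, `generate`). How the usage example and the instruction question are joined (here: a single space, with the question spelled "What is the definition of TARGET_WORD?") is an assumption, as are the helper names `build_prompt` and `generate_definitions` and the `max_new_tokens` default. Check the paper or the published model card for the exact training prompt.

```python
from typing import List

# Repository name as shown at the top of this page.
MODEL_ID = "ltg/mt0-definition-en-xl"


def build_prompt(usage_example: str, target_word: str) -> str:
    """Append the instruction question to a usage example.

    Assumption: example and question are separated by a single space.
    """
    return f"{usage_example.strip()} What is the definition of {target_word}?"


def generate_definitions(examples: List[str], target_words: List[str],
                         max_new_tokens: int = 64) -> List[str]:
    """Generate one definition per (usage example, target word) pair."""
    if len(examples) != len(target_words):
        raise ValueError("examples and target_words must have the same length")
    # Imported lazily so build_prompt works without transformers/torch installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    prompts = [build_prompt(e, w) for e, w in zip(examples, target_words)]
    # Pad the batch to a common length and return PyTorch tensors.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip pad/EOS tokens from the decoded definitions.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `generate_definitions(["The blade is very sharp."], ["sharp"])` downloads the checkpoint and returns a list with one generated definition. mT0 XL has about 3.7 billion parameters, so the download and CPU inference are heavy; `transformers` and PyTorch must be installed.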

config.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "_name_or_path": "mt0-xl/",
   "architectures": [
     "MT5ForConditionalGeneration"
   ],

 {
+  "_name_or_path": "mt0-xl",
   "architectures": [
     "MT5ForConditionalGeneration"
   ],
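To make the ROUGE numbers reported in the README diff more concrete, here is a simplified ROUGE-1 F1: clipped unigram overlap between a generated and a reference definition, averaged over pairs and scaled to 0-100. This is an illustrative re-implementation, not the scorer used for this model. The `rouge_score` package behind common Hugging Face evaluation scripts tokenizes differently and can apply stemming, so its numbers will not match this sketch exactly. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 using lowercase whitespace tokenization."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most min(count in pred, count in ref) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_rouge1(predictions: List[str], references: List[str]) -> float:
    """Mean per-pair ROUGE-1 F1, scaled to the 0-100 range used in the README."""
    scores = [rouge1_f1(p, r) for p, r in zip(predictions, references)]
    return 100 * sum(scores) / len(scores)
```

For instance, the prediction "having a sharp edge" against the reference "having a thin sharp edge" has precision 1.0 and recall 0.8, giving an F1 of about 0.889.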
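The README's hyperparameter list specifies `lr_scheduler_type: linear` and `num_epochs: 20.0`, and the removed training-results table shows 1370 optimizer steps per epoch (27400 in total). The sketch below computes the learning rate at a given step under a linear schedule shaped like Transformers' `get_linear_schedule_with_warmup`: linear warmup from 0, then linear decay to 0 at the last step. The base learning rate and warmup length are not visible in this diff, so `BASE_LR` is a hypothetical placeholder and warmup defaults to 0 (an assumption).

```python
# Step counts taken from the removed "Training results" table in the diff.
STEPS_PER_EPOCH = 1370
NUM_EPOCHS = 20
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 27400

# Hypothetical placeholder: the actual learning rate is not shown in this diff.
BASE_LR = 1e-4


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly to 0 at total_steps and stay at 0 afterwards.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for epoch in (0, 5, 10, 15, 20):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:2d} (step {step:5d}): lr = {linear_lr(step):.2e}")
```

With no warmup, the learning rate halves by the end of epoch 10 (step 13700) and reaches 0 at step 27400, matching the final row of the removed table.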