---
|
tags: |
|
- text2text-generation |
|
- definition-modeling |
|
metrics: |
|
- rouge
- bleu
- bertscore
|
model-index: |
|
- name: flan-t5-definition-en-xl |
|
results: [] |
|
language: |
|
- en |
|
widget: |
|
- text: "He ate a sweet apple. What is the definition of apple?" |
|
example_title: "Definition generation (noun)"
|
- text: "The paper contains a number of original ideas about color perception. What is the definition of original?" |
|
example_title: "Definition generation (adjective)"
|
license: cc-by-sa-4.0 |
|
datasets: |
|
- marksverdhei/wordnet-definitions-en-2021 |
|
--- |
|
|
|
# FLAN-T5-Definition XL |
|
|
|
This model is a version of [FLAN-T5 XL](https://huggingface.co/google/flan-t5-xl) fine-tuned on a dataset of English definitions and usage examples.
|
|
|
It generates definitions of English words in context.

Its input is a usage example followed by the instruction question "What is the definition of TARGET_WORD?"
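A minimal inference sketch with 🤗 Transformers. The model identifier passed to `generate_definition` is whatever Hub name this checkpoint is published under (not hard-coded here); the prompt format follows the instruction template described above:

```python
def build_prompt(usage_example: str, target_word: str) -> str:
    """Concatenate a usage example with the instruction question
    the model was fine-tuned on."""
    return f"{usage_example} What is the definition of {target_word}?"


def generate_definition(model_name: str, usage_example: str, target_word: str) -> str:
    """Load the checkpoint and generate a contextualized definition.
    `model_name` is the Hub identifier of this checkpoint."""
    # Imported lazily so build_prompt() can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_prompt(usage_example, target_word), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


print(build_prompt("He ate a sweet apple.", "apple"))
# He ate a sweet apple. What is the definition of apple?
```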
|
|
|
## Model description |
|
|
|
See details in the paper "Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis" (ACL 2023) by Mario Giulianelli, Iris Luden, Raquel Fernández, and Andrey Kutuzov.
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions. |
|
|
|
The fine-tuning datasets were limited to English. |
|
Although the original FLAN-T5 is a multilingual model, we did not thoroughly evaluate its ability to generate definitions in languages other than English. |
|
|
|
Generated definitions may reproduce biases and stereotypes present in the underlying language model.
|
|
|
|
|
## Training and evaluation data |
|
|
|
Three datasets were used to fine-tune the model: |
|
- *WordNet* ([Ishiwatari et al., NAACL 2019](https://aclanthology.org/N19-1350/)), also [available on HF](https://huggingface.co/datasets/marksverdhei/wordnet-definitions-en-2021) |
|
- *Oxford dictionary* (also known as *CHA*) ([Gadetsky et al., ACL 2018](https://aclanthology.org/P18-2043/))
|
- English subset of *CodWoE* ([Mickus et al., SemEval 2022](https://aclanthology.org/2022.semeval-1.1/)) |
|
|
|
FLAN-T5-Definition XL achieves the following results on the WordNet test set: |
|
- ROUGE-L: 52.21 |
|
- BLEU: 32.81 |
|
- BERT-F1: 92.16 |
|
|
|
FLAN-T5-Definition XL achieves the following results on the Oxford dictionary test set: |
|
- ROUGE-L: 38.72 |
|
- BLEU: 18.69 |
|
- BERT-F1: 89.75 |
|
|
|
## Training procedure |
|
FLAN-T5 XL was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions. |
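Concretely, each training pair maps a usage example plus the instruction question (source) to the dictionary gloss (target). The sketch below illustrates this preprocessing; the field names `context`, `word`, and `gloss` are illustrative assumptions, not the exact dataset schema:

```python
def make_seq2seq_pair(example: dict) -> dict:
    """Turn one (usage example, target word, gloss) record into a
    source/target pair for sequence-to-sequence fine-tuning.
    Keys 'context', 'word', and 'gloss' are illustrative."""
    source = f"{example['context']} What is the definition of {example['word']}?"
    return {"input_text": source, "target_text": example["gloss"]}


pair = make_seq2seq_pair(
    {
        "context": "He ate a sweet apple.",
        "word": "apple",
        "gloss": "the round fruit of a tree of the rose family",
    }
)
print(pair["input_text"])
# He ate a sweet apple. What is the definition of apple?
```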
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-05 |
|
- train_batch_size: 4 |
|
- eval_batch_size: 4 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 8 |
|
- total_train_batch_size: 16 |
|
- total_eval_batch_size: 16 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 20.0 |
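As a config sketch, the hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows (`output_dir` is a placeholder; the distributed setup across 8 GPUs is handled by the launcher, not shown here):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-definition-en-xl",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20.0,
)
```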
|
|
|
### Framework versions |
|
|
|
- Transformers 4.23.1 |
|
- Pytorch 1.12.1+rocm5.1.1 |
|
- Datasets 2.4.0 |
|
- Tokenizers 0.12.1 |
|
|
|
## Citation |