---
tags:
- text2text-generation
- definition-modeling
metrics:
- rouge
- bleu
- bert-f1
model-index:
- name: flan-t5-definition-en-large
  results: []
language:
- en
widget:
- text: "He ate a sweet apple. What is the definition of apple?"
  example_title: "Definition generation"
- text: "The paper contains a number of original ideas about color perception. What is the definition of original?"
  example_title: "Definition generation"
license: cc-by-sa-4.0
datasets:
- marksverdhei/wordnet-definitions-en-2021
---


# FLAN-T5-Definition Large

This model is a version of [FLAN-T5 Large](https://huggingface.co/google/flan-t5-large) fine-tuned on a dataset of English definitions and usage examples.

It generates definitions of English words in context.
Its input is a usage example followed by the instruction question "What is the definition of TARGET_WORD?", where TARGET_WORD is the word to be defined.
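
The input format described above can be sketched as follows. This is a minimal illustration, not the authors' own inference script: the helper names `build_prompt` and `generate_definitions` are hypothetical, the generation settings (`max_new_tokens=60`, default greedy decoding) are assumptions, and only standard `transformers` calls are used.

```python
from typing import List

MODEL_ID = "ltg/flan-t5-definition-en-large"  # repository linked in this card


def build_prompt(usage_example: str, target_word: str) -> str:
    """Append the instruction question to the usage example."""
    return f"{usage_example.strip()} What is the definition of {target_word}?"


def generate_definitions(prompts: List[str], max_new_tokens: int = 60) -> List[str]:
    """Generate one definition per prompt (downloads the model on first call).

    The decoding settings here are assumptions, not values taken from the card.
    """
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `generate_definitions([build_prompt("He ate a sweet apple.", "apple")])` should return a list with a single definition string for "apple" as used in that sentence.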

This project is a collaboration between the [Dialogue Modelling Group](https://dmg-illc.github.io/dmg/) at the University of Amsterdam 
and the [Language Technology Group](https://www.mn.uio.no/ifi/english/research/groups/ltg/) at the University of Oslo.

## Sizes
- [FLAN-T5-Definition Base (250M parameters)](https://huggingface.co/ltg/flan-t5-definition-en-base)
- [FLAN-T5-Definition Large (780M parameters)](https://huggingface.co/ltg/flan-t5-definition-en-large)
- [FLAN-T5-Definition XL (3B parameters)](https://huggingface.co/ltg/flan-t5-definition-en-xl)

## Model description

See details in the paper [`Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis`](https://aclanthology.org/2023.acl-long.176/) 
(ACL'2023) by Mario Giulianelli, Iris Luden, Raquel Fernandez and Andrey Kutuzov.

## Intended uses & limitations

The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.

The fine-tuning datasets were limited to English.
Although the original FLAN-T5 is a multilingual model, we did not thoroughly evaluate its ability to generate definitions in languages other than English. 

Generated definitions may contain biases and stereotypes inherited from the underlying language model.

## Training and evaluation data

Three datasets were used to fine-tune the model:
- *WordNet* ([Ishiwatari et al., NAACL 2019](https://aclanthology.org/N19-1350/)), also [available on HF](https://huggingface.co/datasets/marksverdhei/wordnet-definitions-en-2021)
- *Oxford dictionary or CHA* ([Gadetsky et al., ACL 2018](https://aclanthology.org/P18-2043/))
- English subset of *CoDWoE* ([Mickus et al., SemEval 2022](https://aclanthology.org/2022.semeval-1.1/))

FLAN-T5-Definition Large achieves the following results on the WordNet test set:
- BLEU: 14.37
- ROUGE-L: 33.74
- BERT-F1: 88.21

FLAN-T5-Definition Large achieves the following results on the Oxford dictionary test set:
- BLEU: 10.90
- ROUGE-L: 30.05
- BERT-F1: 87.44
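
To make the ROUGE-L figures easier to interpret, the sketch below computes a simplified ROUGE-L F1 score between a reference definition and a generated one, based on their longest common subsequence of tokens. It is an illustration only and not the evaluation code behind the numbers above: lowercasing, whitespace tokenization, and the balanced F1 variant are assumptions, and the example sentences are invented. The scores in this card are on a 0-100 scale, whereas this function returns a value between 0 and 1.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-L F1 on lowercased, whitespace-split tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens that match
    recall = lcs / len(ref)      # share of reference tokens that are recovered
    return 2 * precision * recall / (precision + recall)


# Invented example pair, used only to illustrate the computation.
score = rouge_l_f1(
    "a round fruit with red or green skin",
    "a round fruit with firm flesh",
)
print(f"ROUGE-L F1: {100 * score:.2f}")
```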

## Training procedure

FLAN-T5 Large was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 15.0
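
The relationship between these values can be checked with a short sketch. It assumes no gradient accumulation and no learning-rate warmup (neither is listed above), and it takes the number of steps per epoch (2740) from the training results table; the function names are hypothetical.

```python
PER_DEVICE_TRAIN_BATCH = 8
NUM_DEVICES = 8
BASE_LR = 5e-5
NUM_EPOCHS = 15
STEPS_PER_EPOCH = 2740  # from the "Step" column of the training results table


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step: int, total_steps: int, base_lr: float = BASE_LR, warmup_steps: int = 0) -> float:
    """Linear warmup (assumed to be zero steps here) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = NUM_EPOCHS * STEPS_PER_EPOCH
print(effective_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES))  # matches total_train_batch_size
print(total_steps)                                                # matches the final step in the table
print(linear_lr(total_steps // 2, total_steps))                   # learning rate halfway through training
```

Under these assumptions, one epoch covers at most 2740 × 64 = 175,360 training examples, since the last batch of an epoch may be smaller.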

### Training results

| Training Loss | Epoch | Step  | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:-------:|
| 2.1769        | 1.0   | 2740  | 1.9050          | 28.7222 | 9.1873  | 26.6888 | 26.6937   | 11.3429 |
| 1.9408        | 2.0   | 5480  | 1.8151          | 29.8799 | 10.2327 | 27.7947 | 27.8044   | 11.4165 |
| 1.8124        | 3.0   | 8220  | 1.7608          | 30.9845 | 10.9982 | 28.8059 | 28.8131   | 11.5310 |
| 1.7118        | 4.0   | 10960 | 1.7229          | 31.6943 | 11.7412 | 29.4967 | 29.5319   | 11.7037 |
| 1.6286        | 5.0   | 13700 | 1.6937          | 32.5839 | 12.2431 | 30.1799 | 30.206    | 11.7784 |
| 1.5597        | 6.0   | 16440 | 1.6748          | 32.9915 | 12.8514 | 30.7016 | 30.7145   | 11.5974 |
| 1.4982        | 7.0   | 19180 | 1.6578          | 33.2157 | 13.1389 | 30.9428 | 30.9519   | 11.3580 |
| 1.4468        | 8.0   | 21920 | 1.6473          | 33.6146 | 13.5922 | 31.3001 | 31.3235   | 11.5724 |
| 1.4022        | 9.0   | 24660 | 1.6384          | 34.1711 | 14.1117 | 31.7951 | 31.8066   | 11.7389 |
| 1.364         | 10.0  | 27400 | 1.6337          | 34.5489 | 14.5012 | 32.1329 | 32.1446   | 11.6659 |
| 1.3321        | 11.0  | 30140 | 1.6291          | 34.7133 | 14.7297 | 32.3042 | 32.314    | 11.8003 |
| 1.3054        | 12.0  | 32880 | 1.6267          | 34.9411 | 15.0282 | 32.5335 | 32.5451   | 11.7619 |
| 1.2845        | 13.0  | 35620 | 1.6262          | 35.1648 | 15.2154 | 32.7387 | 32.742    | 11.8317 |
| 1.2699        | 14.0  | 38360 | 1.6257          | 35.2849 | 15.3109 | 32.8508 | 32.853    | 11.8168 |
| 1.2595        | 15.0  | 41100 | 1.6273          | 35.2224 | 15.2781 | 32.7718 | 32.7826   | 11.7971 |
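
The table can be read programmatically to see where validation performance peaks. The sketch below copies the epoch, validation loss, and ROUGE-L columns from the table and picks the best epoch under each criterion. It only illustrates how to read the table; this card does not state which checkpoint was released, and the function names are hypothetical.

```python
from typing import List, Tuple

# (epoch, validation_loss, rouge_l) copied from the training results table
RESULTS: List[Tuple[int, float, float]] = [
    (1, 1.9050, 26.6888), (2, 1.8151, 27.7947), (3, 1.7608, 28.8059),
    (4, 1.7229, 29.4967), (5, 1.6937, 30.1799), (6, 1.6748, 30.7016),
    (7, 1.6578, 30.9428), (8, 1.6473, 31.3001), (9, 1.6384, 31.7951),
    (10, 1.6337, 32.1329), (11, 1.6291, 32.3042), (12, 1.6267, 32.5335),
    (13, 1.6262, 32.7387), (14, 1.6257, 32.8508), (15, 1.6273, 32.7718),
]


def best_epoch_by_loss(rows: List[Tuple[int, float, float]]) -> int:
    """Epoch with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


def best_epoch_by_rouge_l(rows: List[Tuple[int, float, float]]) -> int:
    """Epoch with the highest validation ROUGE-L."""
    return max(rows, key=lambda row: row[2])[0]


print(best_epoch_by_loss(RESULTS), best_epoch_by_rouge_l(RESULTS))
```

Both criteria select epoch 14: in the final epoch the validation loss rises slightly (1.6257 to 1.6273) and ROUGE-L drops slightly (32.8508 to 32.7718).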


### Framework versions

- Transformers 4.23.1
- PyTorch 1.12.1+rocm5.1.1
- Datasets 2.4.0
- Tokenizers 0.12.1

## Citation

```
@inproceedings{giulianelli-etal-2023-interpretable,
    title = "Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis",
    author = "Giulianelli, Mario  and
      Luden, Iris  and
      Fernandez, Raquel  and
      Kutuzov, Andrey",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.176",
    doi = "10.18653/v1/2023.acl-long.176",
    pages = "3130--3148",
    abstract = "We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users {---} historical linguists, lexicographers, or social scientists {---} to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the {`}definitions as representations{'} paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a new promising type of lexical representation for NLP.",
}
```