---
license: apache-2.0
language: 
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- ca
- ceb
- co
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fil
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- haw
- hi
- hmn
- ht
- hu
- hy
- ig
- is
- it
- iw
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lb
- lo
- lt
- lv
- mg
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- ne
- nl
- no
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sm
- sn
- so
- sq
- sr
- st
- su
- sv
- sw
- ta
- te
- tg
- th
- tr
- uk
- und
- ur
- uz
- vi
- xh
- yi
- yo
- zh
- zu
datasets:
- mc4
---

# MLongT5 (transient-global attention, large-sized model)

MLongT5 model pre-trained on a multilingual corpus (the mC4 dataset). The model was introduced in the paper [mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences](https://arxiv.org/pdf/2305.11129.pdf) by Uthus et al. and first released in [the LongT5 repository](https://github.com/google-research/longt5). The model architecture and configuration can be found in the [Flaxformer repository](https://github.com/google/flaxformer), which builds on another Google research project, [T5x](https://github.com/google-research/t5x).

Disclaimer: The team releasing MLongT5 did not write a model card for this model, so this model card has been written by Ahmed Elnaggar.

## Model description
MLongT5 is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting ([PEGASUS-like generation pre-training](https://arxiv.org/pdf/1912.08777.pdf)). MLongT5 is an extension of the [LongT5 model](https://arxiv.org/abs/2112.07916) and supports one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. These sparse attention patterns allow the model to handle long input sequences efficiently.
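
In the Hugging Face Transformers implementation, the attention variant a checkpoint uses is exposed through the `encoder_attention_type` field of its configuration. A minimal sketch to check it (for this transient-global checkpoint it should report `transient-global`):

```python
from transformers import AutoConfig

# Load only the configuration (no weights) and check which efficient
# attention mechanism the checkpoint is set up for.
config = AutoConfig.from_pretrained("agemagician/mlong-t5-tglobal-large")
print(config.encoder_attention_type)  # expected: "transient-global"
```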

MLongT5 is particularly effective when fine-tuned for text-generation tasks (summarization, question answering) that require handling long input sequences (up to 16,384 tokens).

## Intended uses & limitations

The model is mostly meant to be fine-tuned on a supervised dataset. See the [model hub](https://huggingface.co/models?search=mlongt5) to look for fine-tuned versions on a task that interests you.
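
As a rough illustration, below is a minimal fine-tuning sketch for long-document summarization. The data file, the `document`/`summary` column names, and all hyperparameters are placeholders chosen for the example, not settings from the original release.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    LongT5ForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "agemagician/mlong-t5-tglobal-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LongT5ForConditionalGeneration.from_pretrained(model_name)

# Placeholder data: a JSON-lines file with "document" and "summary" fields.
dataset = load_dataset("json", data_files={"train": "train.jsonl"})["train"]

def preprocess(batch):
    # Long inputs: MLongT5 can attend over up to 16,384 tokens.
    model_inputs = tokenizer(batch["document"], max_length=16384, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=512, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="mlong-t5-tglobal-large-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    bf16=True,
    logging_steps=50,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```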

### How to use

The following shows how one can extract the last hidden representation for the model.

```python
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-large")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-large")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
```

The following shows how one can predict masked passages using the different denoising strategies.

### S-Denoising

For *S-Denoising*, please make sure to prompt the text with the prefix `[S2S]` as shown below.

```python
from transformers import LongT5ForConditionalGeneration, T5Tokenizer
import torch

model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-large", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")                                                                                                   
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-large")

input_string = "[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere <extra_id_0>"                                               

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
```

### R-Denoising

For *R-Denoising*, please make sure to prompt the text with the prefix `[NLU]` as shown below.

```python
from transformers import LongT5ForConditionalGeneration, T5Tokenizer
import torch

model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-large", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")                                                                                                   
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-large")

input_string = "[NLU] Mr. Dursley was the director of a firm called <extra_id_0>, which made <extra_id_1>. He was a big, solid man with a bald head. Mrs. Dursley was thin and <extra_id_2> of neck, which came in very useful as she spent so much of her time <extra_id_3>. The Dursleys had a small son called Dudley and <extra_id_4>"                                               

inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")

outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
```

### X-Denoising

For *X-Denoising*, please make sure to prompt the text with the prefix `[NLG]` as shown below.

```python
from transformers import LongT5ForConditionalGeneration, T5Tokenizer
import torch

model = LongT5ForConditionalGeneration.from_pretrained("agemagician/mlong-t5-tglobal-large", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to("cuda")                                                                                                   
tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-large")

input_string = "[NLG] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man wiht a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she
spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere. <extra_id_0>"                                               

model.cuda()
inputs = tokenizer(input_string, return_tensors="pt", add_special_tokens=False).input_ids.to("cuda")

outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
```

### BibTeX entry and citation info

```bibtex
@misc{uthus2023mlongt5,
      title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences}, 
      author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
      year={2023},
      eprint={2305.11129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

> Created by [Ahmed Elnaggar/@Elnaggar_AI](https://twitter.com/Elnaggar_AI) | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)