File size: 3,115 Bytes

---
license: apache-2.0
language: 
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- ca
- ceb
- co
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fil
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- haw
- hi
- hmn
- ht
- hu
- hy
- ig
- is
- it
- iw
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lb
- lo
- lt
- lv
- mg
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- ne
- nl
- no
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sm
- sn
- so
- sq
- sr
- st
- su
- sv
- sw
- ta
- te
- tg
- th
- tr
- uk
- und
- ur
- uz
- vi
- xh
- yi
- yo
- zh
- zu
datasets:
- mc4
---

# MLongT5 (transient-global attention, xl-sized model)

MLongT5 model pre-trained on Multi-language corpus. The model was introduced in the paper [mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences](https://arxiv.org/pdf/2305.11129.pdf) by Uthus et al. and first released in [the LongT5 repository](https://github.com/google-research/longt5). All the model architecture and configuration can be found in [Flaxformer repository](https://github.com/google/flaxformer) which uses another Google research project repository [T5x](https://github.com/google-research/t5x).

Disclaimer: The team releasing MLongT5 did not write a model card for this model so this model card has been written by Ahmed Elnaggar.

## Model description
MLongT5 model is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting ([Pegasus-like generation pre-training](https://arxiv.org/pdf/1912.08777.pdf)). MLongT5 model is an extension of [LongT5 model](https://arxiv.org/abs/2112.07916), and it enables using one of the two different efficient attention mechanisms - (1) Local attention, or (2) Transient-Global attention. The usage of attention sparsity patterns allows the model to efficiently handle input sequence.

MLongT5 is particularly effective when fine-tuned for text generation (summarization, question answering) which requires handling long input sequences (up to 16,384 tokens).

## Intended uses & limitations

The model is mostly meant to be fine-tuned on a supervised dataset. See the [model hub](https://huggingface.co/models?search=mlongt5) to look for fine-tuned versions on a task that interests you.

### How to use

```python
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-xl")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-xl")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
```

### BibTeX entry and citation info

```bibtex
@misc{uthus2023mlongt5,
      title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences}, 
      author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
      year={2023},
      eprint={2305.11129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

> Created by [Ahmed Elnaggar/@Elnaggar_AI](https://twitter.com/Elnaggar_AI) | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)