---
license: apache-2.0
language:
- multilingual
- af
- am
- ar
- az
- be
- bg
- bn
- ca
- ceb
- co
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fil
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- haw
- hi
- hmn
- ht
- hu
- hy
- ig
- is
- it
- iw
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lb
- lo
- lt
- lv
- mg
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- my
- ne
- nl
- no
- ny
- pa
- pl
- ps
- pt
- ro
- ru
- sd
- si
- sk
- sl
- sm
- sn
- so
- sq
- sr
- st
- su
- sv
- sw
- ta
- te
- tg
- th
- tr
- uk
- und
- ur
- uz
- vi
- xh
- yi
- yo
- zh
- zu
datasets:
- mc4
---

# MLongT5 (transient-global attention, xl-sized model)

MLongT5 model pre-trained on the multilingual mC4 corpus. The model was introduced in the paper [mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences](https://arxiv.org/pdf/2305.11129.pdf) by Uthus et al. and first released in [the LongT5 repository](https://github.com/google-research/longt5). The model architecture and configuration are available in the [Flaxformer repository](https://github.com/google/flaxformer), which builds on [T5x](https://github.com/google-research/t5x), another Google Research project.

Disclaimer: The team releasing MLongT5 did not write a model card for this model, so this model card has been written by Ahmed Elnaggar.

## Model description

The MLongT5 model is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting ([Pegasus-like generation pre-training](https://arxiv.org/pdf/1912.08777.pdf)). It extends the [LongT5 model](https://arxiv.org/abs/2112.07916) and supports one of two efficient attention mechanisms: (1) Local attention or (2) Transient-Global attention. This checkpoint uses Transient-Global attention. These sparse attention patterns let the model handle long input sequences efficiently.

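To make the idea of an attention sparsity pattern concrete, the following is a minimal sketch of a local attention mask in which each token may attend only to neighbors within a fixed radius. The function name `local_attention_mask` and the toy sizes are illustrative assumptions; this is not the LongT5 implementation, which operates on blocks and, in the Transient-Global variant, additionally lets tokens attend to per-block summary tokens.

```python
def local_attention_mask(seq_len: int, radius: int) -> list[list[bool]]:
    """Return a mask where entry [i][j] is True if token i may attend to token j.

    Illustrative sketch only (hypothetical helper, not part of any library).
    """
    return [
        [abs(i - j) <= radius for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Toy example: 8 tokens, each attending to at most 2 neighbors on either side.
mask = local_attention_mask(seq_len=8, radius=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# Count how many (query, key) pairs remain compared with full attention.
kept = sum(sum(row) for row in mask)
print(f"{kept} of {len(mask) ** 2} pairs kept")
```

The printed pattern is a band around the diagonal: the number of allowed pairs grows linearly with sequence length instead of quadratically.
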
MLongT5 is particularly effective when fine-tuned for text generation tasks (summarization, question answering) that require handling long input sequences (up to 16,384 tokens).

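A rough back-of-the-envelope sketch shows why sparse attention matters at this input length. It counts how many positions a single token attends to under full attention versus a Transient-Global-style pattern (a local window plus one summary token per block). The helper names are hypothetical, and the values `local_radius=127` and `global_block_size=16` are assumptions taken from common LongT5 configuration defaults, not confirmed settings of this checkpoint.

```python
import math


def full_attention_keys(seq_len: int) -> int:
    """Each token attends to every position in the sequence."""
    return seq_len


def tglobal_attention_keys(
    seq_len: int,
    local_radius: int = 127,      # assumed value, not confirmed for this checkpoint
    global_block_size: int = 16,  # assumed value, not confirmed for this checkpoint
) -> int:
    """Approximate keys per token: local window plus one summary token per block."""
    local_window = min(2 * local_radius + 1, seq_len)
    global_tokens = math.ceil(seq_len / global_block_size)
    return local_window + global_tokens


seq_len = 16_384
print(full_attention_keys(seq_len))     # 16384
print(tglobal_attention_keys(seq_len))  # 1279
```

Under these assumptions, each token attends to roughly a thirteenth as many positions as it would with full attention. For short inputs the sparse pattern offers no saving, which is why the benefit is tied to long sequences.
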
## Intended uses & limitations

The model is mostly meant to be fine-tuned on a supervised dataset. See the [model hub](https://huggingface.co/models?search=mlongt5) to look for versions fine-tuned on a task that interests you.

### How to use

```python
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-xl")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-xl")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

# LongT5Model is the bare encoder-decoder, so the decoder needs its own input ids.
decoder_input_ids = tokenizer("Hello", return_tensors="pt").input_ids
outputs = model(**inputs, decoder_input_ids=decoder_input_ids)

# Hidden states from the last decoder layer
last_hidden_states = outputs.last_hidden_state
```

### BibTeX entry and citation info

```bibtex
@misc{uthus2023mlongt5,
      title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
      author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
      year={2023},
      eprint={2305.11129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

> Created by [Ahmed Elnaggar/@Elnaggar_AI](https://twitter.com/Elnaggar_AI) | [LinkedIn](https://www.linkedin.com/in/prof-ahmed-elnaggar/)