metadata
license: apache-2.0
language:
  - multilingual
  - af
  - am
  - ar
  - az
  - be
  - bg
  - bn
  - ca
  - ceb
  - co
  - cs
  - cy
  - da
  - de
  - el
  - en
  - eo
  - es
  - et
  - eu
  - fa
  - fi
  - fil
  - fr
  - fy
  - ga
  - gd
  - gl
  - gu
  - ha
  - haw
  - hi
  - hmn
  - ht
  - hu
  - hy
  - ig
  - is
  - it
  - iw
  - ja
  - jv
  - ka
  - kk
  - km
  - kn
  - ko
  - ku
  - ky
  - la
  - lb
  - lo
  - lt
  - lv
  - mg
  - mi
  - mk
  - ml
  - mn
  - mr
  - ms
  - mt
  - my
  - ne
  - nl
  - 'no'
  - ny
  - pa
  - pl
  - ps
  - pt
  - ro
  - ru
  - sd
  - si
  - sk
  - sl
  - sm
  - sn
  - so
  - sq
  - sr
  - st
  - su
  - sv
  - sw
  - ta
  - te
  - tg
  - th
  - tr
  - uk
  - und
  - ur
  - uz
  - vi
  - xh
  - yi
  - yo
  - zh
  - zu
datasets:
  - mc4

MLongT5 (transient-global attention, xl-sized model)

MLongT5 model pre-trained on a multilingual corpus (mC4). The model was introduced in the paper mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences by Uthus et al. and first released in the LongT5 repository. The full model architecture and configuration can be found in the Flaxformer repository, which is used together with T5X, another Google Research project.

Disclaimer: The team releasing MLongT5 did not write a model card for this model, so this model card was written by Ahmed Elnaggar.

Model description

MLongT5 is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting. According to the mLongT5 paper, it builds on the LongT5 architecture while using the multilingual data used to pre-train mT5 and the pre-training tasks of UL2. Like LongT5, it supports either of two efficient attention mechanisms: (1) local attention or (2) transient-global attention, the variant used by this checkpoint. These sparse attention patterns let the model process long input sequences efficiently.
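To make the transient-global idea concrete, here is a minimal pure-Python sketch; it is not the model's actual implementation. Following the LongT5 description of TGlobal attention, the input is split into fixed-size blocks, each block is summed into one "global" token, and every query attends to the tokens within `radius` positions of itself plus all global tokens. The function names (`global_tokens`, `tglobal_attention`) and the toy vectors are illustrative assumptions; the real model also uses learned projections, multiple heads, relative position biases and normalization of the global tokens, all omitted here.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_tokens(x: List[Vector], block_size: int) -> List[Vector]:
    """One global token per block: the element-wise sum of the block's vectors."""
    return [
        [sum(col) for col in zip(*x[start:start + block_size])]
        for start in range(0, len(x), block_size)
    ]


def tglobal_attention(x: List[Vector], radius: int, block_size: int) -> List[Vector]:
    """Each query attends to its local window plus every global token.

    Hypothetical sketch: queries, keys and values are the raw input vectors
    (no learned projections, heads or position biases).
    """
    n, dim = len(x), len(x[0])
    g = global_tokens(x, block_size)
    out = []
    for i in range(n):
        local = x[max(0, i - radius):min(n, i + radius + 1)]  # sliding window, clipped at the edges
        keys = local + g  # keys also serve as values in this sketch
        # Raw dot-product scores (T5-style attention omits the 1/sqrt(d) scaling).
        weights = softmax([dot(x[i], k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return out


# Toy input: 10 tokens with 2-dimensional "embeddings".
x = [[float(i % 3), 1.0] for i in range(10)]
y = tglobal_attention(x, radius=2, block_size=4)
print(len(global_tokens(x, 4)), len(y), len(y[0]))  # 3 10 2
```

Each query here scores at most `2 * radius + 1` local keys plus one key per block, instead of every token in the sequence.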

MLongT5 is particularly effective when fine-tuned for text generation tasks that require long input sequences (up to 16,384 tokens), such as summarization and question answering.
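A rough way to see why sparse attention matters at 16,384 tokens is to count query-key scores. The sketch below compares full self-attention with an idealized TGlobal pattern. The radius of 127 and block size of 16 are assumed to match the `local_radius` and `global_block_size` defaults of Hugging Face's `LongT5Config`; check this checkpoint's `config.json` before relying on them. Real implementations compute the local window blockwise, so their exact counts differ somewhat.

```python
import math


def full_attention_scores(n: int) -> int:
    """Full self-attention: every query scores every key."""
    return n * n


def tglobal_attention_scores(n: int, radius: int, block_size: int) -> int:
    """Idealized TGlobal count: a clipped sliding window per query plus one score per global token."""
    local = sum(min(n - 1, i + radius) - max(0, i - radius) + 1 for i in range(n))
    num_global = math.ceil(n / block_size)
    return local + n * num_global


n = 16_384
full = full_attention_scores(n)
tglobal = tglobal_attention_scores(n, radius=127, block_size=16)  # assumed config values
print(f"full: {full:,}  tglobal: {tglobal:,}  ratio: {full / tglobal:.1f}x")
```

Under these assumptions the script reports about 12.8x fewer scores. Note that the global term still grows as n²/k, so at very long lengths the saving approaches the block size k rather than growing without bound.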

Intended uses & limitations

The model is mainly intended to be fine-tuned on a supervised dataset. See the model hub for fine-tuned versions on a task that interests you.
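Fine-tuning follows the usual text-to-text recipe: tokenize source and target texts, pad them into batches, and compute a cross-entropy loss on the target tokens. The pure-Python sketch below shows only the batching step, under two stated assumptions: the pad id is 0 (the T5-family convention; read `tokenizer.pad_token_id` in practice), and label positions set to -100 are ignored by the loss, as in PyTorch's default `ignore_index`. `make_batch` and the token ids are hypothetical. In Transformers, such a dict (converted to tensors) can be passed to `LongT5ForConditionalGeneration`, which derives the decoder inputs from `labels` by shifting them right.

```python
from typing import Dict, List

PAD_ID = 0  # assumption: T5-family tokenizers pad with id 0; use tokenizer.pad_token_id in practice
IGNORE_INDEX = -100  # PyTorch cross-entropy skips this label value by default


def make_batch(sources: List[List[int]], targets: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: pad one batch of (source, target) token-id pairs for seq2seq fine-tuning."""
    src_len = max(len(s) for s in sources)
    tgt_len = max(len(t) for t in targets)
    return {
        # Encoder inputs, right-padded with the pad id.
        "input_ids": [s + [PAD_ID] * (src_len - len(s)) for s in sources],
        # 1 marks real tokens, 0 marks padding the encoder should ignore.
        "attention_mask": [[1] * len(s) + [0] * (src_len - len(s)) for s in sources],
        # Padded label positions get IGNORE_INDEX so they add nothing to the loss.
        "labels": [t + [IGNORE_INDEX] * (tgt_len - len(t)) for t in targets],
    }


# Made-up token ids, for illustration only.
batch = make_batch(sources=[[11, 12, 13], [11, 14]], targets=[[21, 22], [21, 23, 24]])
print(batch["labels"])  # [[21, 22, -100], [21, 23, 24]]
```

Padding labels by length rather than by matching the pad id keeps the masking correct even if a tokenizer uses a different pad id.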

How to use

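```python
from transformers import T5Tokenizer, LongT5Model

tokenizer = T5Tokenizer.from_pretrained("agemagician/mlong-t5-tglobal-xl")
model = LongT5Model.from_pretrained("agemagician/mlong-t5-tglobal-xl")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# LongT5Model is a bare encoder-decoder, so the forward pass also needs decoder inputs.
decoder_input_ids = tokenizer("My dog is", return_tensors="pt").input_ids
outputs = model(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    decoder_input_ids=decoder_input_ids,
)

# Final decoder hidden states, shape (batch, decoder_length, d_model);
# the encoder's are in outputs.encoder_last_hidden_state.
last_hidden_states = outputs.last_hidden_state
```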

BibTeX entry and citation info

```bibtex
@misc{uthus2023mlongt5,
      title={mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences},
      author={David Uthus and Santiago Ontañón and Joshua Ainslie and Mandy Guo},
      year={2023},
      eprint={2305.11129},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

Created by Ahmed Elnaggar/@Elnaggar_AI | LinkedIn