AI & ML interests

Advancing Neural Text Generation for Portuguese

Tucano: Advancing Neural Text Generation for Portuguese

An illustration of a Tucano bird showing vibrant colors like yellow, orange, blue, green, and black.

To stimulate the open development of neural text generation in Portuguese, we present GigaVerbo, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens, and Tucano, a series of decoder-transformers natively pre-trained in Portuguese. All byproducts of our study, including the source code used for training and evaluation, are openly released on GitHub and Hugging Face.

Read our preprint on arXiv.


Community Contributions 🤝