Update README.md
README.md
CHANGED
@@ -15,10 +15,22 @@ thumbnail: >-
 # Tucano: Advancing Neural Text Generation for Portuguese
 
 </div>
+
 <p align="center">
 <img src="./logo.png" alt="An illustration of a Tucano bird showing vibrant colors like yellow, orange, blue, green, and black." height="400">
 </p>
 
 To stimulate the future of open development of neural text generation in Portuguese, we present both **[GigaVerbo](https://huggingface.co/datasets/TucanoBR/GigaVerbo)**, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens, and **[Tucano](https://huggingface.co/TucanoBR/Tucano-2b4)**, a series of decoder-transformers natively pre-trained in Portuguese. All byproducts of our study, including the source code used for training and evaluation, are openly released on [GitHub](https://github.com/Nkluge-correa/Tucano) and Hugging Face.
 
-Read our preprint in [arXiv](https://arxiv.org/abs/2411.07854).
+Read our preprint in [arXiv](https://arxiv.org/abs/2411.07854).
+
+## News
+
+- [29/11/2024] Tucano is mentioned on Deutsche Welle: "[Cientistas criam maior banco de dados em português para IA](https://www.dw.com/pt-br/pesquisadores-da-alemanha-criam-maior-banco-de-dados-p%C3%BAblico-em-portugu%C3%AAs-para-ia/a-70917082)".
+- [12/11/2024] "[Tucano: Advancing Neural Text Generation for Portuguese](https://arxiv.org/abs/2411.07854)" is published as a preprint on arXiv, with all models and datasets released on [Hugging Face](https://huggingface.co/TucanoBR).
+
+## Community Contributions 🤝
+
+- Demo on how to [run inference on Tucano](https://colab.research.google.com/drive/1Qf2DsFOFDA7RKkamI-tH3OregtOlZ8Cz).
+- Demo on how to create a simple [Chat UI for Tucano](https://colab.research.google.com/drive/1fEW10CXksMfMv1veLr22OESwDs6e-W1b) using Gradio.
+- [Tucano OpenVINO](https://huggingface.co/cabelo/Tucano-2b4-Instruct-fp16-ov) is a port of Tucano-2b4-Instruct optimized for inference with Intel OpenVINO.