nicholasKluge commited on
Commit
ed58497
·
verified ·
1 Parent(s): be8a5d3

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +13 -1
README.md CHANGED
@@ -15,10 +15,22 @@ thumbnail: >-
15
  # Tucano: Advancing Neural Text Generation for Portuguese
16
 
17
  </div>
 
18
  <p align="center">
19
  <img src="./logo.png" alt="An illustration of a Tucano bird showing vibrant colors like yellow, orange, blue, green, and black." height="400">
20
  </p>
21
 
22
  To stimulate the future of open development of neural text generation in Portuguese, we present both **[GigaVerbo](https://huggingface.co/datasets/TucanoBR/GigaVerbo)**, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens, and **[Tucano](https://huggingface.co/TucanoBR/Tucano-2b4)**, a series of decoder-transformers natively pre-trained in Portuguese. All byproducts of our study, including the source code used for training and evaluation, are openly released on [GitHub](https://github.com/Nkluge-correa/Tucano) and Hugging Face.
23
 
24
- Read our preprint in [arXiv](https://arxiv.org/abs/2411.07854).
 
 
 
 
 
 
 
 
 
 
 
 
15
  # Tucano: Advancing Neural Text Generation for Portuguese
16
 
17
  </div>
18
+
19
  <p align="center">
20
  <img src="./logo.png" alt="An illustration of a Tucano bird showing vibrant colors like yellow, orange, blue, green, and black." height="400">
21
  </p>
22
 
23
  To stimulate the future of open development of neural text generation in Portuguese, we present both **[GigaVerbo](https://huggingface.co/datasets/TucanoBR/GigaVerbo)**, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens, and **[Tucano](https://huggingface.co/TucanoBR/Tucano-2b4)**, a series of decoder-transformers natively pre-trained in Portuguese. All byproducts of our study, including the source code used for training and evaluation, are openly released on [GitHub](https://github.com/Nkluge-correa/Tucano) and Hugging Face.
24
 
25
+ Read our preprint in [arXiv](https://arxiv.org/abs/2411.07854).
26
+
27
+ ## News
28
+
29
+ - [29/11/2024] Tucano is mentioned on Deutsche Welle: "[Cientistas criam maior banco de dados em português para IA](https://www.dw.com/pt-br/pesquisadores-da-alemanha-criam-maior-banco-de-dados-p%C3%BAblico-em-portugu%C3%AAs-para-ia/a-70917082)".
30
+ - [12/11/2024] "[Tucano: Advancing Neural Text Generation for Portuguese](https://arxiv.org/abs/2411.07854)" is published as a preprint on ArXiv, with all models and datasets released on [Hugging Face](https://huggingface.co/TucanoBR).
31
+
32
+ ## Community Contributions 🤝
33
+
34
+ - Demo on how to [run inference on Tucano](https://colab.research.google..com/drive/1Qf2DsFOFDA7RKkamI-tH3OregtOlZ8Cz).
35
+ - Demo on how to create a simple [Chat UI for Tucano](https://colab.research.google.com/drive/1fEW10CXksMfMv1veLr22OESwDs6e-W1b) using Gradio.
36
+ - [Tucano OpenVINO](https://huggingface.co/cabelo/Tucano-2b4-Instruct-fp16-ov) is a ported version of Tucano-2b4-Instruct optimized for Intel openVINO inference technology.