Tucano
Tucano is a series of decoder-only transformer language models based on the Llama 2 architecture and natively pre-trained in Portuguese.
- Paper • 2411.07854
TucanoBR/Tucano-2b4
Text Generation • Note: 2.4-billion-parameter version of the Tucano series.
TucanoBR/Tucano-2b4-Instruct
Text Generation • Note: 2.4-billion-parameter Tucano model fine-tuned on the TucanoBR/Tucano-SFT dataset.
TucanoBR/Tucano-1b1
Text Generation • Note: 1.1-billion-parameter version of the Tucano series.
TucanoBR/Tucano-1b1-Instruct
Text Generation • Note: 1.1-billion-parameter Tucano model fine-tuned on the TucanoBR/Tucano-SFT dataset.
TucanoBR/Tucano-630m
Text Generation • Note: 630-million-parameter version of the Tucano series.
TucanoBR/Tucano-160m
Text Generation • Note: 160-million-parameter version of the Tucano series.
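The checkpoints above differ only in size, so a caller typically picks the largest one that fits a parameter budget and then loads it for text generation. The following is a minimal sketch, assuming the repository names and parameter counts listed above; the helper names `largest_checkpoint_within` and `generate` are hypothetical and not part of any Tucano release, and the example relies on the standard `transformers` text-generation pipeline.

```python
# Parameter counts as stated in the notes above (approximate).
TUCANO_CHECKPOINTS = {
    "TucanoBR/Tucano-160m": 160e6,
    "TucanoBR/Tucano-630m": 630e6,
    "TucanoBR/Tucano-1b1": 1.1e9,
    "TucanoBR/Tucano-2b4": 2.4e9,
}


def largest_checkpoint_within(max_params):
    """Return the largest listed checkpoint with at most `max_params` parameters."""
    fitting = {
        name: count
        for name, count in TUCANO_CHECKPOINTS.items()
        if count <= max_params
    }
    if not fitting:
        raise ValueError("no listed checkpoint fits the given parameter budget")
    return max(fitting, key=fitting.get)


def generate(prompt, model_id, max_new_tokens=50):
    """Generate a continuation of `prompt` (downloads the model on first use)."""
    # Imported lazily so the selection helper works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]


if __name__ == "__main__":
    model_id = largest_checkpoint_within(1e9)
    print(generate("A capital do Brasil é", model_id))
```

With a budget of one billion parameters the helper selects TucanoBR/Tucano-630m; the "Instruct" variants could be added to the table in the same way.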
TucanoBR/BERTimbau-large-text-filter
Text Classification • Note: BERTimbau-large fine-tuned on the TucanoBR/GigaVerbo-Text-Filter dataset.
TucanoBR/BERTimbau-base-text-filter
Text Classification • Note: BERTimbau-base fine-tuned on the TucanoBR/GigaVerbo-Text-Filter dataset.
TucanoBR/XGBClassifier-text-filter
Note: XGBClassifier trained on the TucanoBR/GigaVerbo-Text-Filter dataset (requires embeddings generated by sentence-transformers/LaBSE as input).
TucanoBR/XGBRegressor-text-filter
Note: XGBRegressor trained on the TucanoBR/GigaVerbo-Text-Filter dataset (requires embeddings generated by sentence-transformers/LaBSE as input).
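Because the XGBoost filters operate on LaBSE embeddings rather than raw text, scoring is a two-step process: embed the text, then pass the embeddings to the tree model. The sketch below illustrates that flow under stated assumptions: `model_path` is a hypothetical local path to a saved XGBoost model file (the actual file name in the repository is not given here), the function names are illustrative, and the meaning of each predicted label should be checked against the model card.

```python
def load_filter(model_path):
    """Load the LaBSE encoder and an XGBoost classifier from a local file.

    `model_path` is a hypothetical path; the real file name is not stated here.
    """
    # Imported lazily so score_texts can be exercised without these packages.
    from sentence_transformers import SentenceTransformer
    from xgboost import XGBClassifier

    encoder = SentenceTransformer("sentence-transformers/LaBSE")
    classifier = XGBClassifier()
    classifier.load_model(model_path)
    return encoder, classifier


def score_texts(texts, encoder, classifier):
    """Embed `texts` with the encoder, then classify the embeddings."""
    # SentenceTransformer.encode returns one embedding row per input text.
    embeddings = encoder.encode(texts)
    return list(classifier.predict(embeddings))


if __name__ == "__main__":
    encoder, classifier = load_filter("path/to/xgb_model.json")  # hypothetical path
    print(score_texts(["Um exemplo de texto em português."], encoder, classifier))
```

Passing the encoder and classifier as arguments keeps the scoring step independent of how the models are loaded, so the same function would work with the XGBRegressor variant.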
TucanoBR/GigaVerbo
145M rows • Note: GigaVerbo is a 780 GB dataset of Portuguese text containing over 200 billion tokens, built by concatenating several datasets available on Hugging Face.
TucanoBR/GigaVerbo-Text-Filter
110k rows • Note: GigaVerbo Text-Filter is a dataset of 110,000 randomly selected samples from 9 subsets of GigaVerbo, all scored by GPT-4o.
TucanoBR/Tucano-SFT
680k rows • Note: This is the dataset used to train the "Instruct" versions of the Tucano series.
TucanoBR/lambada-pt
5.15k rows • Note: This dataset is a Portuguese translation of the LAMBADA test split as pre-processed by OpenAI.
TucanoBR/alpaca-eval-pt
805 rows • Note: This dataset contains 805 samples from the Alpaca dataset, translated into Portuguese.
nicholasKluge/reward-aira-dataset
70k rows • Note: This dataset contains pairs of completions for each prompt. It is used for DPO fine-tuning.
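For context on how completion pairs are consumed in DPO fine-tuning, the sketch below computes the standard Direct Preference Optimization loss for a single pair. It is an illustration only, not the Tucano training code: it assumes the summed log-probabilities of the preferred and rejected completions have already been computed under the policy and a frozen reference model, and the function name and `beta` default are hypothetical choices.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities.

    loss = -log(sigmoid(beta * margin)), where the margin is the policy's
    log-ratio advantage for the chosen completion over the rejected one,
    each measured relative to the reference model.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # Numerically stable -log(sigmoid(x)), i.e. softplus(-x).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # The policy favors the chosen completion more than the reference does,
    # so the loss falls below log(2), the value at zero margin.
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss decreases as the policy shifts probability toward the preferred completion.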