Tucano2-qwen-3.7B-Instruct (MLX 4-bit)
This is a 4-bit quantized MLX version of Polygl0t/Tucano2-qwen-3.7B-Instruct, optimized for efficient on-device inference on Apple Silicon.
Model Summary
Tucano2 is a family of open-source Portuguese language models developed by Polygl0t. This Instruct variant has been fine-tuned for chat and instruction-following tasks using Supervised Fine-Tuning (SFT) and Anchored Preference Optimization (APO). The 3.7B model is the largest in the Tucano2 family and achieves the highest scores in the family across all benchmarks.
| Detail | Value |
|---|---|
| Original Model | Polygl0t/Tucano2-qwen-3.7B-Instruct |
| Base Architecture | Qwen3-4B |
| Parameters | 3.76B |
| Context Length | 4,096 tokens |
| Quantization | 4-bit (MLX) |
| License | Apache 2.0 |
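As a rough back-of-the-envelope check on what 4-bit quantization saves, the sketch below estimates weight storage from the parameter count in the table. It assumes that every parameter is quantized and that each group of 64 weights carries a 16-bit scale and a 16-bit bias, which is understood to be the default MLX affine scheme; neither assumption is confirmed by this model card, and `estimate_weight_gb` is a hypothetical helper. Actual file sizes will differ, since some tensors typically stay in higher precision.

```python
def estimate_weight_gb(n_params, bits=4, group_size=64, overhead_bits_per_group=32):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    overhead_bits_per_group models one 16-bit scale plus one 16-bit bias
    per quantization group (an assumption, not taken from the model card).
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3.76e9  # "Parameters" row of the table above

# Unquantized 16-bit baseline: no per-group overhead.
fp16_gb = estimate_weight_gb(N_PARAMS, bits=16, overhead_bits_per_group=0)
# 4-bit weights with the assumed group size of 64.
q4_gb = estimate_weight_gb(N_PARAMS)

print(f"16-bit weights: ~{fp16_gb:.2f} GB")
print(f"4-bit weights:  ~{q4_gb:.2f} GB")
```

Under these assumptions the effective cost is 4.5 bits per weight, so the quantized weights come to a bit under 30% of the 16-bit size. The estimate covers weights only, not the KV cache or activations used during generation.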
Benchmark Results
| Benchmark | Score |
|---|---|
| ENEM | 72.92% |
| BLUEX | 64.53% |
| OAB Exams | 54.31% |
| ARC Challenge | 60.34% |
| BELEBELE | 85.22% |
| MMLU | 64.64% |
| GSM8K-PT | 53.81% |
| IFEval-PT | 41.67% |
| HumanEval | 47.56% |
| Total Avg. | 53.64 |
Usage
Python (mlx-lm)
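from mlx_lm import load, generate
# Download (on first use) and load the quantized weights and tokenizer from the Hub.
model, tokenizer = load("pessini/Tucano2-qwen-3.7B-Instruct-MLX-4bit")
# Wrap the user message in the model's chat template ("What is the capital of Brazil?").
messages = [{"role": "user", "content": "Qual é a capital do Brasil?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Generate up to 256 new tokens and print the reply.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)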
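For multi-turn conversations, the single-message example can be extended along the following lines. This is a minimal sketch, not part of the original model card: `build_messages`, `max_new_tokens`, and `chat` are hypothetical helpers, and it assumes the tokenizer returned by `load` exposes `encode` in addition to `apply_chat_template`. The generation budget is clamped so that the prompt plus the reply stays within the 4,096-token context length listed in the model summary.

```python
CONTEXT_LENGTH = 4096  # "Context Length" row of the model summary table


def build_messages(history, user_input):
    """Turn (user, assistant) pairs plus a new user message into chat messages."""
    messages = []
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_input})
    return messages


def max_new_tokens(prompt_token_count, requested=256, context_length=CONTEXT_LENGTH):
    """Clamp the generation budget so prompt + output fit in the context window."""
    return max(0, min(requested, context_length - prompt_token_count))


def chat(model, tokenizer, history, user_input, requested=256):
    """Generate one reply and append the exchange to `history` (mutated in place)."""
    # Imported lazily so the pure helpers above work without mlx-lm installed.
    from mlx_lm import generate

    messages = build_messages(history, user_input)
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Assumption: the tokenizer wrapper forwards `encode` to the underlying tokenizer.
    budget = max_new_tokens(len(tokenizer.encode(prompt)), requested)
    reply = generate(model, tokenizer, prompt=prompt, max_tokens=budget)
    history.append((user_input, reply))
    return reply
```

With `model` and `tokenizer` loaded as in the example above, calling `chat(model, tokenizer, history, "...")` repeatedly with the same `history` list carries earlier turns into each new prompt. Once the history approaches the context limit the budget shrinks to zero, so a real application would also need to drop or summarize old turns.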
Citation
@misc{correa2026tucano2cool,
title={{Tucano 2 Cool: Better Open Source LLMs for Portuguese}},
author={Corrêa, Nicholas Kluge and Sen, Aniket and Fatimah, Shiza and Falk, Sophia and Landgraf, Lennard and Kastner, Julia and Flek, Lucie},
year={2026},
eprint={2603.03543},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.03543}
}
Links
- Original Model: Polygl0t/Tucano2-qwen-3.7B-Instruct
- Paper: arXiv:2603.03543
- Training Code: Polygl0t/llm-foundry
- Base Model: Qwen/Qwen3-4B-Base