Tucano2-qwen-3.7B-Instruct (MLX 4-bit)

This is a 4-bit quantized MLX version of Polygl0t/Tucano2-qwen-3.7B-Instruct, optimized for efficient on-device inference on Apple Silicon.


Model Summary

Tucano2 is a family of open-source language models for Portuguese, developed by Polygl0t. This Instruct variant has been fine-tuned for chat and instruction-following tasks using Supervised Fine-Tuning (SFT) and Anchored Preference Optimization (APO). The 3.7B model is the largest in the Tucano2 family and achieves the highest scores among Tucano2 models across all benchmarks.

| Detail | Value |
| --- | --- |
| Original Model | Polygl0t/Tucano2-qwen-3.7B-Instruct |
| Base Architecture | Qwen3-4B |
| Parameters | 3.76B |
| Context Length | 4,096 tokens |
| Quantization | 4-bit (MLX) |
| License | Apache 2.0 |
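
As a rough sanity check on what 4-bit quantization saves, the sketch below estimates weight storage from the parameter count in the table. It assumes group-wise quantization with a group size of 64 and one 16-bit scale plus one 16-bit bias per group (about 4.5 bits per weight). This card does not state those settings, and any tensors left unquantized would push the real figure higher. The function name `quantized_size_gb` is illustrative, not part of any library.

```python
def quantized_size_gb(n_params: float, bits: int = 4, group_size: int = 64,
                      overhead_bits_per_group: int = 32) -> float:
    """Approximate weight storage in GB (10**9 bytes) for group-wise quantization.

    ASSUMPTION: each group of `group_size` weights carries
    `overhead_bits_per_group` extra bits (here one 16-bit scale + one 16-bit bias).
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3.76e9  # parameter count listed in this card

bf16_gb = N_PARAMS * 16 / 8 / 1e9      # 16 bits per weight, no quantization
q4_gb = quantized_size_gb(N_PARAMS)    # 4-bit weights plus per-group overhead

print(f"bf16 weights:  ~{bf16_gb:.2f} GB")
print(f"4-bit weights: ~{q4_gb:.2f} GB")
```

Under these assumptions the weights shrink from roughly 7.5 GB in bf16 to a little over 2 GB, before counting the KV cache and activations needed at inference time.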

Benchmark Results

| Benchmark | Score |
| --- | --- |
| ENEM | 72.92% |
| BLUEX | 64.53% |
| OAB Exams | 54.31% |
| ARC Challenge | 60.34% |
| BELEBELE | 85.22% |
| MMLU | 64.64% |
| GSM8K-PT | 53.81% |
| IFEval-PT | 41.67% |
| HumanEval | 47.56% |
| Total Avg. | 53.64 |
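
A note for readers comparing numbers: the Total Avg. figure is not the plain mean of the nine rows listed here, so it was presumably computed over a different benchmark set or with a normalization this card does not describe. The sketch below, which assumes a simple unweighted mean, shows the arithmetic for the listed rows only.

```python
# Scores copied from the benchmark list in this card (percent).
scores = {
    "ENEM": 72.92,
    "BLUEX": 64.53,
    "OAB Exams": 54.31,
    "ARC Challenge": 60.34,
    "BELEBELE": 85.22,
    "MMLU": 64.64,
    "GSM8K-PT": 53.81,
    "IFEval-PT": 41.67,
    "HumanEval": 47.56,
}

# ASSUMPTION: unweighted arithmetic mean; the card does not say how
# its "Total Avg." was derived.
mean_score = sum(scores.values()) / len(scores)
print(f"Unweighted mean of {len(scores)} listed benchmarks: {mean_score:.2f}")
```

The listed rows average to about 60.56, so treat the reported 53.64 as a separate aggregate rather than a summary of this list.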

Usage

Python (mlx-lm)

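from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer from the Hugging Face Hub
model, tokenizer = load("pessini/Tucano2-qwen-3.7B-Instruct-MLX-4bit")

# Build a single-turn chat; the prompt means "What is the capital of Brazil?"
messages = [{"role": "user", "content": "Qual é a capital do Brasil?"}]
# Render the chat template as text and append the marker that opens the assistant's turn
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate up to 256 new tokens and print the reply
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)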
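
Because the card lists a 4,096-token context, it can help to check that the prompt plus the requested `max_tokens` fits before generating. The sketch below is a minimal, library-free budget check; the helper names `fits_context` and `max_new_tokens` are hypothetical, and the prompt length of 3,900 tokens is a made-up figure for illustration.

```python
CONTEXT_LENGTH = 4096  # context length listed in this card


def fits_context(prompt_token_count: int, max_tokens: int,
                 context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the requested new tokens fit in the window."""
    return prompt_token_count + max_tokens <= context_length


def max_new_tokens(prompt_token_count: int,
                   context_length: int = CONTEXT_LENGTH) -> int:
    """Largest number of new tokens that still fits (never negative)."""
    return max(0, context_length - prompt_token_count)


# Illustrative numbers only: a long prompt of 3,900 tokens.
n_prompt = 3900
print(fits_context(n_prompt, max_tokens=256))  # budget check for the call above
print(max_new_tokens(n_prompt))                # room left for generation
```

With the tokenizer loaded as in the usage example, the prompt length could be obtained with something like `len(tokenizer.encode(prompt))`; whether that count includes extra special tokens depends on the tokenizer configuration, so treat it as an approximation.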

Citation

@misc{correa2026tucano2cool,
  title={{Tucano 2 Cool: Better Open Source LLMs for Portuguese}},
  author={Corrêa, Nicholas Kluge and Sen, Aniket and Fatimah, Shiza and Falk, Sophia and Landgraf, Lennard and Kastner, Julia and Flek, Lucie},
  year={2026},
  eprint={2603.03543},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.03543}
}
