Breeze-ASR-26 — MLX-Audio 4-bit

This is a 4-bit MLX-Audio conversion of MediaTek-Research/Breeze-ASR-26 for local inference on Apple Silicon Macs, including M4 machines with 16 GB of memory.

The source model is an ASR model based on Whisper large-v2 and fine-tuned for Taiwanese Hokkien (Taigi). Like the original model, this conversion transcribes Taigi speech into Mandarin Chinese characters.

Model Details

  • Source model: MediaTek-Research/Breeze-ASR-26
  • Base architecture: Whisper large-v2
  • Conversion tool: mlx-audio 0.4.3
  • Quantization: 4-bit affine, group size 64
  • Main weight file: model.safetensors
  • Converted model size: about 987 MB (about 942 MiB) on disk
  • License: Apache 2.0, inherited from the source model
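
To make the "4-bit affine, group size 64" entry more concrete, here is a minimal pure-Python sketch of affine quantization for a single group, plus a back-of-the-envelope storage estimate. It is illustrative only and is not the code mlx-audio or MLX runs. The helper names, the parameter count of roughly 1.55 billion for a Whisper large model, and the 16-bit or 32-bit width of the per-group scale and bias are all assumptions.

```python
def bits_per_weight(q_bits: int, group_size: int, meta_bits: int) -> float:
    """Average storage per quantized weight: the packed value plus a
    per-group scale and bias, each `meta_bits` wide."""
    return q_bits + 2 * meta_bits / group_size


def quantize_group(values, q_bits=4):
    """Affine-quantize one group: map [min, max] onto integers 0..2**q_bits - 1."""
    levels = 2 ** q_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo  # `lo` plays the role of the bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from codes, scale, and bias."""
    return [c * scale + bias for c in codes]


# One group of 64 evenly spaced example weights (made-up data).
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

# Rough size if every weight were quantized (assumed parameter count).
ASSUMED_PARAMS = 1.55e9
for meta_bits in (16, 32):
    bpw = bits_per_weight(4, 64, meta_bits)
    size_mb = ASSUMED_PARAMS * bpw / 8 / 1e6
    print(f"{meta_bits}-bit scale/bias: {bpw} bits per weight, about {size_mb:.0f} MB")
```

The rounding error per weight is bounded by half the group's scale, which is why smaller groups tend to track the original weights more closely at the cost of more metadata. The size estimate ignores tensors that stay unquantized and any file metadata, so it should only be read as a rough figure next to the size listed above.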

Files

This repository uses the mlx-audio Whisper layout and includes the tokenizer / generation files needed by that stack.

| file | purpose |
| --- | --- |
| model.safetensors | 4-bit MLX-Audio model weights |
| config.json | Whisper model and quantization configuration |
| generation_config.json | generation defaults |
| tokenizer files | vocab.json, merges.txt, tokenizer_config.json, special_tokens_map.json, added_tokens.json, normalizer.json |
| preprocessor_config.json | audio feature extraction settings |
| model.safetensors.index.json | weight index metadata |

Compatibility Note

This repository is intended for mlx-audio, not mlx-whisper.

There is another 4-bit MLX conversion, fredchu/breeze-asr-26-mlx-4bit, that targets the mlx-whisper style layout with a smaller file set and a weights.safetensors file. Both models are derived from MediaTek-Research/Breeze-ASR-26, but they were converted with different tooling and have different quantized weight files. Do not assume the two repositories are byte-identical or interchangeable across loaders.
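
If you have several local checkouts and are unsure which loader each one targets, a quick file-name check can help. The sketch below is a heuristic based only on the two weight file names mentioned in this note. `guess_layout` is a hypothetical helper, not part of mlx-audio or mlx-whisper, and it does not inspect the weights themselves.

```python
from pathlib import Path


def guess_layout(model_dir):
    """Guess which loader a local checkout targets from its weight file name."""
    root = Path(model_dir)
    if (root / "model.safetensors").is_file():
        return "mlx-audio"      # layout used by this repository
    if (root / "weights.safetensors").is_file():
        return "mlx-whisper"    # layout described for the other conversion
    return "unknown"


if __name__ == "__main__":
    print(guess_layout("./Breeze-ASR-26-mlx-4bit"))
```

A result of "unknown" only means neither file name was found, for example because the directory is empty or the download is incomplete.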

Recommended Hardware

This 4-bit build is intended for practical local inference on Apple Silicon and is the recommended variant for M4 Macs with 16 GB of memory.

For best results, close memory-heavy applications before transcribing long audio files.

Install

pip install -U mlx-audio

CLI Usage

python -m mlx_audio.stt.generate \
  --model RayyTien/Breeze-ASR-26-mlx-4bit \
  --audio audio.wav \
  --output-path output \
  --format txt

For a local checkout:

python -m mlx_audio.stt.generate \
  --model ./Breeze-ASR-26-mlx-4bit \
  --audio audio.wav \
  --output-path output \
  --format txt
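
To transcribe a whole folder, the same command can be driven from a short script. The sketch below only reuses the flags shown above. `build_command` and `transcribe_folder` are hypothetical helpers. The sketch assumes `--output-path` is used as the base name of the output file, so each recording gets its own value to avoid overwriting results. How mlx-audio derives the final file name is not verified here.

```python
import shlex
import subprocess
import sys
from pathlib import Path

MODEL_ID = "RayyTien/Breeze-ASR-26-mlx-4bit"


def build_command(audio, model=MODEL_ID):
    """Build the CLI invocation for one audio file, using the flags from this card."""
    audio = Path(audio)
    return [
        sys.executable, "-m", "mlx_audio.stt.generate",
        "--model", model,
        "--audio", str(audio),
        # Assumption: one output base name per input, next to the audio file.
        "--output-path", str(audio.with_suffix("")),
        "--format", "txt",
    ]


def transcribe_folder(folder, run=subprocess.run):
    """Run the CLI once per .wav file in `folder`, in sorted order."""
    commands = [build_command(wav) for wav in sorted(Path(folder).glob("*.wav"))]
    for cmd in commands:
        print(shlex.join(cmd))  # show the exact command before running it
        run(cmd, check=True)    # raise if the CLI exits with an error
    return commands
```

Calling `transcribe_folder("recordings")` would process every `.wav` file in a `recordings` directory. Each invocation starts a new process, so the model is loaded again for every file.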

Python Usage

from mlx_audio.stt.generate import generate_transcription

result = generate_transcription(
    model="RayyTien/Breeze-ASR-26-mlx-4bit",
    audio="audio.wav",
)

print(result.text)
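
For more than one file, the call above can be wrapped so that a single bad recording does not stop the batch. This is a minimal sketch: `transcribe_many` and `_default_transcribe` are hypothetical helpers, and the only mlx-audio call used is the `generate_transcription(model=..., audio=...)` form shown above. Whether mlx-audio reuses a loaded model between calls is not checked here.

```python
from pathlib import Path

MODEL_ID = "RayyTien/Breeze-ASR-26-mlx-4bit"


def _default_transcribe(path):
    # Imported lazily so the helper can be imported without mlx-audio installed.
    from mlx_audio.stt.generate import generate_transcription

    return generate_transcription(model=MODEL_ID, audio=str(path)).text


def transcribe_many(paths, transcribe=_default_transcribe):
    """Return (results, errors), each keyed by file name.

    Failures are recorded in `errors` instead of aborting the batch.
    """
    results, errors = {}, {}
    for path in map(Path, paths):
        try:
            results[path.name] = transcribe(path)
        except Exception as exc:  # keep going; report the problem afterwards
            errors[path.name] = repr(exc)
    return results, errors
```

A call such as `results, errors = transcribe_many(["a.wav", "b.wav"])` returns the transcripts that succeeded along with a separate record of the files that failed. Passing a different `transcribe` callable makes it possible to try the helper without loading the model.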

Conversion

This model was converted with:

python -m mlx_audio.convert \
  --hf-path MediaTek-Research/Breeze-ASR-26 \
  --mlx-path Breeze-ASR-26-mlx-4bit \
  --quantize \
  --q-bits 4 \
  --model-domain stt

Limitations

Please refer to the original model card for full details on training data, evaluation, and limitations. In particular, the model outputs Mandarin Chinese characters rather than native Taigi orthography, and accuracy can vary with accent, dialect, audio quality, and specialized vocabulary.

Citation

If you use this model, please cite the original Breeze Taigi work:

@misc{lan2026breezetaigibenchmarksmodels,
      title={Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis},
      author={Yu-Siang Lan and Chia-Sheng Liu and Yi-Chang Chen and Po-Chun Hsu and Allyson Chiu and Shun-Wen Lin and Da-shan Shiu and Yuan-Fu Liao},
      year={2026},
      eprint={2603.19259},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.19259},
}