Breeze-ASR-26 — MLX-Audio 4-bit

This is a 4-bit MLX-Audio conversion of MediaTek-Research/Breeze-ASR-26, optimized for local inference on Apple Silicon Macs, including M4 machines with 16 GB of memory.

The source model is a Whisper large-v2-based ASR model fine-tuned for Taiwanese Hokkien (Taigi). It transcribes Taigi speech into Mandarin Chinese characters, matching the behavior of the original model.

Model Details

Source model: MediaTek-Research/Breeze-ASR-26
Base architecture: Whisper large-v2
Conversion tool: mlx-audio 0.4.3
Quantization: 4-bit affine, group size 64
Main weight file: model.safetensors
Converted model size: about 987 MB on disk, or about 942 MiB
License: Apache 2.0, inherited from the source model
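
As a rough sanity check on the quantization setting above, the sketch below estimates the storage cost of 4-bit affine quantization with group size 64. In affine quantization each group of weights stores one scale and one bias next to the packed values, so the effective cost per weight is slightly above 4 bits. The parameter count (about 1.55 billion, the approximate size of Whisper large models) and the two scale widths are assumptions for illustration, and the function names are hypothetical. The real file also contains layers that stay unquantized, so treat the result as an order-of-magnitude check only.

```python
def bits_per_weight(q_bits: int, group_size: int, scale_bits: int) -> float:
    """Effective bits per weight for affine quantization.

    Each group of `group_size` weights stores one scale and one bias
    (each `scale_bits` wide) in addition to the packed q_bits values.
    """
    return q_bits + 2 * scale_bits / group_size


def estimated_size_mb(n_params: float, q_bits: int = 4,
                      group_size: int = 64, scale_bits: int = 16) -> float:
    """Estimated size in decimal megabytes if every weight were quantized."""
    return n_params * bits_per_weight(q_bits, group_size, scale_bits) / 8 / 1e6


# Assumption: approximate parameter count of a Whisper large model.
N_PARAMS = 1.55e9

# Assumption: scales and biases stored as 16-bit or 32-bit floats.
for scale_bits in (16, 32):
    bpw = bits_per_weight(4, 64, scale_bits)
    size = estimated_size_mb(N_PARAMS, scale_bits=scale_bits)
    print(f"scale width {scale_bits}: {bpw} bits/weight, ~{size:.0f} MB")
```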

Files

This repository uses the mlx-audio Whisper layout and includes the tokenizer and generation files needed by that stack.

file	purpose
`model.safetensors`	4-bit MLX-Audio model weights
`config.json`	Whisper model and quantization configuration
`generation_config.json`	generation defaults
`vocab.json`, `merges.txt`, `tokenizer_config.json`, `special_tokens_map.json`, `added_tokens.json`, `normalizer.json`	tokenizer files
`preprocessor_config.json`	audio feature extraction settings
`model.safetensors.index.json`	weight index metadata

Compatibility Note

This repository is intended for mlx-audio, not mlx-whisper.

There is another 4-bit MLX conversion, fredchu/breeze-asr-26-mlx-4bit, that targets the mlx-whisper style layout with a smaller file set and a weights.safetensors file. Both models are derived from MediaTek-Research/Breeze-ASR-26, but they were converted with different tooling and have different quantized weight files. Do not assume the two repositories are byte-identical or interchangeable across loaders.
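
To tell the two layouts apart in a local checkout, a small check on the weight file name is usually enough. The sketch below relies only on the file names mentioned above (`model.safetensors` for this repository, `weights.safetensors` for the mlx-whisper style one). The helper name `guess_layout` is hypothetical, and the check is a heuristic rather than anything either loader performs.

```python
from pathlib import Path


def guess_layout(model_dir) -> str:
    """Guess which loader a local model directory targets from its weight file name."""
    names = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    if "model.safetensors" in names:
        return "mlx-audio"      # layout used by this repository
    if "weights.safetensors" in names:
        return "mlx-whisper"    # layout described for the other conversion
    return "unknown"
```

Calling `guess_layout("./Breeze-ASR-26-mlx-4bit")` on a checkout of this repository should return `"mlx-audio"`.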

Recommended Hardware

This 4-bit build is intended for practical local inference on Apple Silicon. It is the recommended variant for users of M4 Macs with 16 GB of memory.

For best results, close memory-heavy applications before transcribing long audio files.

Install

pip install -U mlx-audio

CLI Usage

python -m mlx_audio.stt.generate \
  --model RayyTien/Breeze-ASR-26-mlx-4bit \
  --audio audio.wav \
  --output-path output \
  --format txt

For a local checkout:

python -m mlx_audio.stt.generate \
  --model ./Breeze-ASR-26-mlx-4bit \
  --audio audio.wav \
  --output-path output \
  --format txt
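
To transcribe a folder of recordings, the same command can be driven from a short script. The sketch below uses only the flags shown above. The helper names, the `*.wav` pattern, and the choice of one output path per input file are assumptions for illustration.

```python
import subprocess
import sys
from pathlib import Path

MODEL = "RayyTien/Breeze-ASR-26-mlx-4bit"


def build_command(audio: Path, out_dir: Path, model: str = MODEL) -> list[str]:
    """Build the CLI invocation for one audio file, using the flags shown above."""
    return [
        sys.executable, "-m", "mlx_audio.stt.generate",
        "--model", model,
        "--audio", str(audio),
        # Assumption: one output path per input, named after the audio file.
        "--output-path", str(out_dir / audio.stem),
        "--format", "txt",
    ]


def transcribe_folder(audio_dir, out_dir) -> None:
    """Run the CLI once per .wav file; stops at the first failing command."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for audio in sorted(Path(audio_dir).glob("*.wav")):
        subprocess.run(build_command(audio, out_dir), check=True)
```

For example, `transcribe_folder("recordings", "transcripts")` would process every `.wav` file in `recordings` in sorted order. Each call starts a new process and reloads the model, so this favors simplicity over speed.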

Python Usage

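from mlx_audio.stt.generate import generate_transcription

# The model is given as a Hugging Face repo ID; a local path such as
# "./Breeze-ASR-26-mlx-4bit" can be used instead.
result = generate_transcription(
    model="RayyTien/Breeze-ASR-26-mlx-4bit",
    audio="audio.wav",
)

# The transcript is Mandarin Chinese character text.
print(result.text)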
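
If the transcript should go to a file rather than the console, the call above can be wrapped in a small helper. This is a minimal sketch: `transcribe_to_file` and `_default_transcribe` are hypothetical names, and the only mlx-audio API used is the `generate_transcription(model=..., audio=...)` call and the `result.text` attribute shown above. The import is deferred so the module can be loaded, and the file-writing logic tested, without mlx-audio installed.

```python
from pathlib import Path
from typing import Callable

MODEL_ID = "RayyTien/Breeze-ASR-26-mlx-4bit"


def _default_transcribe(audio: str) -> str:
    """Transcribe one file with the call shown above and return its text."""
    # Imported lazily so this module loads even where mlx-audio is missing.
    from mlx_audio.stt.generate import generate_transcription

    result = generate_transcription(model=MODEL_ID, audio=audio)
    return result.text


def transcribe_to_file(audio, out_path,
                       transcribe: Callable[[str], str] = _default_transcribe) -> Path:
    """Transcribe `audio` and write the text to `out_path` as UTF-8."""
    text = transcribe(str(audio))
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # UTF-8 is set explicitly because the output is Chinese character text.
    out.write_text(text, encoding="utf-8")
    return out
```

A call such as `transcribe_to_file("audio.wav", "transcripts/audio.txt")` writes the transcript and returns the output path. Passing a different `transcribe` callable makes it possible to swap in a stub for testing.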

Conversion

This model was converted with:

python -m mlx_audio.convert \
  --hf-path MediaTek-Research/Breeze-ASR-26 \
  --mlx-path Breeze-ASR-26-mlx-4bit \
  --quantize \
  --q-bits 4 \
  --model-domain stt
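
After converting, it can be worth confirming that the written `config.json` records the expected quantization settings. The sketch below assumes the settings are stored under a `quantization` key with `bits` and `group_size` fields, which is a common convention in MLX conversions but is not confirmed here for this tool or version. The helper names are hypothetical.

```python
import json
from pathlib import Path


def read_quantization(model_dir):
    """Return the quantization section of config.json, or None if absent.

    Assumption: settings are stored under a top-level "quantization" key.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("quantization")


def matches_expected(model_dir, bits: int = 4, group_size: int = 64) -> bool:
    """Check that the recorded settings match 4-bit, group size 64 by default."""
    quant = read_quantization(model_dir)
    if not quant:
        return False
    # Assumption: field names "bits" and "group_size".
    return quant.get("bits") == bits and quant.get("group_size") == group_size
```

For a local conversion, `matches_expected("Breeze-ASR-26-mlx-4bit")` returns `True` only if both fields are present and match. A `False` result may simply mean the configuration uses different key names.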

Limitations

Please refer to the original model card for full training data, evaluation, and limitation details. In particular, the model outputs Mandarin Chinese characters rather than native Taigi orthography, and performance can vary across accents, dialectal variation, audio quality, and specialized vocabulary.
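
Because accuracy can vary with the factors listed above, it may help to measure it on a few of your own recordings. The sketch below computes a character error rate (CER) between a reference transcript and a model transcript using Levenshtein edit distance. It is a generic illustration, not the evaluation procedure of the original work, and the function names are hypothetical. Whitespace is ignored, and no other text normalization is applied.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)
```

A single substituted character in a six-character reference, as in `cer("今天天氣很好", "今天天氣真好")`, gives a CER of 1/6.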

Citation

If you use this model, please cite the original Breeze Taigi work:

@misc{lan2026breezetaigibenchmarksmodels,
      title={Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis},
      author={Yu-Siang Lan and Chia-Sheng Liu and Yi-Chang Chen and Po-Chun Hsu and Allyson Chiu and Shun-Wen Lin and Da-shan Shiu and Yuan-Fu Liao},
      year={2026},
      eprint={2603.19259},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.19259},
}
