Breeze-ASR-26 — MLX-Audio 4-bit
This is a 4-bit MLX-Audio conversion of MediaTek-Research/Breeze-ASR-26, optimized for local inference on Apple Silicon Macs, including 16GB Mac M4 machines.
The source model is an ASR model based on Whisper large-v2 and fine-tuned for Taiwanese Hokkien (Taigi). Like the original model, it transcribes Taigi speech into Mandarin Chinese characters.
Model Details
- Source model: MediaTek-Research/Breeze-ASR-26
- Base architecture: Whisper large-v2
- Conversion tool: mlx-audio 0.4.3
- Quantization: 4-bit affine, group size 64
- Main weight file: model.safetensors
- Converted model size: about 987 MB on disk, or about 942 MiB
- License: Apache 2.0, inherited from the source model
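As a rough sanity check on the quantization settings listed above, the sketch below estimates the storage cost of group-wise affine quantization. It assumes one 16-bit scale and one 16-bit bias per group of 64 weights, and uses the approximate 1.55 billion parameter count published for Whisper large models. Both figures are assumptions for illustration and are not taken from this repository's config; the function names are hypothetical.

```python
def bits_per_weight(q_bits: int, group_size: int,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight for group-wise affine quantization.

    Assumes each group stores one scale and one bias (16 bits each by default).
    """
    return q_bits + (scale_bits + bias_bits) / group_size


def quantized_size_mb(num_params: int, q_bits: int = 4, group_size: int = 64) -> float:
    """Estimated size in MB (10^6 bytes) if every parameter were quantized."""
    return num_params * bits_per_weight(q_bits, group_size) / 8 / 1_000_000


# Assumption: roughly 1.55e9 parameters, the published figure for Whisper large.
print(bits_per_weight(4, 64))                    # 4.5
print(round(quantized_size_mb(1_550_000_000)))   # 872
```

The estimate is a lower bound of about 872 MB. The reported size of about 987 MB is somewhat larger, which would be consistent with some layers being stored at higher precision, although this card does not state which layers are quantized.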
Files
This repository uses the mlx-audio Whisper layout and includes the tokenizer / generation files needed by that stack.
| file | purpose |
|---|---|
| model.safetensors | 4-bit MLX-Audio model weights |
| config.json | Whisper model and quantization configuration |
| generation_config.json | generation defaults |
| tokenizer files | vocab.json, merges.txt, tokenizer_config.json, special_tokens_map.json, added_tokens.json, normalizer.json |
| preprocessor_config.json | audio feature extraction settings |
| model.safetensors.index.json | weight index metadata |
Compatibility Note
This repository is intended for mlx-audio, not mlx-whisper.
There is another 4-bit MLX conversion, fredchu/breeze-asr-26-mlx-4bit, that targets the mlx-whisper style layout with a smaller file set and a weights.safetensors file. Both models are derived from MediaTek-Research/Breeze-ASR-26, but they were converted with different tooling and have different quantized weight files. Do not assume the two repositories are byte-identical or interchangeable across loaders.
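To tell the two layouts apart before loading a local checkout, a filename check along these lines may help. This is a minimal sketch: `detect_layout` is a hypothetical helper, and the heuristic relies only on the file names mentioned in this card (`model.safetensors` plus `config.json` here, `weights.safetensors` in the other repository). It does not inspect the weights themselves.

```python
from pathlib import Path


def detect_layout(model_dir) -> str:
    """Guess which loader a local model directory targets, by file names only."""
    model_dir = Path(model_dir)
    has_config = (model_dir / "config.json").is_file()
    # This repository: mlx-audio layout with model.safetensors.
    if has_config and (model_dir / "model.safetensors").is_file():
        return "mlx-audio"
    # The other conversion described above: mlx-whisper style weights file.
    if (model_dir / "weights.safetensors").is_file():
        return "mlx-whisper"
    return "unknown"
```

For example, `detect_layout("./Breeze-ASR-26-mlx-4bit")` should return `"mlx-audio"` for a complete download of this repository.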
Recommended Hardware
This 4-bit build is intended for practical local inference on Apple Silicon. It is the recommended variant for 16GB Mac M4 users.
For best results, close memory-heavy applications before transcribing long audio files.
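To see whether a recording counts as long before starting, its duration can be read from the file header. The sketch below uses only the Python standard library and works for uncompressed PCM WAV files; `wav_duration_seconds` is a hypothetical helper, and what counts as "long" on a given machine is left to the reader.

```python
import wave


def wav_duration_seconds(path) -> float:
    """Return the duration of a PCM WAV file in seconds, read from its header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

Calling `wav_duration_seconds("audio.wav")` returns the length in seconds without loading the model or decoding the audio.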
Install
pip install -U mlx-audio
CLI Usage
python -m mlx_audio.stt.generate \
    --model RayyTien/Breeze-ASR-26-mlx-4bit \
    --audio audio.wav \
    --output-path output \
    --format txt
For a local checkout:
python -m mlx_audio.stt.generate \
    --model ./Breeze-ASR-26-mlx-4bit \
    --audio audio.wav \
    --output-path output \
    --format txt
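When scripting the CLI from Python, the same command can be assembled as an argument list and passed to `subprocess`. The sketch below uses only the flags shown in the commands above; `build_command` and `run_transcription` are hypothetical helper names.

```python
import subprocess
import sys

MODEL_ID = "RayyTien/Breeze-ASR-26-mlx-4bit"


def build_command(audio_path, output_path, model=MODEL_ID, fmt="txt"):
    """Build the mlx-audio STT command line shown above as an argument list."""
    return [
        sys.executable, "-m", "mlx_audio.stt.generate",
        "--model", str(model),
        "--audio", str(audio_path),
        "--output-path", str(output_path),
        "--format", fmt,
    ]


def run_transcription(audio_path, output_path, model=MODEL_ID):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(audio_path, output_path, model), check=True)
```

Using `sys.executable` keeps the call inside the Python environment where mlx-audio is installed. Passing `model="./Breeze-ASR-26-mlx-4bit"` targets a local checkout instead of the Hub repository.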
Python Usage
from mlx_audio.stt.generate import generate_transcription

result = generate_transcription(
    model="RayyTien/Breeze-ASR-26-mlx-4bit",
    audio="audio.wav",
)
print(result.text)
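The call above can be wrapped to transcribe every WAV file in a folder. The following is a minimal sketch: `transcribe_folder` is a hypothetical helper, and it uses `generate_transcription` with the same two keyword arguments and the `.text` attribute shown above, with nothing else assumed about the mlx-audio API. The transcription function is injectable so the loop can be tried without the model.

```python
from pathlib import Path

MODEL_ID = "RayyTien/Breeze-ASR-26-mlx-4bit"


def _default_transcribe(audio_path: str) -> str:
    # Imported lazily so this module can be imported without mlx-audio installed.
    from mlx_audio.stt.generate import generate_transcription

    result = generate_transcription(model=MODEL_ID, audio=audio_path)
    return result.text


def transcribe_folder(folder, transcribe=_default_transcribe, pattern="*.wav"):
    """Transcribe each matching file in `folder`; returns {file name: text}."""
    transcripts = {}
    for audio_path in sorted(Path(folder).glob(pattern)):
        transcripts[audio_path.name] = transcribe(str(audio_path))
    return transcripts
```

Whether the model is reloaded on each call depends on mlx-audio internals not covered here, so this loop may be slower than necessary for large batches.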
Conversion
This model was converted with:
python -m mlx_audio.convert \
--hf-path MediaTek-Research/Breeze-ASR-26 \
--mlx-path Breeze-ASR-26-mlx-4bit \
--quantize \
--q-bits 4 \
--model-domain stt
Limitations
Please refer to the original model card for full training data, evaluation, and limitation details. In particular, the model outputs Mandarin Chinese characters rather than native Taigi orthography, and performance can vary across accents, dialectal variation, audio quality, and specialized vocabulary.
Citation
If you use this model, please cite the original Breeze Taigi work:
@misc{lan2026breezetaigibenchmarksmodels,
    title={Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis},
    author={Yu-Siang Lan and Chia-Sheng Liu and Yi-Chang Chen and Po-Chun Hsu and Allyson Chiu and Shun-Wen Lin and Da-shan Shiu and Yuan-Fu Liao},
    year={2026},
    eprint={2603.19259},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2603.19259},
}