MOSS-Audio-Tokenizer-v2 — MLX int8

8-bit (int8) MLX quantization of OpenMOSS-Team/MOSS-Audio-Tokenizer-v2 — the 48 kHz stereo neural codec / vocoder used by MOSS-TTS-Local-v1.5 — for Apple Silicon via mlx-audio.

This repo only re-hosts an int8-quantized copy of OpenMOSS's codec. All design and training credit belongs to OpenMOSS.

Base model: OpenMOSS-Team/MOSS-Audio-Tokenizer-v2
License: Apache-2.0 (inherited)
Quantization: int8, group_size 64, affine. Linear/attention layers quantized; the conv (WNConv1d) projections stay full precision. Decode is bit-identical (PSNR 99 dB) to in-process int8.
Size: ~2.23 GB (vs ~8.5 GB fp32 / ~3.96 GB bf16)
Pairs with: shraey/MOSS-TTS-Local-Transformer-v1.5-MLX-int8
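
For readers unfamiliar with the "int8, group_size 64, affine" setting, the following is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration only, not MLX's actual kernel: the function names (`quantize_group`, `quantize_row`, `dequantize_row`) are hypothetical, and the rounding and storage details in MLX may differ. Each group of 64 weights gets its own scale and bias, and each weight is stored as an integer in [0, 255].

```python
def quantize_group(values, bits=8):
    """Affine-quantize one group so that values ~= scale * q + bias."""
    levels = (1 << bits) - 1              # 255 for int8
    lo, hi = min(values), max(values)
    # A constant group would give scale 0; fall back to 1.0 to avoid dividing by zero.
    scale = (hi - lo) / levels or 1.0
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo                   # lo plays the role of the bias


def dequantize_group(q, scale, bias):
    """Reconstruct approximate float values from one quantized group."""
    return [scale * x + bias for x in q]


def quantize_row(row, group_size=64, bits=8):
    """Split a weight row into fixed-size groups and quantize each independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


def dequantize_row(groups):
    """Concatenate the reconstructed values of all groups in order."""
    out = []
    for q, scale, bias in groups:
        out.extend(dequantize_group(q, scale, bias))
    return out
```

Under this scheme the reconstruction error of any weight is at most half of its group's scale, which is why small groups (64 here) keep the error low at a modest storage cost for the per-group scale and bias.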

⚠️ Loader requirement

Stock mlx-audio's MossAudioTokenizer.from_pretrained (≤ commit 412cf7c) does a strict weight load with no quantization handling, so it cannot load this pre-quantized codec. Use our fork, which adds quantization handling in ~15 lines by reusing mlx-audio's own apply_quantization (the standard mlx-lm pattern):

mlx-audio @ git+https://github.com/sb1992/mlx-audio@9154d5a

A PR to merge this upstream into Blaizzy/mlx-audio is pending; once merged you can use upstream mlx-audio directly. Normally you don't load this repo by hand — the paired backbone's config auto-resolves it.
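
To illustrate why a strict load fails and what the mlx-lm-style fix looks like, here is a hedged sketch of the usual pattern: a quantized checkpoint stores extra `.scales` and `.biases` tensors per quantized layer, so the model's layers must be converted to their quantized form before the weights are loaded. The helper names below are hypothetical and this is not the fork's actual code; it only shows how the set of layers to quantize can be derived from the checkpoint's key names.

```python
def quantized_module_paths(weight_keys):
    """Return module paths that have a '.scales' tensor in the checkpoint.

    In a pre-quantized MLX checkpoint, a quantized layer at path `p`
    typically stores `p.weight`, `p.scales`, and `p.biases`.
    """
    suffix = ".scales"
    return {k[: -len(suffix)] for k in weight_keys if k.endswith(suffix)}


def make_class_predicate(weight_keys):
    """Build a predicate that selects exactly the layers quantized on disk."""
    quantized = quantized_module_paths(weight_keys)

    def class_predicate(path, module):
        # Only modules that support quantization (they expose `to_quantized`)
        # and whose scales are present in the checkpoint are converted.
        return hasattr(module, "to_quantized") and path in quantized

    return class_predicate


# Assumed usage with MLX (requires Apple Silicon, so it is shown as a comment):
#   import mlx.nn as nn
#   nn.quantize(model, group_size=64, bits=8,
#               class_predicate=make_class_predicate(weights.keys()))
#   model.load_weights(list(weights.items()), strict=True)
```

With a predicate like this, layers that were left in full precision (such as the conv projections here) have no `.scales` entry and are therefore skipped, so the strict load sees matching parameter names for every layer.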

Safetensors: model size 0.6B params; tensor types F32, U32; MLX