MOSS-Audio-Tokenizer-v2 — MLX int8

8-bit (int8) MLX quantization of OpenMOSS-Team/MOSS-Audio-Tokenizer-v2 — the 48 kHz stereo neural codec / vocoder used by MOSS-TTS-Local-v1.5 — for Apple Silicon via mlx-audio.

This repo only re-hosts an int8-quantized copy of OpenMOSS's codec. All credit for design and training belongs to OpenMOSS.


Base model	OpenMOSS-Team/MOSS-Audio-Tokenizer-v2
License	Apache-2.0 (inherited)
Quantization	int8, group_size 64, affine. Linear and attention layers are quantized; the conv (`WNConv1d`) projections stay at full precision. Decode output is bit-identical (PSNR 99 dB) to that of a model quantized to int8 in-process.
Size	~2.23 GB (vs ~8.5 GB fp32 / ~3.96 GB bf16)
Pairs with	shraey/MOSS-TTS-Local-Transformer-v1.5-MLX-int8
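
To make the "int8, group_size 64, affine" row concrete, here is a minimal pure-Python sketch of affine group quantization: each group of 64 weights is stored as 8-bit codes plus one scale and one bias, and reconstructed as scale * q + bias. It illustrates the general scheme only and does not reproduce MLX's exact rounding, packing, or kernels; all function names are hypothetical.

```python
import math


def quantize_group(values, bits=8):
    """Affine-quantize one group so that value ~= scale * q + bias.

    Codes q are integers in [0, 2**bits - 1]. Illustrative only; not MLX's code.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # A constant group has no range; any non-zero scale works because q is 0.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate float values from codes, scale, and bias."""
    return [scale * q + bias for q in codes]


def quantize_row(row, group_size=64, bits=8):
    """Split a weight row into groups and quantize each one independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


if __name__ == "__main__":
    # Synthetic weights, used purely for demonstration.
    row = [math.sin(0.1 * i) for i in range(128)]
    groups = quantize_row(row)
    restored = [v for q, s, b in groups for v in dequantize_group(q, s, b)]
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"{len(groups)} groups, max abs error {worst:.6f}")
```

The reconstruction error per weight is bounded by about half a quantization step (scale / 2), which is why smaller groups, each with its own scale and bias, track the original weights more closely.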

⚠️ Loader requirement

Stock mlx-audio's MossAudioTokenizer.from_pretrained (up to and including commit 412cf7c) does a strict weight load with no quantization handling, so it cannot load this pre-quantized codec. Use our fork, which adds quantization handling in ~15 lines by reusing mlx-audio's own apply_quantization (the standard mlx-lm pattern):

mlx-audio @ git+https://github.com/sb1992/mlx-audio@9154d5a
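
If you want to confirm that the environment really has the pinned fork installed, one option is to read the `direct_url.json` metadata that pip records for VCS installs (PEP 610). The sketch below assumes the package was installed with pip from the git URL above and that the distribution name is `mlx-audio`; the helper names are hypothetical.

```python
import json
from importlib import metadata


def commit_matches(direct_url_text, commit_prefix):
    """True if PEP 610 direct_url.json text records a VCS commit with this prefix."""
    if not direct_url_text:
        return False  # not a direct-URL install (e.g. installed from an index)
    info = json.loads(direct_url_text)
    commit = info.get("vcs_info", {}).get("commit_id", "")
    return commit.startswith(commit_prefix)


def installed_from_fork(package="mlx-audio", commit_prefix="9154d5a"):
    """Check whether `package` was installed from a commit starting with the prefix."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return False
    # read_text returns None when the file is absent from the metadata directory.
    return commit_matches(dist.read_text("direct_url.json"), commit_prefix)


if __name__ == "__main__":
    print("pinned fork installed:", installed_from_fork())
```

A `False` result only means the recorded install source does not match; editable or non-pip installs may not carry this metadata at all.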

A PR to merge this upstream into Blaizzy/mlx-audio is pending; once merged you can use upstream mlx-audio directly. Normally you don't load this repo by hand — the paired backbone's config auto-resolves it.
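
For readers curious why a strict load fails on a pre-quantized checkpoint: quantized layers store extra `scales` and `biases` tensors next to a packed `weight`, so the checkpoint keys no longer match an unquantized model. A common approach in MLX loaders is to quantize only those layers whose path has a `.scales` entry in the checkpoint before loading weights. The sketch below shows that selection step in plain Python; it is not the fork's actual code, and the layer names in the example are made up.

```python
def quantized_layer_paths(weight_keys):
    """Return layer paths that have a '<path>.scales' entry in the checkpoint."""
    suffix = ".scales"
    return {key[: -len(suffix)] for key in weight_keys if key.endswith(suffix)}


def make_class_predicate(weight_keys):
    """Build a predicate(path, module) that is True only for pre-quantized layers.

    A predicate of this shape can be handed to a quantization helper so that
    layers saved in full precision (e.g. conv projections) are left untouched.
    """
    paths = quantized_layer_paths(weight_keys)

    def predicate(path, module=None):
        return path in paths

    return predicate


if __name__ == "__main__":
    # Hypothetical checkpoint keys, for illustration only.
    keys = [
        "decoder.layers.0.attn.q_proj.weight",
        "decoder.layers.0.attn.q_proj.scales",
        "decoder.layers.0.attn.q_proj.biases",
        "decoder.conv_out.weight_g",
        "decoder.conv_out.weight_v",
    ]
    should_quantize = make_class_predicate(keys)
    print(should_quantize("decoder.layers.0.attn.q_proj"))  # quantized layer
    print(should_quantize("decoder.conv_out"))              # full-precision layer
```

Driving the decision from the checkpoint itself means the loader does not need a hard-coded list of which layers were quantized.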


Safetensors: model size 0.6B params; tensor types F32, U32.

