How to use shraey/MOSS-Audio-Tokenizer-v2-MLX-int8 with MLX:

    # Download the model from the Hub
    # (the extra is quoted so zsh, the default macOS shell, does not glob the brackets)
    pip install "huggingface_hub[hf_xet]"
    huggingface-cli download --local-dir MOSS-Audio-Tokenizer-v2-MLX-int8 shraey/MOSS-Audio-Tokenizer-v2-MLX-int8
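The same download can also be scripted from Python. This is a minimal sketch, assuming `huggingface_hub` is installed; `download_codec` is a hypothetical helper name introduced here for illustration, while `snapshot_download` with its `repo_id` and `local_dir` arguments comes from the library itself.

```python
REPO_ID = "shraey/MOSS-Audio-Tokenizer-v2-MLX-int8"
LOCAL_DIR = "MOSS-Audio-Tokenizer-v2-MLX-int8"


def download_codec(local_dir: str = LOCAL_DIR) -> str:
    """Download every file in the repo and return the local folder path.

    Hypothetical helper: it only wraps huggingface_hub.snapshot_download.
    """
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_codec()` should leave the same files on disk as the `huggingface-cli download --local-dir ...` command.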
MOSS-Audio-Tokenizer-v2 — MLX int8
8-bit (int8) MLX quantization of OpenMOSS-Team/MOSS-Audio-Tokenizer-v2 — the 48 kHz stereo neural codec / vocoder used by MOSS-TTS-Local-v1.5 — for Apple Silicon via mlx-audio.
This repo only re-hosts an int8-quantized copy of OpenMOSS's codec. All design + training credit is OpenMOSS's.
| | |
|---|---|
| Base model | OpenMOSS-Team/MOSS-Audio-Tokenizer-v2 |
| License | Apache-2.0 (inherited) |
| Quantization | int8, group_size 64, affine. Linear and attention layers are quantized; the conv (WNConv1d) projections stay full precision. Decoded output is bit-identical (PSNR 99 dB) to that of a codec quantized to int8 in-process. |
| Size | ~2.23 GB (vs ~8.5 GB fp32 / ~3.96 GB bf16) |
| Pairs with | shraey/MOSS-TTS-Local-Transformer-v1.5-MLX-int8 |
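To make "int8, group_size 64, affine" concrete, here is a rough pure-Python sketch of affine group quantization: each group of 64 weights gets its own scale and bias, and every weight becomes an 8-bit code. It follows the min/max formulation described in the MLX documentation for `mx.quantize`, but it is not MLX's implementation (which packs codes into `uint32` and runs on the GPU). All function names are hypothetical, and the 16-bit storage for scale and bias in `bits_per_weight` is an assumption.

```python
from typing import List, Tuple

GROUP_SIZE = 64  # weights sharing one scale/bias, as in the table
BITS = 8         # int8 codes


def quantize_group(w: List[float], bits: int = BITS) -> Tuple[List[int], float, float]:
    """Affine-quantize one group; returns (codes, scale, bias)."""
    lo, hi = min(w), max(w)
    levels = (1 << bits) - 1            # 255 steps for 8 bits
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((x - lo) / scale) for x in w]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def quantize(weights: List[float], group_size: int = GROUP_SIZE, bits: int = BITS):
    """Quantize a flat weight list group by group."""
    if len(weights) % group_size:
        raise ValueError("weight count must be a multiple of group_size")
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


def bits_per_weight(bits: int = BITS, group_size: int = GROUP_SIZE,
                    param_bits: int = 16) -> float:
    """Effective storage cost, assuming scale and bias each take param_bits."""
    return bits + 2 * param_bits / group_size
```

Under these assumptions a quantized layer costs 8.5 bits per weight rather than exactly 8, and the round-trip error of each weight is at most half its group's scale.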
⚠️ Loader requirement
Stock mlx-audio's MossAudioTokenizer.from_pretrained (≤ commit 412cf7c) does a strict weight load with no quantization handling, so it cannot load this pre-quantized codec. Use our fork, which adds that handling in ~15 lines by reusing mlx-audio's own apply_quantization (the standard mlx-lm pattern):
mlx-audio @ git+https://github.com/sb1992/mlx-audio@9154d5a
A PR to merge this upstream into Blaizzy/mlx-audio is pending; once merged you can use upstream mlx-audio directly. Normally you don't load this repo by hand — the paired backbone's config auto-resolves it.
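For readers curious why a strict load fails and what the mlx-lm pattern does about it, the selection logic can be sketched in plain Python. A quantized MLX linear layer is saved with `scales` and `biases` tensors next to its `weight`, which an unquantized model has no parameters for. The usual fix is to quantize only the modules whose `.scales` entry exists in the checkpoint before loading the weights. The sketch below is not the fork's code: the helper names, the example weight keys, and the `quantization` config layout are all assumptions for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical checkpoint keys: one quantized attention projection and
# one convolution that was left in full precision.
EXAMPLE_KEYS = {
    "decoder.layers.0.attn.q_proj.weight",
    "decoder.layers.0.attn.q_proj.scales",
    "decoder.layers.0.attn.q_proj.biases",
    "decoder.conv_in.weight",
}

# Assumed config layout, following the mlx-lm convention.
EXAMPLE_CONFIG = {"quantization": {"group_size": 64, "bits": 8}}


def quantized_module_paths(weight_keys: Iterable[str]) -> Set[str]:
    """Module paths that have a `.scales` entry, i.e. were saved quantized."""
    suffix = ".scales"
    return {k[: -len(suffix)] for k in weight_keys if k.endswith(suffix)}


def should_quantize(path: str, weight_keys: Set[str], config: Dict) -> bool:
    """Decide whether the module at `path` must be quantized before loading.

    Shaped like the (path, module) predicate accepted by
    mlx.nn.quantize(..., class_predicate=...), minus the module argument.
    """
    if config.get("quantization") is None:
        return False  # plain checkpoint: a strict load already works
    return f"{path}.scales" in weight_keys
```

With the example keys, only `decoder.layers.0.attn.q_proj` is selected, so the convolution keeps its full-precision parameters and the subsequent weight load finds a matching parameter for every key.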