MOSS-TTS-Local-Transformer-v1.5 — MLX int8

8-bit (int8) MLX quantization of OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5 — a multilingual, 48 kHz, on-device text-to-speech model — for Apple Silicon via mlx-audio.

This repo only re-hosts an int8-quantized copy of OpenMOSS's weights. All model design, training, and capabilities are OpenMOSS's work — see their card for the full details.

Base model: OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5 (Qwen3-4B backbone + local transformer)
License: Apache-2.0 (inherited from the base model)
Quantization: int8, group_size 64, affine (mlx-audio converter)
Size: ~4.55 GB (vs ~9.1 GB in bf16)
Codec: pairs with shraey/MOSS-Audio-Tokenizer-v2-MLX-int8 (~2.23 GB), which this model's config resolves automatically

Use

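from mlx_audio.tts import load

# Load the int8 weights from the Hub; the paired codec is resolved from the model config.
model = load("shraey/MOSS-TTS-Local-Transformer-v1.5-MLX-int8", lazy=True)

# generate() returns an iterator of results; next() takes the first one.
result = next(model.generate(text="Hello, running on device.", language="English", max_tokens=200))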

Capabilities (from the base model): 31 languages plus code-switching, 48 kHz stereo output, zero-shot voice cloning from ref_audio with no transcript required, highest-fidelity "continuation" cloning (mode="continuation" + ref_text), inline [pause X.Ys] markers, and native long-form generation in a single call. Always pass language= when the language is known.
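The options above can be collected into keyword arguments before calling model.generate. The following is a minimal sketch, not part of mlx-audio: build_generate_kwargs and pause_tag are hypothetical helpers, and it assumes that continuation cloning takes ref_audio together with ref_text, that ref_audio may be given as a file path, and that the pause length is written in seconds with one decimal place. Check the base model card for the exact accepted forms.

```python
from typing import Any, Dict, Optional


def pause_tag(seconds: float) -> str:
    """Format an inline pause marker in the card's "[pause X.Ys]" shape.

    Assumption: the value is seconds, written with one decimal place.
    """
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    return f"[pause {seconds:.1f}s]"


def build_generate_kwargs(
    text: str,
    language: Optional[str] = None,
    ref_audio: Optional[str] = None,   # assumption: a path to a reference clip
    ref_text: Optional[str] = None,    # transcript of the reference clip
    max_tokens: int = 200,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for model.generate."""
    kwargs: Dict[str, Any] = {"text": text, "max_tokens": max_tokens}
    if language is not None:
        # The card recommends always passing the language when it is known.
        kwargs["language"] = language
    if ref_audio is not None:
        # Transcript-optional zero-shot cloning only needs the reference audio.
        kwargs["ref_audio"] = ref_audio
    if ref_text is not None:
        # Assumption: continuation cloning also needs the reference audio.
        if ref_audio is None:
            raise ValueError("ref_text is only meaningful together with ref_audio")
        kwargs["mode"] = "continuation"
        kwargs["ref_text"] = ref_text
    return kwargs
```

With a loaded model, the result would be passed as model.generate(**build_generate_kwargs(...)), for example with text set to "Hello, " + pause_tag(0.5) + " running on device.".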

Requires an mlx-audio build that includes the codec-quant-load patch, which lets the paired int8 codec load. Install it as mlx-audio @ git+https://github.com/sb1992/mlx-audio@9154d5a (a PR to upstream Blaizzy/mlx-audio is pending). Total download including the int8 codec is about 6.8 GB.
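As a small pre-flight sketch, the snippet below checks whether an mlx_audio package is importable and prints the pip command for the pinned build if it is not. The helper names are hypothetical. The check only confirms that some mlx-audio is installed; it cannot tell whether that build contains the codec-quant-load patch.

```python
import importlib.util

# Pinned requirement string taken from this card.
PINNED_SPEC = "mlx-audio @ git+https://github.com/sb1992/mlx-audio@9154d5a"


def mlx_audio_available() -> bool:
    """Return True if a package named mlx_audio can be found on the import path."""
    return importlib.util.find_spec("mlx_audio") is not None


def install_command(spec: str = PINNED_SPEC) -> str:
    """Return a pip command that installs the given requirement spec."""
    return f'pip install "{spec}"'


if __name__ == "__main__":
    if mlx_audio_available():
        print("mlx_audio found; make sure it is the patched build.")
    else:
        print("mlx_audio not found. Install it with:")
        print(install_command())
```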
