MOSS-TTS-Local-Transformer-v1.5 — MLX int8

8-bit (int8) MLX quantization of OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5 — a multilingual, 48 kHz, on-device text-to-speech model — for Apple Silicon via mlx-audio.

This repo only re-hosts an int8-quantized copy of OpenMOSS's weights. All model design, training, and capabilities are OpenMOSS's work — see their card for the full details.


Base model	OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5 (Qwen3-4B backbone + local transformer)
License	Apache-2.0 (inherited from the base model)
Quantization	int8, group_size 64, affine (mlx-audio converter)
Size	~4.55 GB (vs ~9.1 GB bf16)
Codec	pairs with shraey/MOSS-Audio-Tokenizer-v2-MLX-int8 (~2.23 GB) — auto-resolved by this model's config
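To make the "int8, group_size 64, affine" entry concrete, here is a minimal pure-Python sketch of affine group quantization. It is an illustration of the general technique, not the mlx-audio converter's code: the function names (quantize_group, dequantize_group, quantize) are hypothetical, and the real converter packs the codes into its own storage format. Each group of 64 weights gets its own scale and bias, and each weight is stored as an integer code between 0 and 255.

```python
import random


def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes."""
    levels = (1 << bits) - 1               # 255 distinct steps for 8 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for a constant group
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo                # lo acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=64, bits=8):
    """Split a flat weight list into groups and quantize each one."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Illustrative data only: 256 random values standing in for real weights.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize(weights)
restored = [
    w for codes, scale, bias in groups
    for w in dequantize_group(codes, scale, bias)
]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Because rounding moves each value by at most half a step, the reconstruction error within a group is bounded by half that group's scale, which is why smaller groups generally track the original weights more closely.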

Use

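from mlx_audio.tts import load
# Load the int8 model by repo id; the paired codec is resolved from this model's config.
model = load("shraey/MOSS-TTS-Local-Transformer-v1.5-MLX-int8", lazy=True)
# generate() returns an iterator of results; next() takes the first one.
result = next(model.generate(text="Hello, running on device.", language="English", max_tokens=200))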
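The snippet above stops at the generated result. As a hedged sketch of one way to save audio once you have it, the helper below writes 16-bit PCM WAV using only the Python standard library. It assumes you have already converted the generated audio into a flat list of interleaved floats in [-1, 1]. This card does not state how the result object exposes its samples, so check the mlx-audio documentation for that step. The write_wav name and the demo tone are illustrative only.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate=48000, channels=2):
    """Write interleaved float samples in [-1, 1] as 16-bit PCM WAV."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))   # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Stand-in for model output: 0.1 s of a 440 Hz tone, duplicated to both channels.
tone = []
for n in range(4800):
    value = 0.2 * math.sin(2 * math.pi * 440 * n / 48000)
    tone += [value, value]

demo_path = os.path.join(tempfile.gettempdir(), "moss_tts_demo.wav")
write_wav(demo_path, tone)
```

The defaults of 48000 Hz and 2 channels match the 48 kHz stereo output this card describes. Pass different values if your audio is shaped differently.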

Capabilities (from the base model): 31 languages plus code-switching; 48 kHz stereo output; zero-shot voice cloning from ref_audio, with the transcript optional; highest-fidelity "continuation" cloning (mode="continuation" together with ref_text); inline [pause X.Ys] tags; and native long-form generation in a single call. Always pass language= when the language is known.
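The inline pause syntax can be generated rather than typed by hand. The sketch below builds input text with a [pause X.Ys] tag between sentences. The helper names (pause_tag, join_with_pauses) are hypothetical, and it assumes the tag takes seconds with one decimal place and is separated from neighbouring text by spaces, following the pattern shown above. The base model's card is the authority on the exact format.

```python
def pause_tag(seconds):
    """Format a pause tag such as "[pause 0.5s]"."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    return f"[pause {seconds:.1f}s]"


def join_with_pauses(sentences, seconds=0.5):
    """Join sentences, inserting the same pause tag between each pair."""
    separator = f" {pause_tag(seconds)} "
    return separator.join(sentence.strip() for sentence in sentences)


text = join_with_pauses(["First point.", "Second point."], seconds=0.8)
```

The resulting string can then be passed as the text= argument of the generate call shown under Use.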

Requires a build of mlx-audio that includes the codec-quant-load patch, so that the paired int8 codec loads: mlx-audio @ git+https://github.com/sb1992/mlx-audio@9154d5a (a PR to upstream Blaizzy/mlx-audio is pending). Total download with the int8 codec is about 6.8 GB (~4.55 GB model + ~2.23 GB codec).
