VoxCPM2 – MLX bf16

Download the model from the Hub:

```sh
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir VoxCPM2-MLX-bf16 aufklarer/VoxCPM2-MLX-bf16
```
Full-precision MLX port for Apple Silicon.
MLX port of openbmb/VoxCPM2 – a 2B-parameter multilingual diffusion-autoregressive TTS model with 48 kHz studio-quality output, voice cloning, and instruction-driven voice design.
Part of soniqo.audio – an on-device speech toolkit for Apple Silicon. Consumed by the open-source speech-swift library (module VoxCPM2TTS).
Bundle size: 4.96 GB
Use cases
- Speech generation – 48 kHz TTS with voice design and multilingual support.
- Voice cloning – reference-audio cloning + ultimate cloning (audio + transcript).
- CLI reference – `speech speak --engine voxcpm2 ...flags`.
- Getting started – install speech-swift on macOS / iOS.
Variants
| Variant | Size | Notes |
|---|---|---|
| bf16 | ~5.0 GB | Reference quality, no Linear quantization. |
| int8 | ~3.0 GB | 8-bit group quantization. Mean rel-L2 0.53 % vs bf16. |
| int4 | ~1.9 GB | 4-bit group quantization. Mean rel-L2 9.04 % vs bf16. |
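The rel-L2 figures in the table can be illustrated with a small sketch. The snippet below is a hypothetical example of group quantization (group size 64, per-group min/max scaling) and a relative-L2 reconstruction error on random weights; it is not the exact conversion pipeline or metric used to produce this bundle, but it shows why the 4-bit variant's error is roughly an order of magnitude larger than the 8-bit one.

```python
# Illustrative sketch only: group quantization and a rel-L2 metric on
# synthetic weights. Group size, rounding scheme, and metric are assumptions,
# not the bundle's actual conversion pipeline.
import numpy as np

def group_quantize(w, bits=8, group_size=64):
    """Quantize a weight matrix group-wise, then dequantize for comparison."""
    levels = 2 ** bits - 1
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-12) / levels
    q = np.round((g - lo) / scale)            # integers in [0, levels]
    return (q * scale + lo).reshape(w.shape)  # dequantized reconstruction

def rel_l2(w, w_hat):
    """Relative L2 error of the reconstruction vs. the reference weights."""
    return np.linalg.norm(w - w_hat) / np.linalg.norm(w)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
for bits in (8, 4):
    print(f"int{bits}: rel-L2 {rel_l2(w, group_quantize(w, bits=bits)):.2%}")
```

Fewer bits per weight means a coarser per-group step size, so the reconstruction error grows sharply between 8-bit and 4-bit, matching the trend (though not the exact numbers) in the table above.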
Capabilities
- 30 languages including English, Chinese, Indonesian, Japanese, Korean
- 48 kHz output
- Zero-shot synthesis – generate speech from text alone
- Voice cloning – clone a target speaker from a single reference clip
- Voice design – natural-language style control (e.g. "young female voice, warm and gentle")
- Ultimate cloning – reference audio + transcript for prosody-preserving cloning
- Streaming generation – patch-level decoding for low-latency synthesis
Precision
No quantization. All Linear weights are stored as bfloat16. Use this variant for reference quality or when memory is not a constraint.
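A back-of-envelope check relates the variant sizes to bits per weight. The figures below are estimates assuming roughly 2B parameters dominated by Linear weights; real bundles are larger than the raw weight payload because some tensors (embeddings, norms, quantization scales) stay in higher precision.

```python
# Rough size estimate per precision variant. PARAMS and the per-group
# overhead figures are assumptions for illustration, not measured values.
PARAMS = 2.0e9  # ~2B parameters, per the model description

def payload_gb(bits_per_weight: float) -> float:
    """Raw weight payload in GB (1 GB = 1e9 bytes) at a given precision."""
    return PARAMS * bits_per_weight / 8 / 1e9

# Quantized variants carry per-group scale/offset overhead on top of the
# integer payload, approximated here as one extra bit per weight.
for name, bits in [("bf16", 16), ("int8 + overhead", 9), ("int4 + overhead", 5)]:
    print(f"{name}: ~{payload_gb(bits):.1f} GB")
```

The bf16 payload alone comes to about 4 GB; the gap to the 4.96 GB bundle size is consistent with non-Linear components shipped alongside the quantizable weights.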
Usage with speech-swift
This bundle is consumed by soniqo/speech-swift's VoxCPM2TTS Swift module.

```swift
import VoxCPM2TTS

let model = try await VoxCPM2TTSModel.fromPretrained(
    modelId: "aufklarer/VoxCPM2-MLX-bf16"
)
let audio = try await model.generate(text: "Hello from VoxCPM2.", language: "english")
```
Or via the CLI:
```sh
speech speak "Hello from VoxCPM2." --engine voxcpm2 --voxcpm2-variant bf16 -o hi.wav
```
Source
This bundle is converted from the upstream PyTorch weights at openbmb/VoxCPM2.
License
Apache 2.0 – inherited from the upstream openbmb/VoxCPM2 model.
Responsible use
Voice cloning capability is included. Users are responsible for obtaining consent for any voice that is cloned and for not using the model to impersonate individuals without their permission, generate disinformation, or commit fraud.