πŸŽ™οΈ VoxCPM2 - Full Capability Tools

Complete toolkit for openbmb/VoxCPM2, a 2B-parameter, tokenizer-free diffusion TTS model with voice cloning, voice design, and multilingual synthesis.


📦 What's Included

| File | Purpose |
|---|---|
| voxcpm2_local_laptop.py | Local inference script, optimized for CPU on a Ryzen 7 laptop with 16GB RAM |
| VoxCPM2_Colab_Notebook.ipynb | Google Colab notebook: free T4 GPU, all capabilities plus a Gradio UI |
| README.md | This file (full documentation) |

🚀 Quick Start (Google Colab, free tier)

  1. Open VoxCPM2_Colab_Notebook.ipynb in Colab.

  2. Run all cells from top to bottom. The notebook installs voxcpm, downloads the ~4.6GB model, and then demonstrates:

    • 🔊 Basic TTS
    • 🎨 Voice Design
    • 🌐 Multilingual (30+ languages)
    • 👤 Zero-Shot Voice Cloning (upload your voice!)
    • 🎡 Hi-Fi Ultimate Cloning
    • 📡 Streaming Generation
    • 🖥️ Interactive Gradio UI with a public URL
  3. GPU memory use is tuned for the free T4 tier (~8GB of its 16GB VRAM).


💻 Local Laptop (Ryzen 7 + 16GB RAM)

Install

pip install voxcpm soundfile torch

Run All Demos

python voxcpm2_local_laptop.py --demo

Run Specific Modes

# Basic TTS
python voxcpm2_local_laptop.py --text "Hello world"

# Voice Design (natural language voice control)
python voxcpm2_local_laptop.py --mode design \
  --description "warm female voice" \
  --text "Hello there"

# Voice Cloning (needs reference WAV file)
python voxcpm2_local_laptop.py --mode clone \
  --text "This is my cloned voice" \
  --reference my_voice.wav

# Multilingual demo
python voxcpm2_local_laptop.py --mode multilingual

# Speed mode (lower timesteps = faster)
python voxcpm2_local_laptop.py --text "Hello" --timesteps 5
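The commands above imply a small CLI surface: `--demo`, `--mode`, `--text`, `--description`, `--reference`, and `--timesteps`. Below is a minimal argparse sketch of that surface, useful if you want to wrap or extend the script. It is a hypothetical reconstruction, not the source of voxcpm2_local_laptop.py. The mode name `tts` for basic synthesis and the default of 10 timesteps (the top of the 8-10 default range in the Speed vs Quality table) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical: mirrors the flags used in the commands above,
    # not copied from voxcpm2_local_laptop.py.
    parser = argparse.ArgumentParser(description="VoxCPM2 local inference (sketch)")
    parser.add_argument("--demo", action="store_true", help="run every demo")
    parser.add_argument("--mode", default="tts",  # "tts" is an assumed name
                        choices=["tts", "design", "clone", "multilingual"])
    parser.add_argument("--text", help="text to synthesize")
    parser.add_argument("--description", help="voice description (--mode design)")
    parser.add_argument("--reference", help="reference WAV file (--mode clone)")
    parser.add_argument("--timesteps", type=int, default=10,  # assumed default
                        help="inference_timesteps: lower = faster")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject mode/flag combinations that cannot work; parser.error exits with status 2.
    if args.mode == "design" and not args.description:
        parser.error("--mode design requires --description")
    if args.mode == "clone" and not args.reference:
        parser.error("--mode clone requires --reference")
    return args
```

For example, `parse_args(["--text", "Hello", "--timesteps", "5"])` yields `mode="tts"` and `timesteps=5`. Validating mode-specific flags up front fails fast, before the ~4.6GB model is loaded.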

πŸŽ›οΈ All Capabilities Explained

1. 🔊 Basic TTS

Just text → audio. This is the fastest mode, and it produces 48kHz output.
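Here is a stdlib-only sketch of saving 48kHz audio, assuming the model returns float samples in [-1.0, 1.0]. The scripts themselves presumably use soundfile, since it is in the install line (e.g. `soundfile.write(path, wav, 48000)`). This version converts the samples to 16-bit PCM and writes a mono WAV with Python's `wave` module. The function name `write_wav_48k` is illustrative.

```python
import struct
import wave

SAMPLE_RATE = 48_000  # VoxCPM2 output rate stated in this README


def write_wav_48k(path: str, samples) -> None:
    """Write float samples in [-1.0, 1.0] as mono 16-bit PCM at 48 kHz."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip so the int16 conversion cannot overflow
        frames += struct.pack("<h", round(s * 32767))  # little-endian int16
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit
        w.setframerate(SAMPLE_RATE)
        w.writeframes(bytes(frames))
```

Clipping before conversion matters because diffusion decoders can occasionally overshoot ±1.0, and an unclipped value would make `struct.pack` raise.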

2. 🎨 Voice Design

Control voice characteristics with a natural-language description, written in parentheses before the text:

"(A young woman, gentle and soothing voice) Hello!"
"(A deep male narrator, professional tone) Welcome."
"(A robot, monotone synthetic voice) System online."

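The examples above share one pattern: a parenthesized description followed by the text to speak. Below is a small helper that builds that string. It assumes this is also how `--mode design` combines `--description` and `--text`, which the README implies but does not state. The name `design_prompt` is hypothetical.

```python
def design_prompt(description: str, text: str) -> str:
    """Hypothetical helper: prefix text with a parenthesized voice description,
    following the format of the examples above."""
    description = description.strip()
    if not description:
        return text  # no description: behaves like basic TTS
    # Parentheses delimit the description, so drop any the caller already added.
    description = description.strip("()").strip()
    return f"({description}) {text}"
```

For example, `design_prompt("A deep male narrator, professional tone", "Welcome.")` returns `"(A deep male narrator, professional tone) Welcome."`.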
3. 👤 Zero-Shot Voice Cloning

Clone a voice from a 3-10 second audio sample: provide a WAV file and the model imitates the speaker.
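Because the reference clip should be 3-10 seconds long, a quick duration check before loading the model can save a wasted run. The following is a minimal sketch using the stdlib `wave` module, which reads only integer-PCM WAV files. For float WAVs or other formats, `soundfile.info(path).duration` gives the same number. The function names are illustrative.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # recommended reference length stated above


def reference_duration(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def check_reference(path: str) -> float:
    """Return the clip duration, or raise ValueError if it is outside 3-10 s."""
    seconds = reference_duration(path)
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"{path}: {seconds:.1f}s, expected {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
        )
    return seconds
```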

4. 🎡 Hi-Fi Ultimate Cloning

The highest-quality cloning mode, combining:

  • Prompt audio + transcript (for prosody/style)
  • Reference audio (for timbre)
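The two cloning modes differ only in which inputs are supplied, so a small validator can decide which mode a request falls into. This is a sketch with hypothetical field names (`reference_wav`, `prompt_wav`, `prompt_text`), not the voxcpm argument names. It encodes one rule implied above: prompt audio is only useful together with its transcript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneInputs:
    """Inputs for the cloning modes above. Field names are hypothetical,
    not the voxcpm API."""
    text: str
    reference_wav: Optional[str] = None  # timbre source
    prompt_wav: Optional[str] = None     # prosody/style source
    prompt_text: Optional[str] = None    # exact transcript of prompt_wav

    def mode(self) -> str:
        """Classify the request: plain TTS, zero-shot clone, or Hi-Fi clone."""
        has_prompt = self.prompt_wav is not None
        if has_prompt != (self.prompt_text is not None):
            raise ValueError("prompt_wav and prompt_text must be given together")
        if has_prompt and self.reference_wav is not None:
            return "hifi"   # prosody from the prompt, timbre from the reference
        if has_prompt or self.reference_wav is not None:
            return "clone"  # a single audio source
        return "tts"
```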

5. 🌐 Multilingual (30+ Languages)

No language tags are needed; just write the text in the target language. Supported languages include:

  • English, Chinese, Spanish, French, German, Japanese, Korean
  • Arabic, Hindi, Portuguese, Russian, Italian, Dutch, Polish
  • Turkish, Vietnamese, Thai, Indonesian, and more

6. 📡 Streaming Generation

Generates long texts chunk by chunk, which keeps memory use bounded and suits audiobook-length input.
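One common way to feed a long text through TTS in pieces is to split it into sentence-aligned chunks of bounded length. You then synthesize each chunk and append its audio to the output file, so only one chunk's audio is held in memory at a time. The sketch below shows such a splitter. Whether the notebook's streaming mode splits text this way, or streams audio from a single call, is not stated here. `chunk_text` and its 200-character default are illustrative. The regex only recognizes `.`, `!` and `?` followed by whitespace, so CJK punctuation would need extra handling.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of whole sentences, each at most max_chars long.
    A single sentence longer than max_chars becomes its own chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text("One. Two. Three.", max_chars=9)` returns `["One. Two.", "Three."]`.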


βš™οΈ Speed vs Quality

| Timesteps | Quality | Speed | Best for |
|---|---|---|---|
| 4-5 | Draft | ⚡ Fast | Testing |
| 8-10 | Good | 🚀 Normal | Default, balanced |
| 15-20 | High | 🐢 Slow | Voice cloning |
| 25-30 | Best | 🐌 Very slow | Audiobooks |

Parameter: inference_timesteps (lower = faster, higher = better quality). The local script sets it with --timesteps.
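To keep the table's guidance in code rather than in your head, you can encode it as presets. Below is a minimal sketch. The use-case keys and the rule of taking the low or high end of each range are illustrative choices, not part of the scripts.

```python
# Timestep ranges from the Speed vs Quality table, keyed by use case (keys are illustrative).
TIMESTEP_RANGES = {
    "testing": (4, 5),
    "default": (8, 10),
    "voice_cloning": (15, 20),
    "audiobook": (25, 30),
}


def pick_timesteps(use_case: str, prefer_speed: bool = False) -> int:
    """Return inference_timesteps for a use case: the low end of its range
    if prefer_speed is set, otherwise the high end."""
    low, high = TIMESTEP_RANGES[use_case]
    return low if prefer_speed else high
```

Pass the result as `inference_timesteps`, or on the command line as `--timesteps`.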


🖥️ Hardware Requirements

| Setup | Memory | Feasibility | Notes |
|---|---|---|---|
| Colab T4 (free) | 16GB GPU VRAM | ✅ Perfect | load_denoiser=False saves ~500MB |
| Ryzen 7 + 16GB RAM | 16GB system RAM (CPU) | ✅ Works | CPU mode, slower but functional |
| RTX 3060 / 4060 | 12GB (3060) or 8GB (4060) GPU VRAM | ✅ Good | Same settings as Colab |
| Apple Silicon M1-M3 | Unified memory | ✅ Works | device="mps" |

🔧 Memory Optimizations Applied

Both scripts use these settings to fit within 16GB of memory:

  • load_denoiser=False: skips the ZipEnhancer denoiser (~500MB saved)
  • optimize=False on CPU: skips torch.compile overhead
  • optimize=True on GPU: enables torch.compile for speed
  • device="auto" / "cpu" / "cuda": explicit device selection
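The rules above amount to a small decision function, sketched below as plain logic so it runs without torch. In practice you would feed it `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. Two points are assumptions: that these keys map one-to-one onto the model loader's keyword arguments, and that MPS should use optimize=False (the list only covers CPU and GPU).

```python
def load_settings(cuda_available: bool, mps_available: bool = False) -> dict:
    """Choose device and optimization flags following the rules listed above.
    The returned keys mirror the listed settings; passing them straight to a
    model loader is an assumption."""
    if cuda_available:
        device = "cuda"
    elif mps_available:
        device = "mps"
    else:
        device = "cpu"
    return {
        "device": device,
        # torch.compile pays off on a CUDA GPU; elsewhere its overhead is skipped
        # (treating MPS like CPU here is an assumption).
        "optimize": device == "cuda",
        # Skip the ZipEnhancer denoiser to save ~500MB.
        "load_denoiser": False,
    }
```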

📚 Model Info

  • Parameters: 2B
  • Architecture: MiniCPM-4 → LocEnc → TSLM → RALM → LocDiT → AudioVAE V2
  • Output: 48kHz WAV
  • License: Apache-2.0 (commercial use OK)
  • Paper: arXiv:2509.24650
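The 2B parameter count gives a rough floor on memory: parameters × bytes per parameter. The arithmetic below covers the weights alone. Activations, audio buffers, and framework overhead come on top. Which dtype each script actually loads is not stated here.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for dtype, nbytes in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{dtype}: {weight_memory_gb(2e9, nbytes):.1f} GB")
# float32: 8.0 GB
# float16/bfloat16: 4.0 GB
```

In float32 the weights alone would take half of a 16GB laptop's RAM, which is why the other memory savings listed above matter on CPU.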

📝 License

These scripts are provided as-is for personal/educational use. The VoxCPM2 model is Apache-2.0 licensed.


🔗 Links

  • Model: https://huggingface.co/openbmb/VoxCPM2
  • Paper: https://arxiv.org/abs/2509.24650
