
Mai Linh - Vietnamese TTS (Piper/VITS) | CoreML | ONNX

The Vietnamese Mai Linh voice, running on-device. Two formats:

MaiLinh.mlpackage - CoreML for native iOS/macOS apps (fp32, 22050 Hz).
Mai Linh/mailinh250626.onnx + .onnx.json - ONNX, run with onnxruntime on PC/Mac.

Credit: voice from quangdung/Piper_checkpoint

Files in the repo

File	Description
`MaiLinh.mlpackage`	CoreML model (fp32, L=256 phoneme ids, M=500 frames ≈ 5.8 s per sentence)
`MaiLinhTTS.swift`	Swift code for iOS integration: load, run, trim, fade, play
`Mai Linh/mailinh250626.onnx` + `.onnx.json`	ONNX model + config (contains `phoneme_id_map`)
`Mai Linh/mailinh_7BKaqqkh.mp3`	Sample audio of the Mai Linh voice
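The ≈5.8 s limit follows directly from the fixed output shape: M = 500 frames of 256 samples each (the hop size used in the Python example below) at 22050 Hz. A quick check of that arithmetic:

```python
SR = 22050   # output sample rate (Hz)
HOP = 256    # samples per frame (hop size, as used in the Python example)
M = 500      # fixed number of frames the CoreML graph always computes

max_samples = M * HOP            # fixed length of the "audio" output
max_seconds = max_samples / SR   # longest utterance a single call can produce
print(max_samples, round(max_seconds, 2))  # 128000 5.8
```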

Speed: CoreML vs ONNX

Measured on a Mac (Apple Silicon) with the same phoneme ids, model inference time only:

Sentence	ONNX CPU (4 threads)	CoreML CPU-only	CoreML GPU
Short (~1.8 s audio)	28 ms	46 ms	26 ms
Long (~5.0 s audio)	104 ms	46 ms	26 ms

On the same backend (CPU vs CPU), CoreML costs a fixed ~46 ms (static shape, so it always computes all M frames), while ONNX scales with sentence length. So ONNX is faster on short sentences and CoreML on long ones; neither wins outright.
CoreML additionally gets a GPU path (~26 ms) that ONNX cannot use on Mac (onnxruntime's CoreML EP crashes on this model). That is a deployment advantage, not a faster format.
All three run well above real time. CoreML output matches the original PyTorch model with mean|diff| ≈ 3.7e-7 (fp32). Numbers on iPhone will differ.
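To put "well above real time" in numbers, the real-time factor can be taken as audio duration divided by inference time. A small sketch using the figures from the speed table (the audio durations there are approximate, so treat the results as rough):

```python
# (approx. audio seconds, {backend: inference ms}) taken from the speed table
RESULTS = {
    "short": (1.8, {"onnx_cpu4": 28, "coreml_cpu": 46, "coreml_gpu": 26}),
    "long": (5.0, {"onnx_cpu4": 104, "coreml_cpu": 46, "coreml_gpu": 26}),
}

def realtime_factor(audio_s: float, infer_ms: float) -> float:
    """Seconds of audio produced per second of compute (>1 means faster than real time)."""
    return audio_s / (infer_ms / 1000.0)

for name, (audio_s, timings) in RESULTS.items():
    for backend, ms in timings.items():
        print(f"{name:5s} {backend:10s} {realtime_factor(audio_s, ms):6.1f}x")
```

The fixed CoreML cost shows up clearly: its factor more than doubles from the short to the long sentence, while ONNX's stays in the same range.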

Running with ONNX (PC/Mac)

pip install piper-tts
echo "Xin chào các bạn" | piper \
  --model "Mai Linh/mailinh250626.onnx" \
  --config "Mai Linh/mailinh250626.onnx.json" \
  --output_file out.wav

The ONNX graph is dynamic, so there is no sentence-length limit; with 4 threads (OMP_NUM_THREADS=4) it runs at ~52x real time.
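If you call the ONNX model directly from Python instead of through the piper CLI, the thread count can be set on the onnxruntime session itself. A sketch, assuming the usual Piper ONNX inputs (`input` and `input_lengths` as int64, `scales` as float32, with no padding since the graph is dynamic); `build_onnx_feeds`, `make_session` and `synthesize` are hypothetical helper names:

```python
import numpy as np

def build_onnx_feeds(ids, noise=0.667, length=1.0, noise_w=0.8):
    """Feeds for a Piper-style ONNX graph (names/dtypes assumed); no padding needed."""
    return {
        "input": np.array([ids], dtype=np.int64),                        # [1, N]
        "input_lengths": np.array([len(ids)], dtype=np.int64),           # [1]
        "scales": np.array([noise, length, noise_w], dtype=np.float32),  # [3]
    }

def make_session(model_path, threads=4):
    """Create a CPU session once, with a fixed intra-op thread count."""
    import onnxruntime as ort  # lazy import so the feed builder works without it
    opts = ort.SessionOptions()
    opts.intra_op_num_threads = threads
    return ort.InferenceSession(model_path, sess_options=opts,
                                providers=["CPUExecutionProvider"])

def synthesize(sess, ids):
    """Run one utterance and flatten the output to mono float32 samples."""
    audio = sess.run(None, build_onnx_feeds(ids))[0]
    return np.asarray(audio, dtype=np.float32).reshape(-1)
```

Creating the session once and reusing it avoids paying model-load time on every sentence.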

Running CoreML from Python (quick test on Mac)

import wave, numpy as np, coremltools as ct
from piper import PiperVoice          # pip install piper-tts coremltools

L, SR, HOP = 256, 22050, 256
v  = PiperVoice.load("Mai Linh/mailinh250626.onnx", "Mai Linh/mailinh250626.onnx.json")  # only used for phonemization
ml = ct.models.MLModel("MaiLinh.mlpackage")

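# phonemize() returns one phoneme list per sentence; only the first is used here
ids = v.phonemes_to_ids(v.phonemize("Xin chào, tôi là Mai Linh.")[0])
n = min(len(ids), L)                                    # ids beyond L=256 are dropped
arr = np.zeros((1, L), np.int32); arr[0, :n] = ids[:n]  # zero-pad to the fixed length L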

out = ml.predict({"input": arr,
                  "input_lengths": np.array([n], np.int32),
                  "scales": np.array([0.667, 1.0, 0.8], np.float32)})
audio = np.asarray(out["audio"]).reshape(-1)
nf = int(np.asarray(out["n_frames"]).reshape(-1)[0])
audio = audio[:nf * HOP].copy()                         # trim to the real length
f = min(int(0.008 * SR), len(audio))                    # 8 ms fade-out (removes the end "pop")
if f > 0:                                               # guard: audio[-0:] would select the whole array
    audio[-f:] *= 0.5 * (1 + np.cos(np.linspace(0, np.pi, f)))

with wave.open("out.wav", "wb") as w:
    w.setnchannels(1); w.setsampwidth(2); w.setframerate(SR)
    w.writeframes((np.clip(audio, -1, 1) * 32767).astype(np.int16).tobytes())

CoreML - I/O contract

Input

input : [1, 256] Int32 - phoneme ids, zero-padded to 256.
input_lengths : [1] Int32 - number of real phoneme ids (≤ 256).
scales : [3] Float32 - [noise=0.667, length=1.0, noise_w=0.8]; length > 1 speaks more slowly.
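Wrong shapes or dtypes here tend to surface as opaque CoreML errors, so it can help to build and validate the feed dictionary before calling `predict`. A minimal sketch of the input contract above; `make_coreml_feeds` and `check_coreml_feeds` are hypothetical helpers, not part of the repo:

```python
import numpy as np

L = 256  # fixed phoneme-id length of the CoreML graph

def make_coreml_feeds(ids, noise=0.667, length=1.0, noise_w=0.8):
    """Zero-pad (or truncate) ids to L and build the three CoreML inputs."""
    n = min(len(ids), L)
    arr = np.zeros((1, L), dtype=np.int32)
    arr[0, :n] = ids[:n]
    return {
        "input": arr,
        "input_lengths": np.array([n], dtype=np.int32),
        "scales": np.array([noise, length, noise_w], dtype=np.float32),
    }

def check_coreml_feeds(feeds):
    """Raise ValueError if the feeds do not match the documented input contract."""
    x, n, s = feeds["input"], feeds["input_lengths"], feeds["scales"]
    if x.shape != (1, L) or x.dtype != np.int32:
        raise ValueError(f"input must be int32 [1, {L}], got {x.dtype} {x.shape}")
    if n.shape != (1,) or n.dtype != np.int32 or not 0 < n[0] <= L:
        raise ValueError("input_lengths must be int32 [1] with 0 < value <= 256")
    if s.shape != (3,) or s.dtype != np.float32:
        raise ValueError("scales must be float32 [3]: [noise, length, noise_w]")
    if np.any(x[0, n[0]:] != 0):
        raise ValueError("positions after input_lengths must be padded with 0")
```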

Output

audio : Float32 mono 22050 Hz, fixed length (M·256 samples).
n_frames : Int32 - number of real frames. Trim the audio at n_frames*256 samples, then apply a short fade-out; this removes the noise at the end of the audio.
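The trim-and-fade step written out: with a hop of 256 samples the kept length is N, and the 8 ms fade used in the Python example spans f = int(0.008 · 22050) = 176 samples. Because `np.linspace(0, np.pi, f)` steps by π/(f−1), the gain is a raised-cosine (half-Hann) ramp from 1 down to 0 over the last f samples:

```latex
N = 256\, n_{\text{frames}}, \qquad f = \lfloor 0.008 \cdot 22050 \rfloor = 176

g[k] = \tfrac{1}{2}\left(1 + \cos\frac{\pi k}{f-1}\right), \quad k = 0, \dots, f-1

y[N-f+k] = g[k]\, x[N-f+k], \qquad g[0] = 1,\; g[f-1] = 0
```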

iOS integration

Drag MaiLinh.mlpackage into Xcode (it is compiled to .mlmodelc automatically at build time).
Use the ready-made MaiLinhTTS.swift: it loads the model, builds the input, runs it, trims at n_frames, fades out, and plays through AVAudioEngine.
What you have to do yourself: Vietnamese phonemization. The model takes phoneme ids, not text. You need espeak-ng (vi voice) built for iOS (arm64) to convert text → IPA, then map the IPA through phoneme_id_map in mailinh250626.onnx.json, inserting pad id 0 between phonemes (the phonemes_to_ids logic of piper).
Sentences longer than 5.8 s: split the text into sentences and synthesize each one separately.
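Both missing pieces can be prototyped in Python before porting them to Swift. The sketch assumes the Piper id layout produced by piper-phonemize (BOS `^`, then pad `_`, each phoneme followed by pad, then EOS `$`, with `_` mapping to id 0 as in typical Piper configs); check it against the `phonemes_to_ids` of the piper version you use, since the BOS/pad handling may differ. The splitter is a naive punctuation heuristic, and its `max_words` default is a hypothetical value to tune so each chunk stays under 256 ids and about 5.8 s.

```python
import re

PAD, BOS, EOS = "_", "^", "$"  # Piper's special symbols (assumed; check your config)

def phonemes_to_ids(phonemes, id_map):
    """Map phoneme symbols to ids, inserting the pad id after BOS and after each phoneme.

    id_map is the "phoneme_id_map" dict from mailinh250626.onnx.json,
    where every value is a list of ints (e.g. {"_": [0], "^": [1], ...}).
    """
    ids = list(id_map[BOS]) + list(id_map[PAD])
    for p in phonemes:
        if p not in id_map:  # skip symbols the voice does not know
            continue
        ids.extend(id_map[p])
        ids.extend(id_map[PAD])
    ids.extend(id_map[EOS])
    return ids

def split_text(text, max_words=20):
    """Naive splitter: sentences first, then clauses, then fixed word windows."""
    chunks = []
    for sent in re.split(r"(?<=[.!?…])\s+", text.strip()):
        if not sent:
            continue
        if len(sent.split()) <= max_words:
            chunks.append(sent)
            continue
        # Sentence too long: break at clause punctuation, then hard-wrap by words
        for clause in re.split(r"(?<=[,;:])\s+", sent):
            words = clause.split()
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
    return chunks
```

Each chunk is then phonemized, mapped with `phonemes_to_ids`, synthesized, trimmed at n_frames, and the pieces concatenated.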

ANE compilation fails, so the model runs on GPU/CPU (still native CoreML), which is fast enough (~26 ms per sentence on Mac).
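From Python, the GPU/CPU choice can be made explicit when loading the package, using coremltools' `compute_units` argument (`CPU_AND_GPU` or `CPU_ONLY`, mirroring the two CoreML columns of the speed table). The timing helper is a hypothetical sketch of how per-call latency could be measured (warm-up runs, then the median), not necessarily how the figures above were produced.

```python
import statistics
import time

def load_mailinh(path="MaiLinh.mlpackage", gpu=True):
    """Load the CoreML package on CPU+GPU (default) or CPU only, avoiding the ANE."""
    import coremltools as ct  # lazy import: only needed on macOS
    units = ct.ComputeUnit.CPU_AND_GPU if gpu else ct.ComputeUnit.CPU_ONLY
    return ct.models.MLModel(path, compute_units=units)

def median_latency_ms(predict, feeds, warmup=3, runs=20):
    """Median wall-clock time of predict(feeds) in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        predict(feeds)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        predict(feeds)
        times.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(times)
```

Usage: `median_latency_ms(load_mailinh(gpu=True).predict, feeds)`, where `feeds` is the input dict built in the Python example.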

Credits

Piper TTS: OHF-Voice/piper1-gpl (Open Home Foundation)
Base checkpoint: rhasspy/piper-checkpoints


Model tree for beyoru/MaiLinh-TTS-CoreML

Base model: quangdung/Piper_checkpoint (this model is listed under "Quantized")