Kokoro 1.1-zh [ONNX]

Exported from hexgrad/Kokoro-82M-v1.1-zh.

Notes

In this export, the speed input is float32; the original export script uses int64. To reproduce the export, clone https://github.com/hexgrad/kokoro, apply onnx_exporter.patch to the repository, and run examples/export.py.

Voice files were converted from PyTorch format to HDF5 with voice_pt_to_h5.py.
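
voice_pt_to_h5.py is not reproduced here. As a rough sketch of what such a conversion might look like, the function below writes one HDF5 dataset per style vector, named by its index. That is the layout the usage code reads back with file[str(len(tokens) - 2)]. The function names, the example array shape, and the use of a NumPy array in place of a loaded PyTorch tensor are assumptions; the actual script may differ.

```python
import h5py
import numpy as np


def write_voice_h5(styles: np.ndarray, path: str) -> None:
    """Write each style vector styles[i] as a dataset named str(i).

    Assumption: `styles` is indexed by token count along its first axis,
    e.g. an array obtained from a PyTorch voice tensor.
    """
    with h5py.File(path, mode="w") as file:
        for index, vector in enumerate(styles):
            file.create_dataset(str(index), data=vector.astype(np.float32))


def read_style(path: str, index: int) -> np.ndarray:
    """Read back the style vector stored under the dataset named str(index)."""
    with h5py.File(path, mode="r") as file:
        return np.array(file[str(index)])
```

For a real voice file, the input array would presumably come from the loaded PyTorch tensor converted to NumPy; that step is left out so the sketch does not depend on PyTorch.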

Usage

import soundfile as sf  # type: ignore
import h5py  # type: ignore
import onnxruntime as ort  # type: ignore
import numpy as np


if __name__ == "__main__":
    tokens: list[int] = [0, 81, 83, 16, 62, 156, 51, 133, 83, 123, 16, 50, 157, 63, 16, 102, 68, 16, 102, 56, 46, 156, 51, 46, 16, 65, 156, 25, 68, 16, 46, 156, 138, 68, 16, 56, 157, 69, 62, 16, 44, 156, 102, 46, 16, 52, 63, 16, 62, 135, 16, 156, 86, 56, 62, 83, 123, 16, 81, 83, 16, 50, 156, 39, 61, 16, 138, 64, 16, 50, 102, 68, 16, 65, 156, 102, 68, 46, 83, 55, 16, 44, 157, 138, 62, 16, 123, 156, 72, 81, 83, 123, 16, 54, 156, 51, 46, 68, 16, 52, 63, 16, 62, 83, 16, 81, 83, 16, 119, 123, 156, 86, 131, 50, 157, 31, 54, 46, 16, 138, 64, 16, 52, 135, 123, 16, 55, 156, 25, 56, 46, 4, 0]
    voice: str = "af_sol"
    speed: float = 1.0

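    # Load the exported FP32 model.
    model_session: ort.InferenceSession = ort.InferenceSession("Kokoro-1.1-zh-FP32.onnx")

    # The voice file is keyed by token count, not counting the two padding
    # tokens (0) at the start and end of the sequence.
    with h5py.File(f"voices/{voice}.h5", mode="r") as file:
        dataset: np.ndarray = np.array(file[str(len(tokens) - 2)])  # type: ignore

    waveform, duration = model_session.run(  # type: ignore
        None,
        {
            # Set int64 explicitly: NumPy's default integer is 32-bit on some platforms.
            "input_ids": np.array(tokens, dtype=np.int64).reshape(1, -1),
            "style": dataset.reshape(1, -1),
            "speed": np.array([speed], dtype=np.float32),
        },
    )
    # Write the result as 24 kHz audio.
    sf.write("output.wav", waveform, 24000)  # type: ignore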
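
The token list in the usage example already carries the padding token 0 at both ends, and the style lookup subtracts those two tokens. When starting from unpadded tokens, a small helper can apply the same convention. This is a minimal sketch, not part of the repository: pad_tokens, style_key, and the 510-token limit are assumptions (the limit is taken from the 512-token context length of upstream Kokoro and may not apply to this export).

```python
MAX_UNPADDED_TOKENS = 510  # Assumption: 512-token context minus two pad tokens.
PAD_TOKEN = 0  # Matches the leading and trailing 0 in the usage example.


def pad_tokens(tokens: list[int]) -> list[int]:
    """Wrap unpadded tokens with the pad token at both ends."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    if len(tokens) > MAX_UNPADDED_TOKENS:
        raise ValueError(
            f"got {len(tokens)} tokens; expected at most {MAX_UNPADDED_TOKENS}"
        )
    return [PAD_TOKEN, *tokens, PAD_TOKEN]


def style_key(padded_tokens: list[int]) -> str:
    """Return the voice-file dataset name for a padded token sequence."""
    return str(len(padded_tokens) - 2)
```

With these helpers, the lookup in the usage example would read file[style_key(tokens)] for a padded token list.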

