Kokoro Overview
Description:
Kokoro is an open-weight text-to-speech (TTS) model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to that of larger models while being significantly faster and more cost‑efficient. Kokoro can be deployed anywhere from production environments to personal projects.
Kokoro was developed by hexgrad.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see link to Non-NVIDIA hexgrad Model Card.
License/Terms of Use:
Deployment Geography:
Global
Use Case:
Developers and enterprises building text‑to‑speech applications, voice assistants, and audio generation services. Suitable for any domain that requires high‑quality, low‑latency speech synthesis, from production APIs to personal projects.
Release Date:
HuggingFace: 05/29/2026 via [URL]
Reference(s):
Model Architecture:
Architecture Type: Transformer
Network Architecture: StyleTTS 2, ISTFTNet, Decoder only
This model was developed based on yl4579/StyleTTS2-LJSpeech.
Number of model parameters: 82M (8.2*10^7)
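As a rough illustration of what the parameter count implies, the sketch below estimates the size of the weights alone at a few numeric precisions. The helper name `weight_size_mb` and the precision table are illustrative assumptions, not part of the model's tooling. Actual file sizes and runtime memory will differ, since this ignores activations, metadata, and runtime overhead.

```python
# Bytes needed to store one parameter at each precision (illustrative table).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mb(num_params: int, dtype: str = "fp32") -> float:
    """Return the approximate size of the weights in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    kokoro_params = 82_000_000  # 8.2*10^7, as stated above
    for dtype in BYTES_PER_PARAM:
        print(dtype, weight_size_mb(kokoro_params, dtype), "MB")
```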
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: One-Dimensional (1D)
Other properties related to input:
Input Length: Maximum of approximately 500 tokens. Splitting longer input into chunks of 100-200 tokens is recommended.
Input Language: English (full support); Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Brazilian Portuguese (partial support)
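The chunking recommendation above can be sketched as follows. This is a minimal example, not the model's own preprocessing: it assumes whitespace-separated words are an acceptable stand-in for tokens (the model's actual tokenization may count differently), and the function name `split_into_chunks` is hypothetical.

```python
import re


def split_into_chunks(text: str, max_tokens: int = 200) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word counts as one token.
    Sentence boundaries are preferred as split points.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A single sentence over the budget is cut into fixed-size pieces.
        while len(words) > max_tokens:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        # Start a new chunk if this sentence would exceed the budget.
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for chunk in split_into_chunks(sample, max_tokens=6):
        print(chunk)
```

Each chunk can then be synthesized separately and the resulting audio segments concatenated in order.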
Output:
Output Type(s): Audio
Output Format: Audio (.wav, .mp3)
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Audio output duration is approximately one minute per 1,000 characters of input text.
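The rule of thumb above (about one minute of audio per 1,000 input characters) can be turned into a quick estimate. The function name and default rate below are illustrative; real durations will vary with language, voice, and punctuation.

```python
def estimate_duration_seconds(text: str, seconds_per_1000_chars: float = 60.0) -> float:
    """Estimate output audio length from input length.

    Uses the approximate ratio of one minute per 1,000 characters;
    this is a planning estimate, not a guarantee.
    """
    return len(text) / 1000 * seconds_per_1000_chars


if __name__ == "__main__":
    script = "x" * 2500  # placeholder text, 2,500 characters long
    print(f"Estimated duration: {estimate_duration_seconds(script):.0f} s")
```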
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engine(s):
- ONNXRuntime win-x64-gpu_cuda13-1.24.3
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Lovelace
- NVIDIA Turing
[Preferred/Supported] Operating System(s):
Windows 10/11
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
This AI model can be embedded as an Application Programming Interface (API) call into the software environment described above.
Model Version(s):
v1.0
Training, Testing, and Evaluation Datasets:
Training Dataset:
Link: Undisclosed
Data Modality: Audio
Audio Training Data Size: Less than 10,000 Hours
Data Collection Method by dataset: Hybrid: Automated, Synthetic
Labeling Method by dataset: Automated
Properties (Quantity, Dataset Descriptions, Sensor(s)): Kokoro was trained exclusively on permissive, non‑copyrighted audio data and IPA phoneme labels. The dataset comprises public‑domain recordings, audio released under permissive licenses, and synthetic audio generated by closed‑source TTS models. Overall, the training corpus amounts to a few hundred hours of audio.
Testing Dataset:
Link: Undisclosed
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties (Quantity, Dataset Descriptions, Sensor(s)): Undisclosed
Evaluation Dataset:
Link: Undisclosed
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties (Quantity, Dataset Descriptions, Sensor(s)): Undisclosed
Inference:
Acceleration Engine:
- TensorRT
- CUDA
- CoreML
- Xnnpack
- Nnapi
- DirectML
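The acceleration engines listed above correspond to ONNX Runtime execution providers. The sketch below shows one way to pick providers from whatever is available on a machine. The priority order is an assumption made for illustration, not a recommendation from the model authors, and `choose_providers` is a hypothetical helper.

```python
# ONNX Runtime execution provider names for the engines listed above,
# in an assumed priority order, with CPU as the final fallback.
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",      # DirectML
    "CoreMLExecutionProvider",
    "XnnpackExecutionProvider",
    "NnapiExecutionProvider",
    "CPUExecutionProvider",
]


def choose_providers(available: list[str]) -> list[str]:
    """Return the preferred providers that are available, in priority order."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    # Fall back to CPU if nothing in the preferred list is reported.
    return chosen or ["CPUExecutionProvider"]


if __name__ == "__main__":
    # Example input; on a real system this list would come from
    # onnxruntime.get_available_providers().
    print(choose_providers(["CPUExecutionProvider", "CUDAExecutionProvider"]))
```

In practice, the resulting list would be passed as the `providers` argument when creating an `onnxruntime.InferenceSession` for the exported model file.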
Test Hardware:
- NVIDIA GeForce RTX 4090
- NVIDIA GeForce RTX 3070 Ti
- NVIDIA GeForce RTX 2060
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or concerns here.