Kokoro voices β€” GGUF bundle

Per-speaker style packs for the Kokoro-82M family, converted to ggml's GGUF voice-pack format (arch=kokoro-voice, single F32 tensor voice.pack[max_phon, 1, 256]). For use with CrispStrobe/CrispASR alongside cstr/kokoro-82m-GGUF or the German backbone cstr/kokoro-de-hui-base-GGUF.

Each voicepack is ~510 KB. Loading is direct passthrough β€” no quantisation needed at this size.

Voices

File Speaker Language Source License
kokoro-voice-af_heart.gguf af_heart English (US, F) hexgrad/Kokoro-82M voices/af_heart.pt Apache-2.0
kokoro-voice-ef_dora.gguf ef_dora Spanish (F) hexgrad/Kokoro-82M voices/ef_dora.pt Apache-2.0
kokoro-voice-ff_siwis.gguf ff_siwis French (F) hexgrad/Kokoro-82M voices/ff_siwis.pt Apache-2.0
kokoro-voice-df_eva.gguf df_eva German (F) recovered from r1di/kokoro-fastapi-german Git LFS (originally Tundragoon) Apache-2.0
kokoro-voice-dm_bernd.gguf dm_bernd German (M) recovered from r1di/kokoro-fastapi-german Git LFS (originally Tundragoon) Apache-2.0
kokoro-voice-df_victoria.gguf df_victoria German (F) kikiri-tts/kikiri-german-victoria voices/victoria.pt Apache-2.0
kokoro-voice-dm_martin.gguf dm_martin German (M) kikiri-tts/kikiri-german-martin voices/martin.pt Apache-2.0

The Tundragoon voicepacks are [512, 1, 256] F32 (max_phon=512); the official Kokoro and kikiri voicepacks are [510, 1, 256] F32 (max_phon=510). The voice loader reads max_phon from the file so both layouts work transparently.

German voice cascade

When CrispASR is invoked with -l de (or any de_* / de-* locale) and no explicit --voice, it picks German voicepacks in this order:

  1. df_victoria β€” kikiri-tts, in-distribution to the dida-80b German backbone (recommended)
  2. df_eva β€” Tundragoon recovery, second-tier German speaker
  3. ff_siwis β€” French baseline, last-resort non-silence fallback

Languages without a native pack (ru, ko, ar, …) fall back to ff_siwis. See cstr/kokoro-de-hui-base-GGUF for the matching German backbone.

Quality (ASR roundtrip)

Long German phrase ("Guten Tag, dies ist ein Test des deutschen Phonemizers."), parakeet-tdt-0.6b-v3 -l de, dida-80b backbone F16:

Voice Parakeet output
dm_martin "...Phonemizers." (perfect)
df_victoria "...Tester des Deutschen Phonemizers." (1 word-boundary err)
dm_bernd "...Phonemetzers." (1 phoneme err)
df_eva "...Phonemetzes." (1 phoneme err)

All four clear the energy gate (peak β‰₯ 8000, RMS β‰₯ 1000); two are word-perfect on a phrase the official English-trained Kokoro-82M with af_heart collapses to silence on.

Quick start

huggingface-cli download cstr/kokoro-voices-GGUF kokoro-voice-af_heart.gguf --local-dir .
huggingface-cli download cstr/kokoro-82m-GGUF    kokoro-82m-q8_0.gguf       --local-dir .

./crispasr --backend kokoro \
    -m kokoro-82m-q8_0.gguf \
    --voice kokoro-voice-af_heart.gguf \
    --tts "Hello world" --tts-output hello.wav

Conversion

python models/convert-kokoro-voice-to-gguf.py \
    --input voices/af_heart.pt \
    --output kokoro-voice-af_heart.gguf

Attribution

License

Apache-2.0 across the bundle, matching every upstream source.

Downloads last month
806
GGUF
Model size
131k params
Architecture
kokoro-voice
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support