CosyVoice3 SG

This repository contains the CosyVoice3 llm.pt checkpoint fine-tuned on the Singapore Mandarin subset, as presented in the paper Joycent: Diffusion-based Accent TTS without Accented Phone Prediction.

All other CosyVoice3 components are loaded from FunAudioLLM/Fun-CosyVoice3-0.5B-2512. This checkpoint is intended to replace the base model's llm.pt; the repository does not include flow.pt, hift.pt, the tokenizer, or the ONNX files.
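
The checkpoint swap described above could be scripted roughly as follows. This is a minimal sketch, not the project's official setup procedure. It assumes that llm.pt sits at the root of walston/cosyvoice3-sg, that the base model directory keeps its weights in a file named llm.pt, and that the huggingface_hub package is installed. The local directory name and the backup file name llm.base.pt are hypothetical choices made for illustration.

```python
import shutil
from pathlib import Path

BASE_REPO = "FunAudioLLM/Fun-CosyVoice3-0.5B-2512"
SG_REPO = "walston/cosyvoice3-sg"


def overlay_llm_checkpoint(base_dir, sg_llm_path, backup=True):
    """Replace base_dir/llm.pt with the fine-tuned checkpoint and return its path."""
    base_dir = Path(base_dir)
    target = base_dir / "llm.pt"
    if backup and target.exists():
        # Keep the original weights so the swap can be undone.
        # "llm.base.pt" is a hypothetical backup name, not a CosyVoice convention.
        shutil.copy2(target, base_dir / "llm.base.pt")
    # Only llm.pt is touched; flow.pt, hift.pt, tokenizer and ONNX files stay as downloaded.
    shutil.copy2(sg_llm_path, target)
    return target


def fetch_and_assemble(model_dir="pretrained_models/cosyvoice3-sg"):
    """Download the base model, then overlay this repository's llm.pt (needs network)."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, snapshot_download

    base_dir = snapshot_download(repo_id=BASE_REPO, local_dir=model_dir)
    # Assumption: the checkpoint is stored as "llm.pt" at the repository root.
    sg_llm = hf_hub_download(repo_id=SG_REPO, filename="llm.pt")
    return overlay_llm_checkpoint(base_dir, sg_llm)
```

Calling fetch_and_assemble() would leave a complete model directory in which only llm.pt differs from the base release. That directory can then be passed to whatever loader the inference wrapper expects.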

Project Resources

Paper: Joycent: Diffusion-based Accent TTS without Accented Phone Prediction
Code: oshindow/Joycent-code
Demo: Joycent Project Page

Inference

The inference wrapper for this model is available in the Joycent project as joycent/inference_cosyvoice.py.

Citation

If you find this work useful, please cite:

@misc{wang2026joycentdiffusionbasedaccenttts,
      title={Joycent: Diffusion-based Accent TTS without Accented Phone Prediction},
      author={Xintong Wang and Ye Wang},
      year={2026},
      eprint={2606.16417},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
}
