arxiv:1811.06292

Towards achieving robust universal neural vocoding

Published on Nov 15, 2018

Authors:

Abstract

This paper explores the potential universality of neural vocoders. We train a WaveRNN-based vocoder on 74 speakers coming from 17 languages. This vocoder is shown to be capable of generating speech of consistently good quality (98% relative mean MUSHRA when compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario when the recording conditions are studio-quality. When the recordings show significant changes in quality, or when moving towards non-speech vocalizations or singing, the vocoder still significantly outperforms speaker-dependent vocoders, but operates at a lower average relative MUSHRA of 75%. These results are shown to be consistent across languages, regardless of them being seen during training (e.g. English or Japanese) or unseen (e.g. Wolof, Swahili, Ahmaric).

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/1811.06292 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/1811.06292 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/1811.06292 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.