SpeechT5 - a microsoft Collection

SpeechT5

updated 21 days ago

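The SpeechT5 framework consists of a shared encoder-decoder (seq2seq) network and six modal-specific (speech/text) pre/post-nets. The checkpoints in this collection cover three audio tasks: text-to-speech, voice conversion, and automatic speech recognition.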
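To make the "shared network plus six pre/post-nets" idea concrete, here is a minimal sketch of how a task could be mapped to the three modality-specific nets it uses. The pre/post-net names follow the SpeechT5 paper; the `Route` class, the `route` function, and the `TASKS` table are hypothetical names introduced only for illustration and are not part of any library.

```python
from dataclasses import dataclass

MODALITIES = ("speech", "text")

# The six modal-specific nets wrapped around the shared encoder-decoder.
PRE_POST_NETS = (
    "speech-encoder pre-net",
    "text-encoder pre-net",
    "speech-decoder pre-net",
    "text-decoder pre-net",
    "speech-decoder post-net",
    "text-decoder post-net",
)


@dataclass(frozen=True)
class Route:
    """Hypothetical record of the nets one task passes through."""
    encoder_prenet: str
    decoder_prenet: str
    decoder_postnet: str


def route(input_modality: str, output_modality: str) -> Route:
    """Pick the pre/post-nets for a given input and output modality."""
    for modality in (input_modality, output_modality):
        if modality not in MODALITIES:
            raise ValueError(f"unknown modality: {modality!r}")
    return Route(
        encoder_prenet=f"{input_modality}-encoder pre-net",
        decoder_prenet=f"{output_modality}-decoder pre-net",
        decoder_postnet=f"{output_modality}-decoder post-net",
    )


# Tasks covered by the checkpoints in this collection: (input, output).
TASKS = {
    "text-to-speech": ("text", "speech"),
    "voice conversion": ("speech", "speech"),
    "automatic speech recognition": ("speech", "text"),
}

for task, (src, dst) in TASKS.items():
    print(task, "->", route(src, dst))
```

In every case the encoder-decoder in the middle is the same network; only the nets at the edges change with the modality.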

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

Paper • 2110.07205 • Published Oct 14, 2021 • 5

microsoft/speecht5_tts

Text-to-Speech • Updated Nov 8, 2023 • 136k • 727

Note: Text-to-speech version of SpeechT5.

👩‍🎤 SpeechT5 Speech Synthesis Demo • Space • Runtime error • 219

microsoft/speecht5_vc

Audio-to-Audio • Updated Mar 22, 2023 • 1.24k • 96

Note: Voice-conversion version of SpeechT5.

👩‍🎤 SpeechT5 Voice Conversion Demo • Space • Runtime error • 96

microsoft/speecht5_asr

Automatic Speech Recognition • Updated Mar 22, 2023 • 4.17k • 38

Note: Automatic-speech-recognition version of SpeechT5.
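As a rough illustration of how this checkpoint might be used, the sketch below transcribes a mono waveform with the Hugging Face `transformers` classes for SpeechT5. It assumes `transformers` and `torch` are installed and that the audio is sampled at 16 kHz; the `transcribe` helper and its `max_length` default are hypothetical choices for this example, not a recommended configuration.

```python
ASR_CHECKPOINT = "microsoft/speecht5_asr"
EXPECTED_SAMPLE_RATE = 16_000  # assumption: audio is already at 16 kHz


def transcribe(waveform, sampling_rate=EXPECTED_SAMPLE_RATE, max_length=100):
    """Return the text predicted for a 1-D sequence of audio samples."""
    if sampling_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz"
        )

    # Imported lazily so the check above works without the heavy dependencies.
    from transformers import SpeechT5ForSpeechToText, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained(ASR_CHECKPOINT)
    model = SpeechT5ForSpeechToText.from_pretrained(ASR_CHECKPOINT)

    # The processor turns raw samples into model inputs (PyTorch tensors).
    inputs = processor(
        audio=waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    predicted_ids = model.generate(**inputs, max_length=max_length)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]


# Usage (downloads the checkpoint on first call):
#   text = transcribe(samples)  # samples: 1-D float array at 16 kHz
```

Audio recorded at another rate would need to be resampled before calling the helper.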

👩‍🎤 SpeechT5 Speech Recognition Demo • Space • Runtime error • 36

microsoft/speecht5_hifigan

Updated Feb 2, 2023 • 110k • 19

Note: SpeechT5's speech decoder outputs a spectrogram (for text-to-speech and voice conversion); this HiFi-GAN vocoder converts it to a waveform.
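The two-stage pipeline described in the note can be sketched as follows, using the SpeechT5 classes from Hugging Face `transformers` (assumed to be installed along with `torch`). The `text_to_waveform` helper is a hypothetical wrapper written for this example, and the all-zeros speaker embedding is only a placeholder: a real x-vector for the target speaker would be needed for natural-sounding output.

```python
TTS_CHECKPOINT = "microsoft/speecht5_tts"
VOCODER_CHECKPOINT = "microsoft/speecht5_hifigan"
SPEAKER_EMBEDDING_DIM = 512  # size of the x-vector speaker embedding


def text_to_waveform(text, speaker_embedding=None):
    """Return (spectrogram, waveform) tensors synthesized from `text`."""
    if not text.strip():
        raise ValueError("text must not be empty")

    # Imported lazily so the check above works without the heavy dependencies.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(TTS_CHECKPOINT)
    model = SpeechT5ForTextToSpeech.from_pretrained(TTS_CHECKPOINT)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_CHECKPOINT)

    if speaker_embedding is None:
        # Placeholder only; replace with a real speaker x-vector.
        speaker_embedding = torch.zeros((1, SPEAKER_EMBEDDING_DIM))

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        # Stage 1: the TTS model predicts a spectrogram.
        spectrogram = model.generate_speech(inputs["input_ids"], speaker_embedding)
        # Stage 2: the HiFi-GAN vocoder turns it into audio samples.
        waveform = vocoder(spectrogram)
    return spectrogram, waveform


# Usage (downloads both checkpoints on first call):
#   spectrogram, waveform = text_to_waveform("Hello from SpeechT5.")
```

Keeping the vocoder as a separate checkpoint means the same model can serve both the text-to-speech and the voice-conversion checkpoints, since both emit spectrograms.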