microsoft's Collections

SpeechT5

updated 14 days ago

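The SpeechT5 framework consists of a shared encoder-decoder (seq2seq) network and six modality-specific (speech/text) pre-/post-nets, which together can address several spoken-language tasks, such as the speech recognition, speech synthesis, and voice conversion checkpoints in this collection.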
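The routing below is a minimal sketch of how the shared network and the six pre-/post-nets fit together. It assumes the split described in the SpeechT5 paper: one encoder pre-net, one decoder pre-net, and one decoder post-net per modality. The net names are hypothetical labels, not classes from any library, and the checkpoint-to-modality table covers only the three task checkpoints listed in this collection.

```python
from dataclasses import astuple, dataclass

# Hypothetical labels for the six modality-specific nets (not library class names).
NETS = {
    "speech": {
        "encoder_prenet": "speech-encoder-prenet",
        "decoder_prenet": "speech-decoder-prenet",
        "decoder_postnet": "speech-decoder-postnet",
    },
    "text": {
        "encoder_prenet": "text-encoder-prenet",
        "decoder_prenet": "text-decoder-prenet",
        "decoder_postnet": "text-decoder-postnet",
    },
}

# (input modality, output modality) for the task checkpoints in this collection.
TASKS = {
    "microsoft/speecht5_asr": ("speech", "text"),
    "microsoft/speecht5_tts": ("text", "speech"),
    "microsoft/speecht5_vc": ("speech", "speech"),
}


@dataclass(frozen=True)
class Route:
    encoder_prenet: str   # turns the input into vectors for the shared encoder
    decoder_prenet: str   # embeds previously generated outputs for the shared decoder
    decoder_postnet: str  # maps shared decoder states to the output modality


def route(checkpoint: str) -> Route:
    """Select the nets placed around the shared encoder-decoder for one task."""
    source, target = TASKS[checkpoint]
    return Route(
        encoder_prenet=NETS[source]["encoder_prenet"],
        decoder_prenet=NETS[target]["decoder_prenet"],
        decoder_postnet=NETS[target]["decoder_postnet"],
    )


def nets_used(checkpoints) -> set[str]:
    """Union of the nets needed to serve every checkpoint in `checkpoints`."""
    used: set[str] = set()
    for checkpoint in checkpoints:
        used.update(astuple(route(checkpoint)))
    return used


print(route("microsoft/speecht5_tts"))
print(len(nets_used(TASKS)))  # → 6
```

In this sketch, text-to-speech enters through the text encoder pre-net and leaves through the speech decoder pre-net and post-net, while voice conversion uses speech nets on both sides. Covering all three tasks therefore touches all six nets, with only the backbone shared.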


  • SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

    Paper • arXiv:2110.07205 • Published Oct 14, 2021 • 5 upvotes

  • microsoft/speecht5_tts

    Text-to-Speech • Updated Nov 8, 2023 • 97.5k downloads • 765 likes

    Note: Text-to-speech version of SpeechT5.


  • SpeechT5 Speech Synthesis Demo

    Space • Runtime error • 219 likes


  • microsoft/speecht5_vc

    Audio-to-Audio • Updated Mar 22, 2023 • 2.11k downloads • 104 likes

    Note: Voice-conversion version of SpeechT5.


  • SpeechT5 Voice Conversion Demo

    Space • Runtime error • 96 likes


  • microsoft/speecht5_asr

    Automatic Speech Recognition • Updated Mar 22, 2023 • 2.67k downloads • 41 likes

    Note: Automatic-speech-recognition version of SpeechT5.


  • SpeechT5 Speech Recognition Demo

    Space • Runtime error • 36 likes


  • microsoft/speecht5_hifigan

    Updated Feb 2, 2023 • 88.9k downloads • 20 likes

    Note: SpeechT5 produces a spectrogram; this HiFi-GAN vocoder converts it into a waveform.
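To make the spectrogram-to-waveform step concrete, the arithmetic below shows how the number of spectrogram frames fixes the length of the audio a vocoder produces. It is a sketch under assumptions: the 16 kHz sampling rate and the per-stage upsampling factors [4, 4, 4, 4] (256 samples, or 16 ms, per frame) are assumed values for this checkpoint and are not stated on this page, so check the model's config before relying on them.

```python
import math

# Assumed settings for microsoft/speecht5_hifigan; verify against its config.
SAMPLING_RATE = 16_000                  # waveform samples per second
UPSAMPLE_RATES = [4, 4, 4, 4]           # per-stage upsampling factors in the vocoder
HOP_LENGTH = math.prod(UPSAMPLE_RATES)  # samples generated per spectrogram frame


def waveform_length(num_frames: int) -> int:
    """Samples the vocoder emits for `num_frames` spectrogram frames."""
    return num_frames * HOP_LENGTH


def duration_seconds(num_frames: int) -> float:
    """Audio duration implied by `num_frames` spectrogram frames."""
    return waveform_length(num_frames) / SAMPLING_RATE


print(HOP_LENGTH)             # → 256
print(waveform_length(100))   # → 25600
print(duration_seconds(100))  # → 1.6
```

In a HiFi-GAN-style vocoder the product of the upsampling factors has to match the spectrogram's hop length, which is why the assumed per-stage rates multiply to the samples-per-frame figure.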
