import gradio as gr

# Written for the legacy Gradio API (gr.Interface.load, gr.inputs / gr.outputs,
# gr.Series, gr.Parallel), which was removed in Gradio 4.0.

# --- Load the pre-trained models from Hugging Face ---

# ASR model: transcribes the uploaded English speech into English text
speech2text = gr.Interface.load(
    "huggingface/facebook/hubert-large-ls960-ft",
    inputs=gr.inputs.Audio(label="Upload Audio", type="filepath", source="upload"),
)

# Translation model: translates English text into Spanish text
translator = gr.Interface.load(
    "huggingface/Helsinki-NLP/opus-mt-en-es",
    outputs=gr.outputs.Textbox(label="English to Spanish Translated Text"),
)

# TTS model: synthesizes Spanish speech from Spanish text
text2speech = gr.Interface.load(
    "huggingface/facebook/tts_transformer-es-css10",
    outputs=gr.outputs.Audio(label="English to Spanish Translated Audio"),
    allow_flagging="never",
)

# --- Chain the models ---
translate = gr.Series(speech2text, translator)  # English audio -> Spanish text
en2es = gr.Series(translate, text2speech)  # English audio -> Spanish audio
ui = gr.Parallel(translate, en2es)  # shows the Spanish text and the Spanish audio side by side

# --- Gradio interface ---
ui.title = "English to Spanish Speech Translator"
ui.description = """
A useful tool for translating English speech into Spanish text and audio. All pre-trained models are available on Hugging Face.
"""
ui.examples = [['ljspeech.wav'], ['ljspeech2.wav'], ['longspeech.wav']]
ui.allow_flagging = "never"
ui.theme = "peach"
ui.article = """

Pre-trained Model Information

Automatic Speech Recognition

The ASR part of this space uses hubert-large-ls960-ft, which is pretrained on Libri-Light and fine-tuned on 960 hours of LibriSpeech using 16 kHz sampled speech audio. The model has a self-reported word error rate (WER) of 1.9 percent on LibriSpeech test-clean and, at the time of writing, ranked first on Papers with Code for ASR on LibriSpeech. More information can be found on its website (hubert-self), and the original model is available in pytorch/fairseq.

Text Translator

The English-to-Spanish text translation model is Helsinki-NLP/opus-mt-en-es, which is part of the Tatoeba Translation Challenge (v2021-08-07), as described in its GitHub repository, Helsinki-NLP/Tatoeba-Challenge. This project aims to develop machine translation for real-world use cases across many languages.

Text to Speech

The TTS model is facebook/tts_transformer-es-css10. It is built with the fairseq (fairseq-py) sequence modeling toolkit for speech synthesis, in this case specifically Spanish TTS. More information is available in the speech_synthesis examples in their GitHub repository.

"""

ui.launch(inbrowser=True)
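

# --- Illustrative sketch (not used by the app above) ---
# gr.Series feeds each interface's output into the next interface, while
# gr.Parallel runs several interfaces on the same input and shows all of their
# outputs. The two helpers below are hypothetical plain-Python stand-ins that
# mimic that data flow with ordinary functions. They are an assumption-based
# sketch for explanation only and are NOT part of the Gradio API.


def series(*steps):
    """Chain steps so that each one receives the previous step's output."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


def parallel(*branches):
    """Run every branch on the same input and collect all outputs in a list."""
    def run(value):
        return [branch(value) for branch in branches]
    return run


# Usage with toy stand-ins for the three models (hypothetical functions):
#   to_text = series(fake_asr, fake_mt)      # like translate = gr.Series(...)
#   to_audio = series(to_text, fake_tts)     # like en2es = gr.Series(...)
#   both = parallel(to_text, to_audio)       # like ui = gr.Parallel(...)
# Note that, as in the app, the translation branch runs twice: once on its own
# and once inside the audio branch.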