import gradio as gr

# Get models
# ASR model for input speech
speech2text = gr.Interface.load("huggingface/facebook/hubert-large-ls960-ft",
                                inputs=gr.inputs.Audio(label="Record Audio File", type="file", source="microphone"))
# Translates English text to Spanish text
translator = gr.Interface.load("huggingface/Helsinki-NLP/opus-mt-en-es",
                               outputs=gr.outputs.Textbox(label="English to Spanish Translated Text"))
# TTS model for output speech
text2speech = gr.Interface.load("huggingface/facebook/tts_transformer-es-css10",
                                outputs=gr.outputs.Audio(label="English to Spanish Translated Audio"),
                                allow_flagging="never")

translate = gr.Series(speech2text, translator)  # outputs the Spanish text translation
en2es = gr.Series(translate, text2speech)  # outputs the Spanish audio
ui = gr.Parallel(translate, en2es)  # shows the Spanish text and the Spanish audio side by side

# Gradio interface
ui.title = "English to Spanish Speech Translator"
ui.description = """
A tool for translating spoken English into Spanish text and speech. All pre-trained models are hosted on Hugging Face.
"""
ui.examples = [["ljspeech.wav"], ["ljspeech2.wav"]]
ui.theme = "peach"
ui.article = """

Pre-trained Model Information

Automatic Speech Recognition

The model used for the ASR part of this Space is [https://huggingface.co/facebook/hubert-large-ls960-ft], a pre-trained HuBERT model fine-tuned on 960 hours of LibriSpeech audio sampled at 16 kHz. Its model card self-reports a word error rate (WER) of 1.9 percent, and it was listed first on Papers with Code for ASR on LibriSpeech. More information can be found in the Facebook AI blog post at [https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression], and the original model is available at [https://github.com/pytorch/fairseq/tree/main/examples/hubert].
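
To make the WER figure concrete: WER counts the substitutions (S), deletions (D) and insertions (I) needed to turn the reference transcript into the model's output, divided by the number of reference words N, so WER = (S + D + I) / N. The snippet below is a minimal sketch of that definition only; `word_error_rate` is a hypothetical helper written for this explanation, not part of the HuBERT model, Gradio, or any evaluation library, and it assumes both transcripts are already normalized (same casing, punctuation removed) and split on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    # Hypothetical helper: WER via word-level Levenshtein distance.
    # Assumes both strings are already normalized and whitespace-separated.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion (D)
                dist[i][j - 1] + 1,         # insertion (I)
                dist[i - 1][j - 1] + cost,  # substitution (S) or exact match
            )

    # WER = (S + D + I) / N, with N the number of reference words
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the output "the cat sat on a mat" gives one substitution over six reference words, a WER of 1/6 (about 0.167). Benchmark figures such as the 1.9 percent above are typically computed over a whole test set (total edits divided by total reference words) rather than averaged per sentence.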

Text Translator

The pre-trained English-to-Spanish text translation model is [https://huggingface.co/Helsinki-NLP/opus-mt-en-es], which is part of the Tatoeba Translation Challenge (v2021-08-07), as described in its GitHub repository at [https://github.com/Helsinki-NLP/Tatoeba-Challenge]. This project aims to develop machine translation for real-world use cases across many languages.

Text to Speech

The TTS model is [https://huggingface.co/facebook/tts_transformer-es-css10]. It was built with the fairseq (fairseq-py) sequence modeling toolkit and, in this Space, performs text-to-speech synthesis for Spanish. More information is available in the fairseq GitHub repository at [https://github.com/pytorch/fairseq/tree/main/examples/speech_synthesis].

"""

ui.launch(inbrowser=True)