Libra7578 and julien-c committed
Commit cdc4ccc
0 Parent(s)

Duplicate from CVMX-jaca-tonos/Generate-Gender-Neutralized-Audios


Co-authored-by: Julien Chaumond <julien-c@users.noreply.huggingface.co>

Files changed (13)
  1. .gitattributes +27 -0
  2. Example1.wav +0 -0
  3. Example2.wav +0 -0
  4. Example3.wav +0 -0
  5. README.md +13 -0
  6. app.py +155 -0
  7. audio1.wav +0 -0
  8. example2.wav +0 -0
  9. example3.wav +0 -0
  10. packages.txt +2 -0
  11. requirements.txt +6 -0
  12. travel.mp3 +0 -0
  13. travel.wav +0 -0
.gitattributes ADDED
@@ -0,0 +1,27 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
Example1.wav ADDED
Binary file (479 kB).
 
Example2.wav ADDED
Binary file (250 kB).
 
Example3.wav ADDED
Binary file (834 kB).
 
README.md ADDED
@@ -0,0 +1,13 @@
+ ---
+ title: Generate Gender Neutralized Audios
+ emoji: 🦀
+ colorFrom: pink
+ colorTo: red
+ sdk: gradio
+ sdk_version: 2.9.4
+ app_file: app.py
+ pinned: false
+ duplicated_from: CVMX-jaca-tonos/Generate-Gender-Neutralized-Audios
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
app.py ADDED
@@ -0,0 +1,155 @@
+ import torch
+ import gradio as gr
+ import librosa
+ import tempfile
+ from typing import Optional
+ from TTS.config import load_config
+ from transformers import AutoFeatureExtractor, AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
+ from TTS.utils.manage import ModelManager
+ from TTS.utils.synthesizer import Synthesizer
+
+
+ first_generation = True  # True selects greedy decoding below; False would use beam search
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+
+ def load_and_fix_data(input_file, model_sampling_rate):
+     speech, sample_rate = librosa.load(input_file)
+     if len(speech.shape) > 1:  # stereo input: collapse the two channels
+         speech = speech[:, 0] + speech[:, 1]
+     if sample_rate != model_sampling_rate:  # resample to the ASR model's rate
+         speech = librosa.resample(speech, sample_rate, model_sampling_rate)
+     return speech
+
+
+ feature_extractor = AutoFeatureExtractor.from_pretrained("jonatasgrosman/wav2vec2-xls-r-1b-spanish")
+ sampling_rate = feature_extractor.sampling_rate
+
+ asr = pipeline("automatic-speech-recognition", model="jonatasgrosman/wav2vec2-xls-r-1b-spanish")
+
+ prefix = ''  # optional text prepended to the transcription before neutralization
+ model_checkpoint = "hackathon-pln-es/es_text_neutralizer"
+ tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
+ model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
+
+
+ manager = ModelManager()
+ MODEL_NAMES = manager.list_tts_models()
+
+
+ def postproc(input_sentence, preds):
+     try:
+         preds = preds.replace('De el', 'Del').replace('de el', 'del').replace('  ', ' ')
+         if preds[0].islower():
+             preds = preds.capitalize()
+         preds = preds.replace(' . ', '. ').replace(' , ', ', ')
+
+         # Restore capitalization of proper nouns
+         prev_letter = ''
+         for word in input_sentence.split(' '):
+             if word:
+                 if word[0].isupper():
+                     if word.lower() in preds and word != input_sentence.split(' ')[0]:
+                         if prev_letter == '.':
+                             preds = preds.replace('. ' + word.lower() + ' ', '. ' + word + ' ')
+                         else:
+                             if word[-1] == '.':
+                                 preds = preds.replace(word.lower(), word)
+                             else:
+                                 preds = preds.replace(word.lower() + ' ', word + ' ')
+                 prev_letter = word[-1]
+         preds = preds.strip()  # drop the trailing space
+     except Exception:
+         pass
+     return preds
+
+ model_name = "es/mai/tacotron2-DDC"
+ MAX_TXT_LEN = 100
+
+ def predict_and_ctc_lm_decode(input_file, speaker_idx: Optional[str] = None):
+     speech = load_and_fix_data(input_file, sampling_rate)
+     transcribed_text = asr(speech, chunk_length_s=10, stride_length_s=1)
+     transcribed_text = transcribed_text["text"]
+     inputs = tokenizer([prefix + transcribed_text], return_tensors="pt", padding=True)
+     with torch.no_grad():
+         if first_generation:
+             output_sequence = model.generate(
+                 input_ids=inputs["input_ids"].to(device),
+                 attention_mask=inputs["attention_mask"].to(device),
+                 do_sample=False,  # disable sampling to test if batching affects output
+             )
+         else:
+
+             output_sequence = model.generate(
+                 input_ids=inputs["input_ids"].to(device),
+                 attention_mask=inputs["attention_mask"].to(device),
+                 do_sample=False,
+                 num_beams=2,
+                 repetition_penalty=2.5,
+                 # length_penalty=1.0,
+                 early_stopping=True,  # disable sampling to test if batching affects output
+             )
+     text = postproc(transcribed_text,
+                     preds=tokenizer.decode(output_sequence[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
+     if len(text) > MAX_TXT_LEN:
+         text = text[:MAX_TXT_LEN]
+         print(f"Input text was cut off since it went over the {MAX_TXT_LEN} character limit.")
+     print(text, model_name)
+     # download the TTS model
+     model_path, config_path, model_item = manager.download_model(f"tts_models/{model_name}")
+     vocoder_name: Optional[str] = model_item["default_vocoder"]
+     # download the vocoder
+     vocoder_path = None
+     vocoder_config_path = None
+     if vocoder_name is not None:
+         vocoder_path, vocoder_config_path, _ = manager.download_model(vocoder_name)
+     # init the synthesizer
+     synthesizer = Synthesizer(
+         model_path, config_path, None, None, vocoder_path, vocoder_config_path,
+     )
+     # synthesize
+     if synthesizer is None:
+         raise NameError("model not found")
+     wavs = synthesizer.tts(text, speaker_idx)
+     # write the output to a temporary wav file and return its path
+     with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as fp:
+         synthesizer.save_wav(wavs, fp)
+     return fp.name
+
+ description = """This is a Gradio demo for generating gender-neutralized audio. To use it, provide an audio input (via microphone or an audio recording), which is then transcribed and gender-neutralized using pre-trained models. Finally, Coqui's TTS model synthesizes the gender-neutralized audio.
+
+ Pre-trained model used for Spanish ASR: [jonatasgrosman/wav2vec2-xls-r-1b-spanish](https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-spanish)
+
+ Pre-trained model used for Gender Neutralization: [hackathon-pln-es/es_text_neutralizer](https://huggingface.co/hackathon-pln-es/es_text_neutralizer)
+
+ Pre-trained model used for TTS: 🐸💬 CoquiTTS => model_name = "es/mai/tacotron2-DDC"
+
+ """
+
+
+ article = """**ACKNOWLEDGEMENT:**
+
+ **This project is based on the following Spaces:**
+
+ [CoquiTTS](https://huggingface.co/spaces/coqui/CoquiTTS)
+
+ [es_nlp_gender_neutralizer](https://huggingface.co/spaces/hackathon-pln-es/es_nlp_gender_neutralizer)
+
+ [Hindi_ASR](https://huggingface.co/spaces/anuragshas/Hindi_ASR)
+
+ """
+
+
+ gr.Interface(
+     predict_and_ctc_lm_decode,
+     inputs=[
+         gr.inputs.Audio(source="microphone", type="filepath", label="Record your audio")
+     ],
+     outputs=gr.outputs.Audio(label="Output"),
+     examples=[["Example1.wav"], ["Example2.wav"], ["Example3.wav"]],
+     title="Generate-Gender-Neutralized-Audios",
+     description=description,
+     article=article,
+     layout="horizontal",
+     theme="huggingface",
+ ).launch(enable_queue=True, cache_examples=True)
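
The description string above summarizes the pipeline: Spanish ASR, text gender-neutralization, and Coqui TTS chained inside predict_and_ctc_lm_decode. As a minimal sketch (not part of this commit), the same function could be exercised outside the Gradio UI, assuming the dependencies from requirements.txt and packages.txt are installed and the definitions from app.py are available in the current session (for example pasted into a notebook, so the trailing .launch() call is not triggered):

# Illustrative sketch only; reuses predict_and_ctc_lm_decode as defined in app.py above.
# Example1.wav ships with this commit; the return value is the path to a temporary wav file.
out_path = predict_and_ctc_lm_decode("Example1.wav")  # speaker_idx defaults to None
print("Gender-neutralized audio written to:", out_path)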
audio1.wav ADDED
Binary file (348 kB).
 
example2.wav ADDED
Binary file (68 kB).
 
example3.wav ADDED
Binary file (235 kB).
 
packages.txt ADDED
@@ -0,0 +1,2 @@
+ libsndfile1
+ espeak-ng
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ transformers
+ torch
+ librosa==0.8.0
+ pyctcdecode
+ pypi-kenlm
+ git+https://github.com/coqui-ai/TTS@dev#egg=TTS
travel.mp3 ADDED
Binary file (6.63 kB).
 
travel.wav ADDED
Binary file (48.5 kB).