Space: Gregniuki/f5-tts_Polish_English_German
Gregniuki committed on Nov 26, 2024

Commit 5863c7f (verified) · 1 parent: 2b71199

Update app.py

Files changed (1)

app.py

@@ -182,10 +182,12 @@ def infer_batch(ref_audio, ref_text, gen_text_batches, exp_name, remove_silence,
         zh_pause_punc = r"。，、；：？！"
         ref_text_len = len(ref_text.encode('utf-8')) + 3 * len(re.findall(zh_pause_punc, ref_text))
         gen_text_len = len(gen_text.encode('utf-8')) + 3 * len(re.findall(zh_pause_punc, gen_text))
-        duration = ref_audio_len + int(ref_audio_len / ref_text_len * gen_text_len / speed)
-        print(f"Duration: {duration} seconds")
-        duration = min(5000, max(300, int(133 * gen_text_len / (speed * 10))))
-        print(f"Duration: {duration} seconds")
+        if len(ref_text) >= 1:
+            duration = ref_audio_len + int(ref_audio_len / ref_text_len * gen_text_len / speed)
+            print(f"Duration: {duration} seconds")
+        else:
+            duration = min(5000, max(300, int(133 * gen_text_len / (speed * 10))))
+            print(f"Duration: {duration} seconds")
         # inference
         with torch.inference_mode():
@@ -738,9 +740,9 @@ This is a local web UI for F5 TTS with advanced batch processing support. This a
 The checkpoint support Polish English and German.
-Generations using CPU takes usually 2-3 minutes
-If you're having issues, try converting your reference audio to WAV or MP3, clipping it to 15s, and shortening your prompt.
+Generations using CPU takes usually 2-3 minutes using 8 step inferece.
+If you're having issues, try converting your reference audio to WAV or MP3, clipping it to 5s, and shortening your prompt.
 **NOTE: Reference text will be automatically transcribed with Whisper if not provided. For best results, keep your reference clips short (<15s). Ensure the audio is fully uploaded before generating.**
 """