Why did it become gibberish?
Why is the synthesized voice gibberish after 10-15 tries? It sounds like it is combining the prompts I gave it previously.
Hi, sorry about that! Would you mind attaching the reference audio you used, the text, and the generated audio?
Thanks!
Sorry for the late reply. The good news is that it's fixed now somehow (maybe it was just a glitch). I can provide the reference audio and the text, but I don't have the generated audio, because why would I even download the glitched one, right 🤷♀️.
(There are moments in history when one person’s decision changes everything... When loyalty is tested... And the truth, is exposed... This, is the story of how a CIA agent betrayed the government... and revealed the deepest secrets of the U.S. surveillance state...)
Anyway, I just want to say how well this thing works. Thank you for this service.
When I started using this TTS, it was wonderful but now it's no longer working at all. It's gibberish and garbage (13 tries). What happened?
@bigbrotherr would you mind attaching audio samples? Thanks!
Hey @MrFakeName, I'm experiencing the same type of issue.
I’m using the same audio and text that worked before, but after not using it since 1/10/25, the output is now just gibberish.
Example:
Input:
Output:
"Stocks slip as selling pressure continues on Wall Street"
System Info:
Torch: 2.6.0+cu124, CUDA Available: True
torch.version.cuda: 12.4
TorchAudio: 2.6.0+cu124
CUDA Path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\nvcc.exe
CUDA Compilation Tools: release 12.4, V12.4.131
Driver Version: 566.36, CUDA Runtime Version: 12.7
GPU: NVIDIA GeForce GTX 1660 Ti (6GB VRAM)
GPU Utilization: 8%
I’ve tested both methods for running the TTS generation and still end up with gibberish output. Here's how I'm doing it:
1. CLI with TOML Configuration File:
f5-tts_infer-cli -c F5.toml
F5.toml configuration
# F5-TTS | E2-TTS Configuration File
model = "F5-TTS"
remove_silence = false
output_dir = "output"
output_file = "output.wav"
gen_file = "output\\output.txt"
[voices.Kevin]
ref_audio = "wavs\\15-Kevin.wav"
ref_text = "Canadians are in the British parliamentary system, so I have a very short fuse here. The Liberals, of which Trudeau was the leader, had a majority mandate a decade ago. They're about to get wiped out, and they're going to have sixty days to find a new leader."
2. Using a Python Script to Generate the Audio:
import os
import subprocess


def generate_voiceover(text, output_dir="voiceover_output"):
    # Create the output directory if it does not exist yet
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    # Get path to reference
    script_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ref_audio_path = os.path.join(script_dir, "src", "ref.wav")
    ref_text_path = os.path.join(script_dir, "src", "ref.txt")
    if not os.path.exists(ref_audio_path):
        raise FileNotFoundError(f"Reference audio file not found: {ref_audio_path}")

    output_file = "output.wav"
    output_path = os.path.join(output_dir, output_file)

    cmd = [
        'f5-tts_infer-cli',
        '-m', 'F5-TTS',
        '-r', ref_audio_path,
        '-s', ref_text_path,
        '-t', text,
        '-o', output_dir,
        '-w', output_file,
        '--vocoder_name', 'bigvgan'
    ]
    print(f"Running: {subprocess.list2cmdline(cmd)}")

    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.stdout:
            print(result.stdout)
        if result.stderr:
            print(f"Error: {result.stderr}")
        if result.returncode != 0:
            print(f"Command failed with exit code {result.returncode}")
    except Exception as e:
        print(f"Exception during TTS generation: {e}")

    # Verify audio was created
    if not os.path.exists(output_path):
        print("Warning: Output audio file not created!")
    else:
        print(f"Audio generated successfully: {output_path}")
    return output_path
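One detail in the script above that may be worth checking: it passes the path of ref.txt to `-s`. As far as I can tell, `-s` (`--ref_text`) expects the transcript text itself rather than a file path. If that is right, the path string would be treated as the transcript, which could produce garbled output. A minimal sketch under that assumption, reading the file first and passing its contents (`build_f5_command` is a hypothetical helper name, and the `--vocoder_name` flag is left out to keep the default):

```python
def build_f5_command(text, ref_audio_path, ref_text_path, output_dir,
                     output_file="output.wav"):
    """Build the f5-tts_infer-cli argument list with the transcript contents."""
    # Read the reference transcript from disk instead of passing its path.
    with open(ref_text_path, encoding="utf-8") as f:
        ref_text = f.read().strip()

    return [
        "f5-tts_infer-cli",
        "-m", "F5-TTS",
        "-r", ref_audio_path,
        "-s", ref_text,  # transcript text, not the path (assumption, see above)
        "-t", text,
        "-o", output_dir,
        "-w", output_file,
    ]
```

The returned list can be handed to `subprocess.run` exactly like `cmd` in the original script.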
What I've Tried:
- Generating the reference text automatically
- Providing the reference text manually
- Using three different reference audios
- Reinstalling F5-TTS from scratch in a new folder
- Vocos and BigVGAN as vocoder
No matter what I do, the output is just gibberish—it either sounds like a different language or maybe multiple prompts are being combined.
Did something change? Is there something I should check?
Hi, just to confirm you are running the model locally (not on the demo)? If so, did you redownload the F5 repo on 1/10?
Hi, thanks for following up! I just wanted to apologize for posting here instead of on GitHub—I find GitHub a bit overwhelming, and I’m still trying to fully understand how to navigate it. I found this post about my issue, so I posted here.
To clarify, I am running the model locally (not on the demo). I originally downloaded the F5 repository in November/December 2024. Windows shows the folder's 'last used' date as Jan 10, 2025, but I did not update or redownload the model until March 10, 2025. On March 10, I created a new folder and attempted to get it working again, but no matter what I do, even when reverting to what I know was working, it outputs gibberish.
I also tried using the demo, and it works perfectly with my input WAV file. So it seems the issue is related to my local setup rather than my input files.
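To compare the old and new local setups, it may help to record which package version each environment actually has installed. A small sketch using only the standard library, assuming the installed distribution is named `f5-tts` (that name is an assumption, so adjust it if `pip list` shows something different):

```python
from importlib import metadata


def installed_version(package="f5-tts"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("f5-tts", "torch", "torchaudio"):
        print(f"{name}: {installed_version(name)}")
```

Running it inside each venv gives a quick side-by-side of what changed between the working and non-working installs.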
Hi,
No worries about reporting the bug here - just a note: I'm not the author of F5-TTS itself, just the demo.
Could you try re-cloning the repo? In a fresh directory + venv/env:
git clone https://github.com/SWivid/F5-TTS
cd F5-TTS
pip install -e .
f5-tts_infer-cli
Thanks!