microsoft/speecht5_tts · model sometimes repeats itself and glitches during speech.

It sounds cool, in a very "AI going insane" sort of way, but I must be doing something wrong. In the audio below, the TTS module is explaining the difference between sheep and goats. Around 10 seconds in, the glitching starts. Could this be caused by how I've set the model up? Here is the code I've used:

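# setup
import torch
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts", max_new_tokens=256)
ttsmodel = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")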

# load xvector containing speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
...
inputs = processor(text=txt, return_tensors="pt", max_new_tokens=256)
speech = ttsmodel.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
...
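
One thing I could try to narrow this down is feeding the model short pieces of text one at a time, to see whether the glitching depends on input length. Below is a minimal sketch of such a splitter. The helper name `split_text` and its `max_chars` default are my own assumptions for illustration, not part of the SpeechT5 API.

```python
def split_text(text, max_chars=30):
    """Greedily pack whole words into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole as its own chunk,
    so that chunk alone may exceed the limit.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            # Either the word still fits, or the chunk is empty and we
            # must accept the word regardless of its length.
            current = candidate
        else:
            # The word does not fit: close the current chunk, start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for piece in split_text("Sheep graze on grass while goats browse on leaves"):
        print(repr(piece))
```

Each piece could then go through `processor(...)` and `generate_speech(...)` separately, with the resulting waveforms joined afterwards (for example with `torch.cat`). I have not verified that this removes the glitching.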

# Process the text in the variable "txt", which has at most 30 characters: