xiaozhongabc/my-speecht5-tts

---
language:
- en
tags:
- text-to-speech
- tts
- speech
license: mit
datasets:
- Matthijs/cmu-arctic-xvectors
---

# SpeechT5 TTS

This is a re-upload of the `microsoft/speecht5_tts` model.

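## Model description

SpeechT5 is a unified-modal encoder-decoder model for speech and text developed by Microsoft. This checkpoint is the variant fine-tuned for text-to-speech: it takes text and a speaker embedding as input and predicts a mel spectrogram, which a separate vocoder (`microsoft/speecht5_hifigan` in the example under Usage) converts into a waveform.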

## Usage
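
The example in this section synthesizes a single short sentence. For longer passages it can be convenient to synthesize the text piece by piece. The following is a minimal sketch of one way to do that, not part of the original model or of `transformers`: `split_into_chunks` is a hypothetical helper, and the default `max_chars=200` is an arbitrary illustrative budget rather than a documented limit of SpeechT5.

```python
from __future__ import annotations

import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than
    being cut in the middle of a word.
    """
    # Split after '.', '!' or '?' followed by whitespace, keeping the punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed through the processor and `generate_speech` in turn, and the resulting 1-D waveforms joined with `torch.cat` before writing the file.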

```python
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf

# Replace the placeholder with the ID of this repository on the Hub.
processor = SpeechT5Processor.from_pretrained("YOUR_USERNAME/YOUR_REPO_NAME")
model = SpeechT5ForTextToSpeech.from_pretrained("YOUR_USERNAME/YOUR_REPO_NAME")
# HiFi-GAN vocoder that converts the predicted mel spectrogram into a waveform.
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Tokenize the input text.
inputs = processor(text="Hello, how are you?", return_tensors="pt")

# Load an x-vector speaker embedding and add a batch dimension.
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

# Generate the waveform as a 1-D tensor.
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)

# Write the audio to disk at a 16 kHz sample rate.
sf.write("speech.wav", speech.numpy(), samplerate=16000)
```