NeuML
/

ljspeech-jets-onnx

Model card Files Files and versions Community

ljspeech-jets-onnx / README.md

davidmezzetti's picture

Update README

0698529 24 days ago

|

history blame contribute delete

1.98 kB

	---
	tags:
	- audio
	- text-to-speech
	- onnx
	inference: false
	language: en
	datasets:
	- ljspeech
	license: apache-2.0
	library_name: txtai
	---

	# ESPnet JETS Text-to-Speech (TTS) Model for ONNX

	[imdanboy/jets](https://huggingface.co/imdanboy/ljspeech_tts_train_jets_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave) exported to ONNX. This model is an ONNX export using the [espnet_onnx](https://github.com/espnet/espnet_onnx) library.

	## Usage with txtai

	[txtai](https://github.com/neuml/txtai) has a built in Text to Speech (TTS) pipeline that makes using this model easy.

	```python
	import soundfile as sf

	from txtai.pipeline import TextToSpeech

	# Build pipeline
	tts = TextToSpeech("NeuML/ljspeech-jets-onnx")

	# Generate speech
	speech, rate = tts("Say something here")

	# Write to file
	sf.write("out.wav", speech, rate)
	```

	## Usage with ONNX

	This model can also be run directly with ONNX provided the input text is tokenized. Tokenization can be done with [ttstokenizer](https://github.com/neuml/ttstokenizer).

	Note that the txtai pipeline has additional functionality such as batching large inputs together that would need to be duplicated with this method.

	```python
	import onnxruntime
	import soundfile as sf
	import yaml

	from ttstokenizer import TTSTokenizer

	# This example assumes the files have been downloaded locally
	with open("ljspeech-jets-onnx/config.yaml", "r", encoding="utf-8") as f:
	config = yaml.safe_load(f)

	# Create model
	model = onnxruntime.InferenceSession(
	"ljspeech-jets-onnx/model.onnx",
	providers=["CPUExecutionProvider"]
	)

	# Create tokenizer
	tokenizer = TTSTokenizer(config["token"]["list"])

	# Tokenize inputs
	inputs = tokenizer("Say something here")

	# Generate speech
	outputs = model.run(None, {"text": inputs})

	# Write to file
	sf.write("out.wav", outputs[0], 22050)
	```

	## How to export

	More information on how to export ESPnet models to ONNX can be [found here](https://github.com/espnet/espnet_onnx#text2speech-inference).