How to utilize the full GPU memory for inference

#6 opened by code-me-running

I want to run inference on a long paragraph of text. I'm splitting it into sentences and running inference on each sentence separately. The model's memory utilization only reaches about 7 GB. I want to utilize the full GPU memory to increase throughput and reduce the overall generation time. How can I achieve this?
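For reference, here is roughly what my current per-sentence loop looks like. This is a minimal sketch following the basic usage from the Parler-TTS README; the checkpoint name, voice description, paragraph text, and naive sentence splitter are placeholders, not my exact code:

```python
import torch
import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Placeholder checkpoint; substitute the model actually being used.
model = ParlerTTSForConditionalGeneration.from_pretrained(
    "parler-tts/parler-tts-mini-v1"
).to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")

description = "A clear speaker with a moderate pace."  # placeholder voice description
paragraph = "First sentence. Second sentence. Third sentence."  # placeholder text

# Naive sentence splitter, just for illustration.
sentences = [s.strip() + "." for s in paragraph.split(".") if s.strip()]

# The description is the same for every sentence, so tokenize it once.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)

# One generate() call per sentence: each call is a small workload,
# which is why GPU memory utilization stays low (~7 GB in my case).
for i, sentence in enumerate(sentences):
    prompt_ids = tokenizer(sentence, return_tensors="pt").input_ids.to(device)
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(f"sentence_{i}.wav", audio, model.config.sampling_rate)
```

Would batching several sentences into a single generate() call with padded inputs be the right way to push utilization higher, or is there a better option?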
