Application through https://github.com/coqui-ai/TTS

#1
by UMCU - opened

I have tried to run the model through https://github.com/coqui-ai/TTS, but without success. Which would you recommend for running it, the Hugging Face pipeline or TTS, and what would a minimal example look like?

Many thanks in advance

We have a Python module that we use to integrate with our Neon AI Voice Assistant that may be helpful. The models are indexed in that module, so if you request a supported language, the model is downloaded and used automatically.

Linked to that project, we also build a Docker image that has a REST API and a Gradio UI you can use. You can check out a deployment of that container at coqui.neonaiservices.com.

Hope that helps and I'm happy to answer any other questions you have.

I will continue tinkering with this information, thanks :).

Can you confirm that the text encoder has 192 hidden channels? The model expects 196 channels :).

Neon AI org

@UMCU as you can see here, hidden_channels=192.
Coqui TTS officially added support for our models in release v0.9.0.
A lot has changed since then (the current stable release is v0.22.0), so if something isn't working as expected, use v0.9.0 or the latest release that still works.

You can also raise this issue with the Coqui team, as we guarantee compatibility and optimal performance only with neon-tts-plugin-coqui, which requires less RAM and fewer dependencies to install.


But if you want help with your specific setup, provide detailed information on how to reproduce your problem, as is common practice.

I can't confirm that the model expects 196 channels. Tell us exactly what you are doing and we can help you figure out what is going wrong.
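
For the kind of reproduction details requested above, a minimal sketch like the following may help gather the basics: the installed package versions and every `hidden_channels` value found in the model's config.json. The distribution names `TTS` and `neon-tts-plugin-coqui` are the package names mentioned in this thread; the helper names (`installed_version`, `find_key`, `describe_setup`) are hypothetical and only for illustration.

```python
import json
from importlib import metadata


def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or a note if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"


def find_key(node, key, path=""):
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}.{name}" if path else name
            if name == key:
                yield child, value
            yield from find_key(value, key, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key(value, key, f"{path}[{index}]")


def describe_setup(config_path: str) -> dict:
    """Collect package versions and hidden_channels settings for a bug report."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    return {
        "versions": {
            name: installed_version(name)
            for name in ("TTS", "neon-tts-plugin-coqui")
        },
        "hidden_channels": dict(find_key(config, "hidden_channels")),
    }
```

Calling `describe_setup("path/to/config.json")` (the path is a placeholder) returns a dictionary that can be pasted into a report alongside the exact script and the full error message.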

Thanks for the help, Bohdan.

My script to run the model is simply:

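import os

from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits
from TTS.tts.utils.synthesis import synthesis
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

MODEL_PATH = "/some/folder/tts_models/multilingual/multi-dataset/tts-vits-cv-ga"
CONFIG_PATH = "/some/folder/tts_models/multilingual/multi-dataset/tts-vits-cv-ga/config.json"

# Load the model configuration, then build the audio processor and tokenizer from it
config = VitsConfig()
config.load_json(CONFIG_PATH)
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)

# Build the VITS model and load the checkpoint weights
model = Vits.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_path=os.path.join(MODEL_PATH, "model_file.pth.tar"),
    eval=True,
    strict=False,
    cache=False,
)
model.ap = ap
model.tokenizer = tokenizer
model.cuda()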

This loads fine, but the following then fails:

wav, alignment, _, _ = synthesis(
    model,
    text,
    config,
    use_cuda=True
).values()

with the error:

RuntimeError: Given groups=1, weight of size [196, 196, 1], expected input[1, 192, 1865] to have 196 channels, but got 192 channels instead


I will revert to v0.9.0 in the meantime, to see if it works through the TTS api :).

I can at least state that using the code above it fails in the same way for TTS v0.9.0.
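
One possible reading of the RuntimeError above, offered as an assumption and not confirmed anywhere in this thread: 196 = 192 + 4 would fit a text encoder whose input is the 192-channel character embedding concatenated with a 4-dimensional language embedding, so a synthesis call that supplies no language embedding would deliver only 192 channels. The sketch below only does that arithmetic. The keys `use_language_embedding` and `embedded_language_dim` are assumed names for Coqui TTS's VITS arguments, and the example values are hypothetical, so check both against your own config.json.

```python
def expected_encoder_channels(model_args: dict) -> int:
    """Channel count the text encoder would expect, under the assumption
    that a language embedding is concatenated to the character embedding."""
    channels = model_args.get("hidden_channels", 192)
    # Assumed key names; verify them in your own config.json.
    if model_args.get("use_language_embedding", False):
        channels += model_args.get("embedded_language_dim", 0)
    return channels


# Hypothetical values chosen to match the numbers in the error message.
example_args = {
    "hidden_channels": 192,
    "use_language_embedding": True,
    "embedded_language_dim": 4,
}
print(expected_encoder_channels(example_args))  # 196
```

If your config matches this pattern, the mismatch would point at the inputs passed to synthesis and not at the TTS version, which would be consistent with the failure being identical on v0.9.0.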

Continuing..

Neon AI org

Try this simple code sample

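# Install first (shell command): pip install neon-tts-plugin-coqui
from neon_tts_plugin_coqui import CoquiTTS

language = "ga"
sentence = "Bhí loch ag mo sheanmháthair"  # example text; replace with your own
output_file = "test_irish.wav"  # example output path; replace with your own

coquiTTS = CoquiTTS()
coquiTTS.get_tts(sentence, output_file, speaker={"language": language})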

I get the following:
```

KeyError Traceback (most recent call last)
Cell In[8], line 1
----> 1 _neonTTS = neonTTS(lang="ga")
2 result = _neonTTS.get_tts("Bhí loch ag mo sheanmháthair",
3 "../artifacts/test_irish.wav",
4 speaker={
5 "language": "ga"
6 })

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/neon_tts_plugin_coqui/__init__.py:69, in CoquiTTS.__init__(self, lang, config, *args, **kwargs)
67 self.cache_engines = self.config.get("cache", True)
68 if self.cache_engines:
---> 69 self._init_model({"lang": lang})

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/neon_tts_plugin_coqui/__init__.py:188, in CoquiTTS._init_model(self, speaker)
186 if lang not in self.engines:
187 LOG.info(f"Initializing model for: {lang}")
--> 188 synt = self._init_synthesizer(lang)
189 if self.cache_engines:
190 self.engines[lang] = synt

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/neon_tts_plugin_coqui/__init__.py:240, in CoquiTTS._init_synthesizer(self, lang)
237 model_path = self._download_huggingface(model_name)
239 importer = package.PackageImporter(model_path)
--> 240 synt = importer.load_pickle("tts_models", "model")
241 return synt

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:281, in PackageImporter.load_pickle(self, package, resource, map_location)
278 self.last_map_location = None
280 with set_deserialization_context():
--> 281 result = unpickler.load()
283 # TODO from zdevito:
284 # This stateful weird function will need to be removed in our efforts
285 # to unify the format. It has a race condition if multiple python
286 # threads try to read independent files
287 torch._utils._validate_loaded_sparse_tensors()

File /media/folder/folder2/folder/.pyenv/versions/3.10.14/lib/python3.10/pickle.py:1213, in _Unpickler.load(self)
1211 raise EOFError
1212 assert isinstance(key, bytes_types)
-> 1213 dispatch[key[0]](self)
1214 except _Stop as stopinst:
1215 return stopinst.value

File /media/folder/folder2/folder/.pyenv/versions/3.10.14/lib/python3.10/pickle.py:1529, in _Unpickler.load_global(self)
1527 module = self.readline()[:-1].decode("utf-8")
1528 name = self.readline()[:-1].decode("utf-8")
-> 1529 klass = self.find_class(module, name)
1530 self.append(klass)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/_package_unpickler.py:25, in PackageUnpickler.find_class(self, module, name)
23 elif module in _compat_pickle.IMPORT_MAPPING:
24 module = _compat_pickle.IMPORT_MAPPING[module]
---> 25 mod = self._importer.import_module(module)
26 return getattr(mod, name)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:159, in PackageImporter.import_module(self, name, package)
150 # We should always be able to support importing modules from this package.
151 # This is to support something like:
152 # obj = importer.load_pickle(...)
(...)
155 # Note that _mangler.demangle will not demangle any module names
156 # produced by a different PackageImporter instance.
157 name = self._mangler.demangle(name)
--> 159 return self._gcd_import(name)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:516, in PackageImporter._gcd_import(self, name, package, level)
513 if level > 0:
514 name = _resolve_name(name, package, level)
--> 516 return self._find_and_load(name)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:486, in PackageImporter._find_and_load(self, name)
484 module = self.modules.get(name, _NEEDS_LOADING)
485 if module is _NEEDS_LOADING:
--> 486 return self._do_find_and_load(name)
488 if module is None:
489 message = f"import of {name} halted; None in sys.modules"

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:476, in PackageImporter._do_find_and_load(self, name)
473 msg = (_ERR_MSG + "; {!r} is not a package").format(name, parent)
474 raise ModuleNotFoundError(msg, name=name) from None
--> 476 module = self._load_module(name, parent)
478 self._install_on_parent(parent, name, module)
480 return module

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:399, in PackageImporter._load_module(self, name, parent)
397 module = self.modules[name] = importlib.import_module(name)
398 return module
--> 399 return self._make_module(name, cur.source_file, isinstance(cur, _PackageNode), parent)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:379, in PackageImporter._make_module(self, name, filename, is_package, parent)
376 linecache.lazycache(mangled_filename, ns)
378 code = self._compile_source(filename, mangled_filename)
--> 379 exec(code, ns)
381 return module

File .TTS/tts/models/vits.py:30
28 from TTS.tts.utils.synthesis import synthesis
29 from TTS.tts.utils.text.characters import BaseCharacters, _characters, _pad, _phonemes, _punctuations
---> 30 from TTS.tts.utils.text.tokenizer import TTSTokenizer
31 from TTS.tts.utils.visual import plot_alignment
32 from TTS.vocoder.models.hifigan_generator import HifiganGenerator

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:561, in PackageImporter.__import__(self, name, globals, locals, fromlist, level)
559 def __import__(self, name, globals=None, locals=None, fromlist=(), level=0):
560 if level == 0:
--> 561 module = self._gcd_import(name)
562 else:
563 globals_ = globals if globals is not None else {}

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:516, in PackageImporter._gcd_import(self, name, package, level)
513 if level > 0:
514 name = _resolve_name(name, package, level)
--> 516 return self._find_and_load(name)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:486, in PackageImporter._find_and_load(self, name)
484 module = self.modules.get(name, _NEEDS_LOADING)
485 if module is _NEEDS_LOADING:
--> 486 return self._do_find_and_load(name)
488 if module is None:
489 message = f"import of {name} halted; None in sys.modules"

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:476, in PackageImporter._do_find_and_load(self, name)
473 msg = (_ERR_MSG + "; {!r} is not a package").format(name, parent)
474 raise ModuleNotFoundError(msg, name=name) from None
--> 476 module = self._load_module(name, parent)
478 self._install_on_parent(parent, name, module)
480 return module

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:399, in PackageImporter._load_module(self, name, parent)
397 module = self.modules[name] = importlib.import_module(name)
398 return module
--> 399 return self._make_module(name, cur.source_file, isinstance(cur, _PackageNode), parent)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:379, in PackageImporter._make_module(self, name, filename, is_package, parent)
376 linecache.lazycache(mangled_filename, ns)
378 code = self._compile_source(filename, mangled_filename)
--> 379 exec(code, ns)
381 return module

File .TTS/tts/utils/text/tokenizer.py:5
3 from TTS.tts.utils.text import cleaners
4 from TTS.tts.utils.text.characters import Graphemes, IPAPhonemes
----> 5 from TTS.tts.utils.text.phonemizers import DEF_LANG_TO_PHONEMIZER, get_phonemizer_by_name
6 from TTS.utils.generic_utils import get_import_path, import_class
9 class TTSTokenizer:

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:561, in PackageImporter.__import__(self, name, globals, locals, fromlist, level)
559 def __import__(self, name, globals=None, locals=None, fromlist=(), level=0):
560 if level == 0:
--> 561 module = self._gcd_import(name)
562 else:
563 globals_ = globals if globals is not None else {}

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:516, in PackageImporter._gcd_import(self, name, package, level)
513 if level > 0:
514 name = _resolve_name(name, package, level)
--> 516 return self._find_and_load(name)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:486, in PackageImporter._find_and_load(self, name)
484 module = self.modules.get(name, _NEEDS_LOADING)
485 if module is _NEEDS_LOADING:
--> 486 return self._do_find_and_load(name)
488 if module is None:
489 message = f"import of {name} halted; None in sys.modules"

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:476, in PackageImporter._do_find_and_load(self, name)
473 msg = (_ERR_MSG + "; {!r} is not a package").format(name, parent)
474 raise ModuleNotFoundError(msg, name=name) from None
--> 476 module = self._load_module(name, parent)
478 self._install_on_parent(parent, name, module)
480 return module

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:399, in PackageImporter._load_module(self, name, parent)
397 module = self.modules[name] = importlib.import_module(name)
398 return module
--> 399 return self._make_module(name, cur.source_file, isinstance(cur, _PackageNode), parent)

File /media/folder/folder2/folder/VIRTUALENVS/Python/seanos-bFLQpzeS-py3.10/lib/python3.10/site-packages/torch/package/package_importer.py:379, in PackageImporter._make_module(self, name, filename, is_package, parent)
376 linecache.lazycache(mangled_filename, ns)
378 code = self._compile_source(filename, mangled_filename)
--> 379 exec(code, ns)
381 return module

File .TTS/tts/utils/text/phonemizers/__init__.py:20
17 DEF_LANG_TO_PHONEMIZER.update(_new_dict)
19 # Force default for some languages
---> 20 DEF_LANG_TO_PHONEMIZER["en"] = DEF_LANG_TO_PHONEMIZER["en-us"]
22 def get_phonemizer_by_name(name: str, **kwargs) -> BasePhonemizer:
23 """Initiate a phonemizer by name
24
25 Args:
(...)
30 Extra keyword arguments that should be passed to the phonemizer.
31 """

KeyError: 'en-us'
```

OK, got it to work. I had forgotten to install espeak-ng!
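
Since the KeyError: 'en-us' above went away once espeak-ng was installed, a quick check like the sketch below can be run before loading the model. It only looks for an executable on PATH using the standard library; the candidate names `espeak-ng` and `espeak` are assumptions about what the phonemizer backend looks for, and `find_espeak` is a hypothetical helper.

```python
import shutil


def find_espeak(candidates=("espeak-ng", "espeak")):
    """Return the path of the first candidate executable found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_espeak()
    if location is None:
        print("espeak-ng not found on PATH; install it before loading the model")
    else:
        print(f"espeak found at {location}")
```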

I want to continue training the model on Irish sean-nós songs. Before I dive in, is there a recipe you would recommend for this? :)

Neon AI org

Just the Coqui TTS docs, here and here.

Thanks from a nervous beginner! And sorry for the RTFM questions ;).

NeonBohdan changed discussion status to closed
