Sounds like you're here for something, eh? Alright then, allow me to introduce this piece of "scientific art", shall I?

Or perhaps this isn't your first time here? Either way - welcome aboard!

About "RVC Models Collection Series"

TAS Tomusan's RVC Models Collection Series offers you (nearly) best-quality voice models for your projects (e.g. video games [whether built from scratch or using mods], movies, animation, etc.), anytime and anywhere. And they're always free to use, forever (a reference to the YouTuber Sora The Troll)!

The very first model (somewhat of a "prototype") was created sometime in July 2023 and was supposed to debut in my own work, but that didn't happen. That's because of my strict quality checks, which ensure there are no issues or compromises (or complaints) when anyone uses any of the voice models released to the public.

Hence, I strongly believe that this technology (RVC and other voice cloning tools) won't stop developing anytime soon, as it is entering the technological mainstream around the world. That's because anyone can do (almost) impossible things with the right tools, the right knowledge and investments like this, especially in this generation - at the right time and in the right place, probably. Using AI is still quite a "grey / gray area" these days, but nevertheless, please enjoy (and have fun) using the voice models as much as you like!

My full method demonstration (or tutorial) is here: https://www.youtube.com/watch?v=_Zl1FQTxkhs


  1. How do we find your uploaded voice models?


Very easy.

Go to "Files and versions", where you'll find the various voice models I've made over the past days (or months, or even years). Pick one (or as many as you like), then download at your own risk.

  2. Upon downloading voice model(s), what are the contents, based on your own work?


As follows (inside "(Full voice model name).zip", which usually has a fairly large file size because of these contents):

  1. "Reference" image of the designated voice model

  2. 5 sample audio files (all in .wav format)

  3. "(Voice model name) AI voice data.zip" (the name is usually in English only, obviously) - this should be extracted too; it contains:

    A .pth file and an .index file

This arrangement is (truly) a somewhat unique style compared to others. Yes, it might seem complex at first (to understand and memorize), but you'll master it sooner or later, since it is the default arrangement.
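The pack layout described above (an outer .zip holding an inner "AI voice data" .zip) can be unpacked in one go. Here is a minimal sketch using only Python's standard library; the function name `extract_model_pack` and the folder names are hypothetical, and it assumes the outer .zip is stored outside the destination folder.

```python
import zipfile
from pathlib import Path


def extract_model_pack(outer_zip: Path, dest: Path) -> list[Path]:
    """Extract the outer pack, then every inner .zip found inside it.

    Returns the .pth and .index files that were unpacked.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(outer_zip) as outer:
        outer.extractall(dest)

    # The inner "... AI voice data.zip" is expected to hold the model files.
    # Each inner archive goes into a folder named after it (minus ".zip").
    for inner in sorted(dest.rglob("*.zip")):
        with zipfile.ZipFile(inner) as archive:
            archive.extractall(inner.with_suffix(""))

    return sorted(p for p in dest.rglob("*") if p.suffix in {".pth", ".index"})
```

Calling `extract_model_pack(Path("(Full voice model name).zip"), Path("models/out"))` would then return the paths of the .pth and .index files, ready to be copied wherever your RVC setup expects them.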

  3. What software programs do you use to work on this models pack?


There are a few programs that I use overall:

  1. Google Colab (not the Gradio UI, which is currently restricted for free users) - for training models only and nothing else.
  2. RVC WebUI (0813) - for "model inference" and QA / QC (quality assurance / quality control checks) before the models are finalized.
  3. Audacity Cross-Platform (64-bit on Windows) - for the usual audio editing and for making the "audio datasets" for the planned voice models.
  4. An unspecified video-to-audio converter (whether online or on a local machine)

And I think that's all for this.

  4. How long (in duration) do you work on a voice model (or models)?


It totally depends, including on how easy or hard it is to find proper sources on the Internet. It also depends on the editing, which again ranges from very easy to very hard. In my experience, my work is mostly done within a day (an estimated 1 to 3 voice model uploads at most), if I'm in a good mood (and eager to finish within a timeframe of my own).

  5. Wait, why does "Instructions_Directions.txt" exist in this project? Most users don't seem to care about that file at all, so what exactly is it about?


Simply put, it guides you in using the uploaded (and finalized) voice models. Reading the file is entirely optional, but for those who want to read (and learn) more (no spoilers here), I truly recommend reading it first, all the way to the bottom. You may download the file as well, but that's optional too, since you can read it right here on HuggingFace.

  6. And what about the "Reference" photo file in all the voice models you've uploaded?


That is to show accurately (and precisely) who is being portrayed, whatever role they played. Thus, people will have no doubts ("unless there are").

After downloading a voice model, keeping the image is entirely optional, so you may delete it. But for quality-check reasons (so you don't get confused about whose voice it is), I recommend keeping it - it's just a single image file.

  7. What are these "sample / sample audio (language)" files found in a voice model?


They demonstrate the powerful capabilities of the AI voice model(s), proving that not just 1 language but 2 or even more (or let's just say "all of the above - including the so-called language of the gods") are at your disposal.

Here are the 5 sample demos to listen to:





(and) Tagalog / Filipino

In this case, 5 languages (of my own choice) are enough to consider. More than that is just too much to add, I guess. Also, after extracting your voice model and listening to all 5 samples one by one, it's up to you whether to keep or delete them.

  8. Do you see those "pickle" files? What do they mean?


"Pickle" is the serialization format used to store the model files. HuggingFace scans these files to check that they're safe (Punjabi) and that no virus is detected. Otherwise, the files will be marked as "Unsafe" (and should be deleted if you find any) - you can read more here: https://huggingface.co/docs/hub/security-malware

Likewise, there's nothing to panic about (or be afraid of) when you see "pickle" files. It's kind of random to see that label when I upload them here, but I don't really mind it (at all), and probably neither will you.
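For the curious, here is a minimal sketch (not HuggingFace's actual scanner) of how one might list the imports a pickle stream refers to without ever loading it, using Python's standard `pickletools` module. The function names `list_pickle_imports` and `scan_checkpoint` are hypothetical, and the idea that a .pth file is a zip archive holding a `.pkl` entry is an assumption based on the usual PyTorch save format, not something guaranteed for every file.

```python
import pickletools
import zipfile
from pathlib import Path

# Opcodes that push a text string; STACK_GLOBAL takes its module and
# attribute names from the two most recently pushed strings.
_STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def list_pickle_imports(data: bytes) -> set[str]:
    """Return the "module.name" references found in one pickle stream.

    The bytes are only disassembled, never unpickled, so nothing runs.
    Heuristic: names fetched back from the pickle memo are not tracked.
    """
    imports: set[str] = set()
    recent: list[str] = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports this argument as "module name".
            module, name = arg.split(" ", 1)
            imports.add(f"{module}.{name}")
        elif opcode.name in _STRING_OPCODES:
            recent.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(recent) >= 2:
            imports.add(f"{recent[-2]}.{recent[-1]}")
    return imports


def scan_checkpoint(path: Path) -> set[str]:
    """Scan a model file: a zip-based checkpoint (assumed layout) or a raw pickle."""
    if zipfile.is_zipfile(path):
        found: set[str] = set()
        with zipfile.ZipFile(path) as archive:
            for entry in archive.namelist():
                if entry.endswith(".pkl"):
                    found |= list_pickle_imports(archive.read(entry))
        return found
    return list_pickle_imports(path.read_bytes())
```

A listing made up of ordinary entries such as `collections.OrderedDict` is what you would hope to see; anything unfamiliar is a reason to look closer before loading the file.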

  9. What version of RVC did you use for training the models (and its "feature extraction")?


All of the uploaded voice models are powered by RVC v2 (version 2) and rmvpe (regardless of whether it runs on a standard CPU or a GPU).

  10. In fairness, why are you doing this kind of "human innovation"? Do you have any specific goals, alongside the many authors doing similar work?


I have to say this from my own perspective - I made this RVC models pack for creative, fun (experimental) purposes. Not only that, but also for democratic "long-term" preservation, as an archivist (and a member of the Internet Archive) myself. Indeed, as a natural part of our lives, our voices change as time passes (usually depending on our lifestyles), which suggests this analogy, based on my own interpretation:

  • No vices (with exercise) = very minimal risks of voice loss
  • Few to some vices = quite minimal to few risks
  • Many vices = obvious major risks

But that doesn't guarantee anything for sure: there will always be changes (or proof that your voice stays stable). Or, for "genetic" reasons, it may just happen at random.

I don't want to spoil too much of everything else, so, as I said earlier: read "Instructions_Directions.txt" fully, all the way to the bottom, if you know what I'm talking about (it gives supporting answers to the question above).

In case HuggingFace faces serious lawsuits (and controversies) in the future over AI-related content and whatever they're doing (hopefully not, oh dear Christ), I have additional plan(s) to move along: uploading my works to the Internet Archive as alternative "download mirrors", so they won't be lost in time (I can't tell exactly when, but I'll do my very best).

Sure, people can hand out harsh criticism (destructive and falsely "constructive") on many occasions like this, but I'm just doing my job (at my best) for real, and I enjoy whatever I'm doing in life. Seriously, I have no regrets about doing this in the long run, since I'm really excited about this kind of technological advancement - just moving forward and onward.

  11. Do / did you have any "scheduled" uploads of your created voice models?


To tell the truth - I don't have any "scheduled" plans, since that entirely depends on my current mood (or my confidence), as mentioned in Question #4.

  12. Do you believe that, during your creation of voice model(s), such a thing as a "perfect" voice model exists?


Well, I don't believe there's such a thing as "perfection" in making voice models, whether mine or others'. In fact, they'll never be perfect in any sense (no matter how hard you work on them), because of how AI voices realistically work: there will be minor vocal glitches (by random chance) when running "inference" on one or many audio sources. That's true even when you use the rmvpe feature extraction algorithm - there'll be petty artifacts and such during your own listening tests.

To be honest, though, they're quite close to perfect quality, just never fully perfect.

  13. Does that (relating to Question #12) mean that "immortality" exists?


Yes, "immortality" exists, but only in the sense of vocals. The finalized voice models show no signs of "ageing" over time (even for many, many years to come). That's because they're literally fully digitized; the only way to end their "immortality" is for the uploader / author to simply delete the original source(s).

Also, to point out: any voice model you pick (whether mine or from someone / somewhere else) is only suitable for "playing its character", unless the AI is somehow strangely self-aware (otherwise known as "being able to break character"). You know, AI is extremely complex to make and adjust, just like how humans (and their natural habitats) interact in the real world.

  14. Can these voice models be copyrighted (even in the near future)?


They cannot be copyrighted, just like AI art and other related stuff, since they're a form of "generative content".

On the other hand, it's absolutely fine if you credit me fully, so that people know the true source / origin of the uploaded models. I would truly appreciate proper crediting, but it's all up to you.

  15. Can we make request(s) or ask more questions, if possible?


Well, why not? I'll do my best if I have enough time to do so (aka "I shall see what I can do"). If you have any queries or anything else on your mind, please don't hesitate (and feel free) to contact me via email: thomasandresaldana@gmail.com

I might also update this FAQ, as well as other files in this project, if needed.

  16. Finally, is there anything else we should know about you?


For those who don't know me yet, I'm currently a content creator not only on YouTube (since June 2017), but on the following platforms as well:




Nicovideo Japan



And yeah, this is my YouTube channel by the way.

Otherwise, anyone can visit my Linktree page here without hassles.

Thanks for reading this README.md until the end. If this has been helpful to you, then please don't forget to leave a like at the top; again, it's all up to you.

Take care and have a great day!

Date created:

December 2, 2023 (9:50 PM - Dubai/Oman time) = Set as "Private" for a while

December 29, 2023 (12:45 PM - Dubai/Oman time) = Officially released

Last updated: December 29, 2023 (1:52 PM - Dubai/Oman time)

