---
license: cc-by-sa-4.0
language:
- ja
pipeline_tag: text-to-speech
tags:
- style-bert-vits2
- style-bert-vits2-jp-extra
- tts
- childish
- childish voice
- japanese
- text2audio
- text-to-audio
- text to audio
- audio
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629ba7d59854b02da014f64/i64Rx7UbX_-KPLA3uJEEO.png)

# The advantage of this model is that its childlike, gentle voice generation is free to use for both commercial and non-commercial purposes.

This model is the sweet version of RikkaBotan.

It is well suited to reading dialogue and emotional text aloud.

If you want a firmer, more formal delivery, try the [cool version](https://huggingface.co/RikkaBotan/style_bert_vits2_jp_extra_cool_original).

# Sample voice

Sample audio 1 for this model.

Sample audio 2 for this model.

# Model description

This model is a TTS (text-to-speech) model: style_bert_vits2_jp_extra trained on my own voice data.

style_bert_vits2_jp_extra is a speech generation model specialized for Japanese. It generates more accurate and natural speech than previous models.

Because the training data consists solely of the voice of the researcher who created the model, the license is the same as that of style_bert_vits2_jp_extra: you can use the model freely and free of charge for both commercial and non-commercial purposes.

# Terms of use / limitations

〇 What you can do

Processing the generated output

Commercial use of the generated output

Use of the generated output as learning (training) material

Use for R-18 and R-18G content (zoning is required; please be considerate of younger audiences)

× What you cannot do

Redistributing the voice model

Criticizing or attacking others

Calling for support of or opposition to a particular political position, religion, or ideology

Publishing explicit (R-18) content without zoning

Acting in ways detrimental to the provider, such as impersonation

# If you would like to

I would be glad if you mentioned on X (Twitter) or in your description that you used this model. (This is not required.)

# How to use (the code is for Google Colab)

There are two ways to use the model. Choose whichever suits your needs.

1. Generate voice with the Style-Bert-VITS2 app

① Put three files (config.json, the safetensors file, and style_vectors.npy) in the Style-Bert-VITS2/model_assets/rikka_botan/ folder of your Style-Bert-VITS2 installation.

The following program saves them automatically.

```python
from google.colab import drive
from huggingface_hub import snapshot_download

# Mount Google Drive so the model files persist across Colab sessions
drive.mount("/content/drive")

# Create the model folder (-p avoids an error if it already exists)
!mkdir -p /content/drive/MyDrive/Style-Bert-VITS2/model_assets/rikka_botan/
%cd /content/drive/MyDrive/Style-Bert-VITS2/model_assets/

model_name = "RikkaBotan/style_bert_vits2_jp_extra_sweet_original"

# Download every file in the repository into rikka_botan/
download_path = snapshot_download(
    repo_id=model_name,
    local_dir="rikka_botan/",
    local_dir_use_symlinks=False,
)
```
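
Before starting the app, it can help to confirm that the three files listed in step ① actually landed in the folder. The following is a minimal sketch, not part of Style-Bert-VITS2: `missing_model_files` is a hypothetical helper name, and it assumes the weights are the only `*.safetensors` file in the folder.

```python
from pathlib import Path


def missing_model_files(folder):
    """Return the expected model files that are absent from `folder`."""
    folder = Path(folder)
    # config.json and style_vectors.npy have fixed names
    missing = [
        name
        for name in ("config.json", "style_vectors.npy")
        if not (folder / name).is_file()
    ]
    # The weights file name varies, so accept any *.safetensors file
    if not any(folder.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

On Colab, calling `missing_model_files("/content/drive/MyDrive/Style-Bert-VITS2/model_assets/rikka_botan")` should return an empty list once the download has finished.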

② Run the following program.

```python
!git clone https://github.com/litagin02/Style-Bert-VITS2.git
%cd Style-Bert-VITS2/
!pip install -r requirements.txt
!python initialize.py --skip_jvnv

import yaml
from google.colab import drive

drive.mount("/content/drive")

# Point the app at the dataset and model folders on Google Drive
dataset_root = "/content/drive/MyDrive/Style-Bert-VITS2/Data"
assets_root = "/content/drive/MyDrive/Style-Bert-VITS2/model_assets"

with open("configs/paths.yml", "w", encoding="utf-8") as f:
    yaml.dump({"dataset_root": dataset_root, "assets_root": assets_root}, f)

# Launch the app with a shareable public URL
!python app.py --share
```

③ Open the public URL.

2. Use the following code.

```python
# First, install the required libraries
!git clone https://github.com/litagin02/Style-Bert-VITS2.git
%cd Style-Bert-VITS2/
!pip install -r requirements.txt
!pip install style-bert-vits2 --no-build-isolation  # To avoid bugs

# Load the Japanese BERT model and tokenizer
from style_bert_vits2.nlp import bert_models
from style_bert_vits2.constants import Languages

bert_models.load_model(Languages.JP, "ku-nlp/deberta-v2-large-japanese-char-wwm")
bert_models.load_tokenizer(Languages.JP, "ku-nlp/deberta-v2-large-japanese-char-wwm")

# Save the model files to the model_assets directory
from pathlib import Path
from huggingface_hub import hf_hub_download

model_file = "rikka_botan_mokyumokyu.safetensors"
config_file = "config.json"
style_file = "style_vectors.npy"

for file in [model_file, config_file, style_file]:
    print(file)
    hf_hub_download(
        "RikkaBotan/style_bert_vits2_jp_extra_sweet_original",
        file,
        local_dir="model_assets",
    )

# Run a text-to-speech demo with the saved model
from style_bert_vits2.tts_model import TTSModel

assets_root = Path("model_assets")

model = TTSModel(
    model_path=assets_root / model_file,
    config_path=assets_root / config_file,
    style_vec_path=assets_root / style_file,
    device="cuda",  # Use "cpu" if CUDA is not available
)

# Enter the Japanese text to synthesize (the placeholder means "enter your text here")
from IPython.display import Audio, display

sr, audio = model.infer(text="ここに文章を入力してください")
display(Audio(audio, rate=sr))
```
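
The code above only plays the audio inside the notebook. To keep the result as a file, something like the following sketch may work. `save_wav` is a hypothetical helper, not part of style-bert-vits2, and it assumes the returned `audio` is a mono sequence of 16-bit integer samples; if your version returns floating-point samples, they would need to be scaled to the 16-bit range first.

```python
import wave
from array import array


def save_wav(path, sample_rate, samples):
    """Write mono 16-bit PCM samples to a WAV file at `path`."""
    # "h" is a signed 16-bit integer; out-of-range values raise OverflowError
    pcm = array("h", (int(s) for s in samples))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)            # mono (assumption)
        wav.setsampwidth(2)            # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

After running the inference cell, `save_wav("output.wav", sr, audio)` would write the generated speech to `output.wav` in the current directory.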

# Acknowledgments

I would like to thank [litagin](https://huggingface.co/litagin) for developing style-bert-vits2-jp-extra.