{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "oxl2bna8ztl2"
},
"source": [
"**2022/08/12: Updated and localized into Chinese, and fixed the TensorFlow version problem. The cleaners from https://github.com/CjangCjengh/tacotron2-japanese now generate the romanized text automatically, so the manual romanization step is no longer needed.**\n",
"\n",
"\n",
"**2022/03/14: Fixed the unpickling error. The training part works as of this date.**\n",
"\n",
"**2022/03/15: Speech synthesis with HiFi-GAN works.**\n",
"\n",
"**2022/03/16: Speech synthesis with WaveGlow should work again now (tested).**\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"source": [
"**Tacotron 2 Training and Synthesis Notebook**\n",
"\n",
"Originally based on the following notebooks:\n",
"https://github.com/NVIDIA/tacotron2,\n",
"https://bit.ly/3F4DkH2,\n",
"https://colab.research.google.com/drive/1VAuIqEAnrmCig3Edt5zFgQdckY9TDi3N\n",
"\n",
"and on the notebooks presented by the YouTube channel Adam is cool and stuff (https://youtu.be/LQAOCXdU8p8 and https://youtu.be/XLt_K_692Mc).\n",
"\n",
"Thanks to CjangCjengh for the cleaner support and for the video that inspired this version (https://www.bilibili.com/video/BV1rV4y177Z7)."
],
"metadata": {
"id": "2CuJ5rIAIv1H"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "M5rkhiBCXbMY"
},
"source": [
"**A WaveGlow model with weights pretrained on the LJ Speech dataset can be downloaded here: https://catalog.ngc.nvidia.com/orgs/nvidia/models/waveglow_ljs_256channels**\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "r2D2Mt80bamF"
},
"source": [
"# Prepare the Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2022-08-12T04:13:05.594388Z",
"iopub.status.busy": "2022-08-12T04:13:05.593773Z",
"iopub.status.idle": "2022-08-12T04:13:11.760149Z",
"shell.execute_reply": "2022-08-12T04:13:11.758823Z",
"shell.execute_reply.started": "2022-08-12T04:13:05.594273Z"
},
"id": "67u1nnaJcyPt",
"trusted": true
},
"outputs": [],
"source": [
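"#@title Download Tacotron 2\n",
"!git clone https://github.com/CjangCjengh/tacotron2-japanese tacotron2\n",
"# Submodules must be initialized inside the cloned repository, not in the current folder (/content)\n",
"!cd tacotron2 && git submodule init && git submodule update"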
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-12T04:13:20.371238Z",
"iopub.status.busy": "2022-08-12T04:13:20.370604Z",
"iopub.status.idle": "2022-08-12T04:18:18.664088Z",
"shell.execute_reply": "2022-08-12T04:18:18.662898Z",
"shell.execute_reply.started": "2022-08-12T04:13:20.371200Z"
},
"trusted": true,
"cellView": "form",
"id": "ujgAmihAIZg1"
},
"outputs": [],
"source": [
"#@title Install dependencies\n",
"!pip install -U tensorflow==1.15.2\n",
"!pip install -q unidecode tensorboardX\n",
"!pip install librosa==0.8.0\n",
"!pip install pysoundfile==0.9.0.post1\n",
"!pip install unidecode==1.3.4\n",
"!pip install pyopenjtalk==0.2.0\n",
"!pip install inflect==5.6.2\n",
"!pip install janome==0.4.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"trusted": true,
"cellView": "form",
"id": "TAouEAY7IZg7"
},
"outputs": [],
"source": [
"#@title Mount Google Drive\n",
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "code",
"source": [
"#@title Create folders and download the pretrained model\n",
"import os\n",
"if os.getcwd() != '/content/tacotron2':\n",
" os.chdir('/content/tacotron2')\n",
"! gdown --id 1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA\n",
"if not os.path.isdir(\"wavs\"):\n",
" os.mkdir('wavs')\n",
"if not os.path.isdir(\"outdir\"):\n",
" os.mkdir(\"outdir\")"
],
"metadata": {
"cellView": "form",
"id": "-tH9g-WrJ0Ro"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "pMt_D2p7c7Dl"
},
"source": [
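"### Upload the data\n",
"\n",
"`text file` is the filelist: each line gives the path of an audio file and its transcript. Upload it to `/content/tacotron2/filelists/`.\n",
"\n",
"`audio files` are the .wav clips themselves. Upload them to `/content/tacotron2/wavs/`. Their sampling rate must match `hparams.sampling_rate`, otherwise `create_mels` raises an error."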
]
},
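{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training code reads the filelist with `load_filepaths_and_text`, and `check_dataset` treats field 0 of each line as the audio path and field 1 as the transcript, i.e. one `path|transcript` pair per line. The sketch below parses that format in plain Python. `parse_filelist` and the example lines are hypothetical illustrations, not part of the tacotron2-japanese repository.\n",
"\n",
"```python\n",
"def parse_filelist(lines):\n",
"    # Hypothetical helper: turn 'path|transcript' lines into (path, transcript) pairs.\n",
"    entries = []\n",
"    for line in lines:\n",
"        line = line.strip()\n",
"        if not line:\n",
"            continue  # ignore blank lines\n",
"        # Split on the first '|' only, so a stray '|' stays inside the transcript.\n",
"        path, transcript = line.split('|', 1)\n",
"        entries.append((path, transcript))\n",
"    return entries\n",
"\n",
"\n",
"# Made-up example lines; the audio paths point into the wavs/ folder.\n",
"example_lines = [\n",
"    'wavs/0001.wav|こんにちは。',\n",
"    'wavs/0002.wav|ありがとうございます。',\n",
"]\n",
"entries = parse_filelist(example_lines)\n",
"# entries[0] == ('wavs/0001.wav', 'こんにちは。')\n",
"```\n",
"\n",
"Splitting on the first `|` is a choice made for this sketch. `check_dataset` instead reports lines containing more than one `|` so you can inspect them."
]
},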
{
"cell_type": "markdown",
"source": [
"*[Image: Tacotron2InstructionImages.jpg]*"
],
"metadata": {
"id": "6tKrpYzpg8t0"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "IOETvfJdbfYO"
},
"source": [
"# Prepare the Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-12T04:22:26.512807Z",
"iopub.status.busy": "2022-08-12T04:22:26.512405Z",
"iopub.status.idle": "2022-08-12T04:22:48.851514Z",
"shell.execute_reply": "2022-08-12T04:22:48.850448Z",
"shell.execute_reply.started": "2022-08-12T04:22:26.512767Z"
},
"trusted": true,
"cellView": "form",
"id": "VKFIe_5pIZg9"
},
"outputs": [],
"source": [
"#@title Training code\n",
"%matplotlib inline\n",
"\n",
"import time\n",
"import argparse\n",
"import math\n",
"from numpy import finfo\n",
"\n",
"import torch\n",
"from distributed import apply_gradient_allreduce\n",
"import torch.distributed as dist\n",
"from torch.utils.data.distributed import DistributedSampler\n",
"from torch.utils.data import DataLoader\n",
"\n",
"from model import Tacotron2\n",
"from data_utils import TextMelLoader, TextMelCollate\n",
"from loss_function import Tacotron2Loss\n",
"from logger import Tacotron2Logger\n",
"from hparams import create_hparams\n",
" \n",
"import random\n",
"import numpy as np\n",
"\n",
"import layers\n",
"from utils import load_wav_to_torch, load_filepaths_and_text\n",
"from text import text_to_sequence\n",
"from math import e\n",
"#from tqdm import tqdm # Terminal\n",
"#from tqdm import tqdm_notebook as tqdm # Legacy Notebook TQDM\n",
"from tqdm.notebook import tqdm # Modern Notebook TQDM\n",
"from distutils.dir_util import copy_tree\n",
"import matplotlib.pylab as plt\n",
"\n",
"def download_from_google_drive(file_id, file_name):\n",
" # download a file from the Google Drive link\n",
" !rm -f ./cookie\n",
" !curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id={file_id}\" > /dev/null\n",
" confirm_text = !awk '/download/ {print $NF}' ./cookie\n",
" confirm_text = confirm_text[0]\n",
" !curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm={confirm_text}&id={file_id}\" -o {file_name}\n",
"\n",
"def create_mels():\n",
" print(\"Generating Mels\")\n",
" stft = layers.TacotronSTFT(\n",
" hparams.filter_length, hparams.hop_length, hparams.win_length,\n",
" hparams.n_mel_channels, hparams.sampling_rate, hparams.mel_fmin,\n",
" hparams.mel_fmax)\n",
" def save_mel(filename):\n",
" audio, sampling_rate = load_wav_to_torch(filename)\n",
" if sampling_rate != stft.sampling_rate:\n",
" raise ValueError(\"{} {} SR doesn't match target {} SR\".format(filename, \n",
" sampling_rate, stft.sampling_rate))\n",
" audio_norm = audio / hparams.max_wav_value\n",
" audio_norm = audio_norm.unsqueeze(0)\n",
" audio_norm = torch.autograd.Variable(audio_norm, requires_grad=False)\n",
" melspec = stft.mel_spectrogram(audio_norm)\n",
" melspec = torch.squeeze(melspec, 0).cpu().numpy()\n",
" np.save(filename.replace('.wav', ''), melspec)\n",
"\n",
" import glob\n",
" wavs = glob.glob('wavs/*.wav')\n",
" for i in tqdm(wavs):\n",
" save_mel(i)\n",
"\n",
"\n",
"def reduce_tensor(tensor, n_gpus):\n",
" rt = tensor.clone()\n",
"    dist.all_reduce(rt, op=dist.ReduceOp.SUM)\n",
" rt /= n_gpus\n",
" return rt\n",
"\n",
"\n",
"def init_distributed(hparams, n_gpus, rank, group_name):\n",
" assert torch.cuda.is_available(), \"Distributed mode requires CUDA.\"\n",
" print(\"Initializing Distributed\")\n",
"\n",
" # Set cuda device so everything is done on the right GPU.\n",
" torch.cuda.set_device(rank % torch.cuda.device_count())\n",
"\n",
" # Initialize distributed communication\n",
" dist.init_process_group(\n",
" backend=hparams.dist_backend, init_method=hparams.dist_url,\n",
" world_size=n_gpus, rank=rank, group_name=group_name)\n",
"\n",
" print(\"Done initializing distributed\")\n",
"\n",
"\n",
"def prepare_dataloaders(hparams):\n",
" # Get data, data loaders and collate function ready\n",
" trainset = TextMelLoader(hparams.training_files, hparams)\n",
" valset = TextMelLoader(hparams.validation_files, hparams)\n",
" collate_fn = TextMelCollate(hparams.n_frames_per_step)\n",
"\n",
" if hparams.distributed_run:\n",
" train_sampler = DistributedSampler(trainset)\n",
" shuffle = False\n",
" else:\n",
" train_sampler = None\n",
" shuffle = True\n",
"\n",
" train_loader = DataLoader(trainset, num_workers=1, shuffle=shuffle,\n",
" sampler=train_sampler,\n",
" batch_size=hparams.batch_size, pin_memory=False,\n",
" drop_last=True, collate_fn=collate_fn)\n",
" return train_loader, valset, collate_fn\n",
"\n",
"\n",
"def prepare_directories_and_logger(output_directory, log_directory, rank):\n",
" if rank == 0:\n",
" if not os.path.isdir(output_directory):\n",
" os.makedirs(output_directory)\n",
" os.chmod(output_directory, 0o775)\n",
" logger = Tacotron2Logger(os.path.join(output_directory, log_directory))\n",
" else:\n",
" logger = None\n",
" return logger\n",
"\n",
"\n",
"def load_model(hparams):\n",
" model = Tacotron2(hparams).cuda()\n",
" if hparams.fp16_run:\n",
" model.decoder.attention_layer.score_mask_value = finfo('float16').min\n",
"\n",
" if hparams.distributed_run:\n",
" model = apply_gradient_allreduce(model)\n",
"\n",
" return model\n",
"\n",
"\n",
"def warm_start_model(checkpoint_path, model, ignore_layers):\n",
" assert os.path.isfile(checkpoint_path)\n",
" print(\"Warm starting model from checkpoint '{}'\".format(checkpoint_path))\n",
" checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')\n",
" model_dict = checkpoint_dict['state_dict']\n",
" if len(ignore_layers) > 0:\n",
" model_dict = {k: v for k, v in model_dict.items()\n",
" if k not in ignore_layers}\n",
" dummy_dict = model.state_dict()\n",
" dummy_dict.update(model_dict)\n",
" model_dict = dummy_dict\n",
" model.load_state_dict(model_dict)\n",
" return model\n",
"\n",
"\n",
"def load_checkpoint(checkpoint_path, model, optimizer):\n",
" assert os.path.isfile(checkpoint_path)\n",
" print(\"Loading checkpoint '{}'\".format(checkpoint_path))\n",
" checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')\n",
" model.load_state_dict(checkpoint_dict['state_dict'])\n",
" optimizer.load_state_dict(checkpoint_dict['optimizer'])\n",
" learning_rate = checkpoint_dict['learning_rate']\n",
" iteration = checkpoint_dict['iteration']\n",
" print(\"Loaded checkpoint '{}' from iteration {}\" .format(\n",
" checkpoint_path, iteration))\n",
" return model, optimizer, learning_rate, iteration\n",
"\n",
"\n",
"def save_checkpoint(model, optimizer, learning_rate, iteration, filepath):\n",
" print(\"Saving model and optimizer state at iteration {} to {}\".format(\n",
" iteration, filepath))\n",
" try:\n",
" torch.save({'iteration': iteration,\n",
" 'state_dict': model.state_dict(),\n",
" 'optimizer': optimizer.state_dict(),\n",
" 'learning_rate': learning_rate}, filepath)\n",
" except KeyboardInterrupt:\n",
" print(\"interrupt received while saving, waiting for save to complete.\")\n",
" torch.save({'iteration': iteration,'state_dict': model.state_dict(),'optimizer': optimizer.state_dict(),'learning_rate': learning_rate}, filepath)\n",
" print(\"Model Saved\")\n",
"\n",
"def plot_alignment(alignment, info=None):\n",
" %matplotlib inline\n",
" fig, ax = plt.subplots(figsize=(int(alignment_graph_width/100), int(alignment_graph_height/100)))\n",
" im = ax.imshow(alignment, cmap='inferno', aspect='auto', origin='lower',\n",
" interpolation='none')\n",
" ax.autoscale(enable=True, axis=\"y\", tight=True)\n",
" fig.colorbar(im, ax=ax)\n",
" xlabel = 'Decoder timestep'\n",
" if info is not None:\n",
" xlabel += '\\n\\n' + info\n",
" plt.xlabel(xlabel)\n",
" plt.ylabel('Encoder timestep')\n",
" plt.tight_layout()\n",
" fig.canvas.draw()\n",
" plt.show()\n",
"\n",
"def validate(model, criterion, valset, iteration, batch_size, n_gpus,\n",
" collate_fn, logger, distributed_run, rank, epoch, start_eposh, learning_rate):\n",
" \"\"\"Handles all the validation scoring and printing\"\"\"\n",
" model.eval()\n",
" with torch.no_grad():\n",
" val_sampler = DistributedSampler(valset) if distributed_run else None\n",
" val_loader = DataLoader(valset, sampler=val_sampler, num_workers=1,\n",
" shuffle=False, batch_size=batch_size,\n",
" pin_memory=False, collate_fn=collate_fn)\n",
"\n",
" val_loss = 0.0\n",
" for i, batch in enumerate(val_loader):\n",
" x, y = model.parse_batch(batch)\n",
" y_pred = model(x)\n",
" loss = criterion(y_pred, y)\n",
" if distributed_run:\n",
" reduced_val_loss = reduce_tensor(loss.data, n_gpus).item()\n",
" else:\n",
" reduced_val_loss = loss.item()\n",
" val_loss += reduced_val_loss\n",
" val_loss = val_loss / (i + 1)\n",
"\n",
" model.train()\n",
" if rank == 0:\n",
" print(\"Epoch: {} Validation loss {}: {:9f} Time: {:.1f}m LR: {:.6f}\".format(epoch, iteration, val_loss,(time.perf_counter()-start_eposh)/60, learning_rate))\n",
" logger.log_validation(val_loss, model, y, y_pred, iteration)\n",
" if hparams.show_alignments:\n",
" %matplotlib inline\n",
" _, mel_outputs, gate_outputs, alignments = y_pred\n",
" idx = random.randint(0, alignments.size(0) - 1)\n",
" plot_alignment(alignments[idx].data.cpu().numpy().T)\n",
"\n",
"def train(output_directory, log_directory, checkpoint_path, warm_start, n_gpus,\n",
" rank, group_name, hparams, log_directory2):\n",
" \"\"\"Training and validation logging results to tensorboard and stdout\n",
"\n",
" Params\n",
" ------\n",
" output_directory (string): directory to save checkpoints\n",
" log_directory (string) directory to save tensorboard logs\n",
" checkpoint_path(string): checkpoint path\n",
" n_gpus (int): number of gpus\n",
" rank (int): rank of current gpu\n",
" hparams (object): comma separated list of \"name=value\" pairs.\n",
" \"\"\"\n",
" if hparams.distributed_run:\n",
" init_distributed(hparams, n_gpus, rank, group_name)\n",
"\n",
" torch.manual_seed(hparams.seed)\n",
" torch.cuda.manual_seed(hparams.seed)\n",
"\n",
" model = load_model(hparams)\n",
" learning_rate = hparams.learning_rate\n",
" optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate,\n",
" weight_decay=hparams.weight_decay)\n",
"\n",
" if hparams.fp16_run:\n",
" from apex import amp\n",
" model, optimizer = amp.initialize(\n",
" model, optimizer, opt_level='O2')\n",
"\n",
" if hparams.distributed_run:\n",
" model = apply_gradient_allreduce(model)\n",
"\n",
" criterion = Tacotron2Loss()\n",
"\n",
" logger = prepare_directories_and_logger(\n",
" output_directory, log_directory, rank)\n",
"\n",
" train_loader, valset, collate_fn = prepare_dataloaders(hparams)\n",
"\n",
" # Load checkpoint if one exists\n",
" iteration = 0\n",
" epoch_offset = 0\n",
" if checkpoint_path is not None and os.path.isfile(checkpoint_path):\n",
" if warm_start:\n",
" model = warm_start_model(\n",
" checkpoint_path, model, hparams.ignore_layers)\n",
" else:\n",
" model, optimizer, _learning_rate, iteration = load_checkpoint(\n",
" checkpoint_path, model, optimizer)\n",
" if hparams.use_saved_learning_rate:\n",
" learning_rate = _learning_rate\n",
" iteration += 1 # next iteration is iteration + 1\n",
" epoch_offset = max(0, int(iteration / len(train_loader)))\n",
" else:\n",
"        # No checkpoint yet: warm start from the pretrained LJSpeech model (tacotron2_statedict.pt)\n",
"        model = warm_start_model(\"tacotron2_statedict.pt\", model, hparams.ignore_layers)\n",
" \n",
" start_eposh = time.perf_counter()\n",
" learning_rate = 0.0\n",
" model.train()\n",
" is_overflow = False\n",
"    # ================ MAIN TRAINING LOOP! ===================\n",
" for epoch in tqdm(range(epoch_offset, hparams.epochs)):\n",
" print(\"\\nStarting Epoch: {} Iteration: {}\".format(epoch, iteration))\n",
" start_eposh = time.perf_counter() # eposh is russian, not a typo\n",
" for i, batch in tqdm(enumerate(train_loader), total=len(train_loader)):\n",
" start = time.perf_counter()\n",
" if iteration < hparams.decay_start: learning_rate = hparams.A_\n",
" else: iteration_adjusted = iteration - hparams.decay_start; learning_rate = (hparams.A_*(e**(-iteration_adjusted/hparams.B_))) + hparams.C_\n",
" learning_rate = max(hparams.min_learning_rate, learning_rate) # output the largest number\n",
" for param_group in optimizer.param_groups:\n",
" param_group['lr'] = learning_rate\n",
"\n",
" model.zero_grad()\n",
" x, y = model.parse_batch(batch)\n",
" y_pred = model(x)\n",
"\n",
" loss = criterion(y_pred, y)\n",
" if hparams.distributed_run:\n",
" reduced_loss = reduce_tensor(loss.data, n_gpus).item()\n",
" else:\n",
" reduced_loss = loss.item()\n",
" if hparams.fp16_run:\n",
" with amp.scale_loss(loss, optimizer) as scaled_loss:\n",
" scaled_loss.backward()\n",
" else:\n",
" loss.backward()\n",
"\n",
" if hparams.fp16_run:\n",
" grad_norm = torch.nn.utils.clip_grad_norm_(\n",
" amp.master_params(optimizer), hparams.grad_clip_thresh)\n",
" is_overflow = math.isnan(grad_norm)\n",
" else:\n",
" grad_norm = torch.nn.utils.clip_grad_norm_(\n",
" model.parameters(), hparams.grad_clip_thresh)\n",
"\n",
" optimizer.step()\n",
"\n",
" if not is_overflow and rank == 0:\n",
" duration = time.perf_counter() - start\n",
" logger.log_training(\n",
" reduced_loss, grad_norm, learning_rate, duration, iteration)\n",
" #print(\"Batch {} loss {:.6f} Grad Norm {:.6f} Time {:.6f}\".format(iteration, reduced_loss, grad_norm, duration), end='\\r', flush=True)\n",
"\n",
" iteration += 1\n",
" validate(model, criterion, valset, iteration,\n",
" hparams.batch_size, n_gpus, collate_fn, logger,\n",
" hparams.distributed_run, rank, epoch, start_eposh, learning_rate)\n",
" save_checkpoint(model, optimizer, learning_rate, iteration, checkpoint_path)\n",
" if log_directory2 != None:\n",
" copy_tree(log_directory, log_directory2)\n",
"def check_dataset(hparams):\n",
" from utils import load_wav_to_torch, load_filepaths_and_text\n",
" import os\n",
" import numpy as np\n",
" def check_arr(filelist_arr):\n",
" for i, file in enumerate(filelist_arr):\n",
" if len(file) > 2:\n",
" print(\"|\".join(file), \"\\nhas multiple '|', this may not be an error.\")\n",
" if hparams.load_mel_from_disk and '.wav' in file[0]:\n",
" print(\"[WARNING]\", file[0], \" in filelist while expecting .npy .\")\n",
" else:\n",
" if not hparams.load_mel_from_disk and '.npy' in file[0]:\n",
" print(\"[WARNING]\", file[0], \" in filelist while expecting .wav .\")\n",
" if (not os.path.exists(file[0])):\n",
" print(\"|\".join(file), \"\\n[WARNING] does not exist.\")\n",
" if len(file[1]) < 3:\n",
" print(\"|\".join(file), \"\\n[info] has no/very little text.\")\n",
" if not ((file[1].strip())[-1] in r\"!?,.;:\"):\n",
" print(\"|\".join(file), \"\\n[info] has no ending punctuation.\")\n",
" mel_length = 1\n",
" if hparams.load_mel_from_disk and '.npy' in file[0]:\n",
" melspec = torch.from_numpy(np.load(file[0], allow_pickle=True))\n",
" mel_length = melspec.shape[1]\n",
" if mel_length == 0:\n",
" print(\"|\".join(file), \"\\n[WARNING] has 0 duration.\")\n",
" print(\"Checking Training Files\")\n",
" audiopaths_and_text = load_filepaths_and_text(hparams.training_files) # get split lines from training_files text file.\n",
" check_arr(audiopaths_and_text)\n",
" print(\"Checking Validation Files\")\n",
" audiopaths_and_text = load_filepaths_and_text(hparams.validation_files) # get split lines from validation_files text file.\n",
" check_arr(audiopaths_and_text)\n",
" print(\"Finished Checking\")\n",
"\n",
"warm_start=False #sorry about that\n",
"n_gpus=1\n",
"rank=0\n",
"group_name=None\n",
"\n",
"# ---- Default parameters; you can leave these as they are ----\n",
"hparams = create_hparams()\n",
"model_filename = 'current_model'\n",
"hparams.training_files = \"filelists/clipper_train_filelist.txt\"\n",
"hparams.validation_files = \"filelists/clipper_val_filelist.txt\"\n",
"#hparams.use_mmi=True, # not used in this notebook\n",
"#hparams.use_gaf=True, # not used in this notebook\n",
"#hparams.max_gaf=0.5, # not used in this notebook\n",
"#hparams.drop_frame_rate = 0.2 # not used in this notebook\n",
"hparams.p_attention_dropout=0.1\n",
"hparams.p_decoder_dropout=0.1\n",
"hparams.decay_start = 15000\n",
"hparams.A_ = 5e-4\n",
"hparams.B_ = 8000\n",
"hparams.C_ = 0\n",
"hparams.min_learning_rate = 1e-5\n",
"generate_mels = True\n",
"hparams.show_alignments = True\n",
"alignment_graph_height = 600\n",
"alignment_graph_width = 1000\n",
"hparams.batch_size = 32\n",
"hparams.load_mel_from_disk = True\n",
"hparams.ignore_layers = []\n",
"hparams.epochs = 10000\n",
"\n",
"torch.backends.cudnn.enabled = hparams.cudnn_enabled\n",
"torch.backends.cudnn.benchmark = hparams.cudnn_benchmark\n",
"output_directory = '/content/drive/MyDrive/colab/outdir' # Location to save Checkpoints\n",
"log_directory = '/content/tacotron2/logs' # Location to save Log files locally\n",
"log_directory2 = '/content/drive/MyDrive/colab/logs' # Location to copy log files (done at the end of each epoch to cut down on I/O)\n",
"checkpoint_path = output_directory+(r'/')+model_filename\n"
]
},
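{
"cell_type": "markdown",
"metadata": {},
"source": [
"When no checkpoint exists yet (or `warm_start` is True), `train()` calls `warm_start_model`, which drops the keys listed in `hparams.ignore_layers` from the pretrained state dict and lays the remaining weights over the freshly initialized model. The sketch below reproduces that merge with plain dictionaries so it runs without PyTorch. The function name, the key `decoder.example.weight` and the string values are placeholders standing in for real parameter names and tensors.\n",
"\n",
"```python\n",
"def merge_state_dicts(model_state, pretrained_state, ignore_layers):\n",
"    # Drop pretrained entries that should keep their fresh initialization.\n",
"    kept = {k: v for k, v in pretrained_state.items() if k not in ignore_layers}\n",
"    # Start from the model's own state and overwrite it with the kept weights.\n",
"    merged = dict(model_state)\n",
"    merged.update(kept)\n",
"    return merged\n",
"\n",
"\n",
"# Placeholder keys and values; real state dicts map parameter names to tensors.\n",
"model_state = {'embedding.weight': 'fresh', 'decoder.example.weight': 'fresh'}\n",
"pretrained_state = {'embedding.weight': 'pretrained', 'decoder.example.weight': 'pretrained'}\n",
"merged = merge_state_dicts(model_state, pretrained_state, ['embedding.weight'])\n",
"# merged == {'embedding.weight': 'fresh', 'decoder.example.weight': 'pretrained'}\n",
"```\n",
"\n",
"With an empty `ignore_layers`, the notebook code loads the pretrained dict directly, which gives the same result as long as both dicts have the same keys."
]
},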
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2022-08-12T04:23:09.697053Z",
"iopub.status.busy": "2022-08-12T04:23:09.696148Z",
"iopub.status.idle": "2022-08-12T04:23:09.701991Z",
"shell.execute_reply": "2022-08-12T04:23:09.700529Z",
"shell.execute_reply.started": "2022-08-12T04:23:09.697015Z"
},
"id": "GKRvQ1EWiVhn",
"trusted": true
},
"outputs": [],
"source": [
"#@title Name your model (letters and digits only)\n",
"model_filename = \"test\" #@param {type:\"string\"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {
"iopub.execute_input": "2022-08-12T04:23:14.426817Z",
"iopub.status.busy": "2022-08-12T04:23:14.426455Z",
"iopub.status.idle": "2022-08-12T04:23:14.433170Z",
"shell.execute_reply": "2022-08-12T04:23:14.432204Z",
"shell.execute_reply.started": "2022-08-12T04:23:14.426786Z"
},
"trusted": true,
"cellView": "form",
"id": "ew2HzQl2IZhH"
},
"outputs": [],
"source": [
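"#@title Add a previously trained model to the output folder\n",
"\n",
"#@markdown If you have trained a model with the **same name** before: share it on Google Drive so that anyone with the link can view it, then put its file ID after `gdown --id` below and uncomment that line (see the gdown usage).\n",
"\n",
"#@markdown If this is your first training run: skip this cell.\n",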
"os.chdir(\"outdir\")\n",
"# ! gdown --id \n",
"os.chdir(\"..\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2022-08-12T04:23:21.495804Z",
"iopub.status.busy": "2022-08-12T04:23:21.494800Z",
"iopub.status.idle": "2022-08-12T04:23:21.504471Z",
"shell.execute_reply": "2022-08-12T04:23:21.503429Z",
"shell.execute_reply.started": "2022-08-12T04:23:21.495754Z"
},
"id": "JzevuoJnkIsi",
"trusted": true
},
"outputs": [],
"source": [
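"#@title Set parameters\n",
"\n",
"#@markdown **These two parameters are the most important.**\n",
"\n",
"#@markdown `batch_size` controls how fast the model trains. **Do not set it too high, or the GPU will run out of memory.** For a larger dataset, a value of around 30 works well.\n",
"\n",
"#@markdown If the number of audio files in the dataset is about the same as this value, training will fail. The training DataLoader uses `drop_last=True`, so such a dataset yields at most one full batch per epoch, and a smaller one yields none.\n",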
"\n",
"hparams.batch_size = 8 #@param {type:\"integer\"}\n",
"\n",
"#@markdown `epochs` sets the number of training epochs.\n",
"hparams.epochs = 1000 #@param {type:\"integer\"}\n",
"\n",
"# The remaining parameters are less important\n",
"hparams.p_attention_dropout=0.1\n",
"hparams.p_decoder_dropout=0.1\n",
"hparams.decay_start = 15000 # wait till decay_start to start decaying learning rate\n",
"hparams.A_ = 5e-4 # Start/Max Learning Rate\n",
"hparams.B_ = 8000 # Decay Rate\n",
"hparams.C_ = 0 # Shift learning rate equation by this value\n",
"hparams.min_learning_rate = 1e-5 # Min Learning Rate\n",
"generate_mels = True # Don't change\n",
"hparams.show_alignments = True\n",
"alignment_graph_height = 600\n",
"alignment_graph_width = 1000\n",
"hparams.load_mel_from_disk = True\n",
"hparams.ignore_layers = [] # Layers to reset (none by default; unless you are adapting the model to a different language, this can be ignored)\n",
"\n",
"torch.backends.cudnn.enabled = hparams.cudnn_enabled\n",
"torch.backends.cudnn.benchmark = hparams.cudnn_benchmark\n",
"output_directory = '/content/drive/MyDrive/colab/outdir' # Location to save Checkpoints\n",
"log_directory = '/content/tacotron2/logs' # Location to save Log files locally\n",
"log_directory2 = '/content/drive/MyDrive/colab/logs' # Location to copy log files (done at the end of each epoch to cut down on I/O)\n",
"checkpoint_path = output_directory+(r'/')+model_filename"
]
},
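{
"cell_type": "markdown",
"metadata": {},
"source": [
"Inside the training loop, the learning rate stays at `A_` until `decay_start` iterations, then follows `lr = A_ * exp(-(iteration - decay_start) / B_) + C_`, and is never allowed below `min_learning_rate`. With the defaults set here it reaches 1e-5 at about iteration 46,300 (15000 + 8000 * ln 50 ≈ 46,296). The sketch below is a standalone re-implementation for previewing the schedule. `learning_rate_at` is a hypothetical name, and it uses `math.exp` where the notebook uses `e ** x`, which is mathematically the same.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def learning_rate_at(iteration, A_=5e-4, B_=8000, C_=0.0,\n",
"                     decay_start=15000, min_learning_rate=1e-5):\n",
"    # Mirrors the schedule in train(); defaults copied from the cell above.\n",
"    if iteration < decay_start:\n",
"        lr = A_  # constant phase before decay starts\n",
"    else:\n",
"        # Exponential decay with time constant B_, shifted by C_.\n",
"        lr = A_ * math.exp(-(iteration - decay_start) / B_) + C_\n",
"    # Clamp from below, as train() does with max(...).\n",
"    return max(min_learning_rate, lr)\n",
"\n",
"\n",
"for it in (0, 15000, 23000, 46296, 60000):\n",
"    print(it, learning_rate_at(it))\n",
"```\n",
"\n",
"Raising `B_` stretches the decay over more iterations; raising `C_` keeps a floor above zero even before the `min_learning_rate` clamp applies."
]
},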
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2022-08-12T04:23:25.281471Z",
"iopub.status.busy": "2022-08-12T04:23:25.280625Z",
"iopub.status.idle": "2022-08-12T04:23:26.307536Z",
"shell.execute_reply": "2022-08-12T04:23:26.305920Z",
"shell.execute_reply.started": "2022-08-12T04:23:25.281432Z"
},
"id": "yehA2fOliyUI",
"trusted": true
},
"outputs": [],
"source": [
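"#@title Dataset filelists\n",
"#@markdown If you don't need a separate validation set, both lists can point to the same file (the validation loss then measures fit on the training data).\n",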
"\n",
"#@markdown Training filelist\n",
"training_files_name = \"list.txt\" #@param {type:\"string\"}\n",
"#@markdown Validation filelist\n",
"validation_files_name = \"list.txt\" #@param {type:\"string\"}\n",
"#@markdown Cleaner used to preprocess the text\n",
"\n",
"hparams_prefix = \"/content/tacotron2/filelists/\"\n",
"text_cleaner='japanese_phrase_cleaners' #@param {type:\"string\"}\n",
"text_cleaners=[text_cleaner]\n",
"#@markdown ### Example output of each cleaner\n",
"#@markdown ### 1. 'japanese_cleaners'\n",
"#@markdown #### Before\n",
"#@markdown 何かあったらいつでも話して下さい。学院のことじゃなく、私事に関することでも何でも\n",
"#@markdown #### After\n",
"#@markdown nanikaacltaraitsudemohanashItekudasai.gakuiNnokotojanaku,shijinikaNsurukotodemonanidemo.\n",
"#@markdown ### 2. 'japanese_tokenization_cleaners'\n",
"#@markdown #### Before\n",
"#@markdown 何かあったらいつでも話して下さい。学院のことじゃなく、私事に関することでも何でも\n",
"#@markdown #### After\n",
"#@markdown nani ka acl tara itsu demo hanashi te kudasai. gakuiN no koto ja naku, shiji nikaNsuru koto de mo naNdemo.\n",
"#@markdown ### 3. 'japanese_accent_cleaners'\n",
"#@markdown #### Before\n",
"#@markdown 何かあったらいつでも話して下さい。学院のことじゃなく、私事に関することでも何でも\n",
"#@markdown #### After\n",
"#@markdown :na)nika a)cltara i)tsudemo ha(na)shIte ku(dasa)i.:ga(kuiNno ko(to)janaku,:shi)jini ka(Nsu)ru ko(to)demo na)nidemo.\n",
"#@markdown ### 4. 'japanese_phrase_cleaners'\n",
"#@markdown #### Before\n",
"#@markdown 何かあったらいつでも話して下さい。学院のことじゃなく、私事に関することでも何でも\n",
"#@markdown #### After\n",
"#@markdown nanika acltara itsudemo hanashIte kudasai. gakuiNno kotojanaku, shijini kaNsuru kotodemo nanidemo.\n",
"\n",
"training_files = hparams_prefix + training_files_name\n",
"validation_files = hparams_prefix + validation_files_name\n",
"\n",
"hparams.training_files = training_files\n",
"hparams.validation_files = validation_files\n",
"hparams.text_cleaners = text_cleaners"
]
},
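{
"cell_type": "markdown",
"metadata": {},
"source": [
"`prepare_dataloaders` builds the training `DataLoader` with `drop_last=True`, so each epoch contains only full batches: floor(number of training lines / `batch_size`) of them. The hypothetical helpers below count the non-empty lines of a filelist and compute that number. They are a quick way to check, before training, that the chosen batch size leaves enough batches per epoch.\n",
"\n",
"```python\n",
"def count_filelist_entries(path):\n",
"    # Hypothetical helper: count non-empty lines in a filelist.\n",
"    with open(path, encoding='utf-8') as f:\n",
"        return sum(1 for line in f if line.strip())\n",
"\n",
"\n",
"def batches_per_epoch(n_entries, batch_size, drop_last=True):\n",
"    # With drop_last=True the final, incomplete batch is discarded.\n",
"    if drop_last:\n",
"        return n_entries // batch_size\n",
"    return -(-n_entries // batch_size)  # ceiling division\n",
"\n",
"\n",
"# Example usage in this notebook (after the cell above has run):\n",
"# n = count_filelist_entries(hparams.training_files)\n",
"# print(n, 'lines ->', batches_per_epoch(n, hparams.batch_size), 'batches per epoch')\n",
"```\n",
"\n",
"For example, 100 lines with `batch_size = 8` give 12 batches per epoch, while 7 lines give none."
]
},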
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2022-08-12T04:23:32.743603Z",
"iopub.status.busy": "2022-08-12T04:23:32.743060Z",
"iopub.status.idle": "2022-08-12T04:24:06.307565Z",
"shell.execute_reply": "2022-08-12T04:24:06.306350Z",
"shell.execute_reply.started": "2022-08-12T04:23:32.743559Z"
},
"id": "b_xMcYMfkc9L",
"trusted": true
},
"outputs": [],
"source": [
"#@title Generate mel spectrograms\n",
"# ---- Replace .wav with .npy in filelists ----\n",
"!sed -i -- 's,.wav|,.npy|,g' filelists/*.txt\n",
"# ---- Replace .wav with .npy in filelists ----\n",
"if generate_mels:\n",
" create_mels()"
]
},
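{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above does two things: `sed` rewrites every `.wav|` in the filelists to `.npy|`, and `create_mels()` saves each spectrogram with `np.save(filename.replace('.wav', ''), melspec)`, where `np.save` appends `.npy` to a name that lacks it. The sketch below mirrors both steps as plain string operations, so you can predict which file each filelist line will point to. The helper names are hypothetical. In the `sed` pattern the `.` matches any character, whereas the literal `str.replace` used here matches only `.wav|`.\n",
"\n",
"```python\n",
"def filelist_line_to_npy(line):\n",
"    # Same effect as: sed -i 's,.wav|,.npy|,g' (but with a literal '.').\n",
"    return line.replace('.wav|', '.npy|')\n",
"\n",
"\n",
"def mel_path_for(wav_path):\n",
"    # create_mels() strips '.wav'; np.save() then appends '.npy'.\n",
"    return wav_path.replace('.wav', '') + '.npy'\n",
"\n",
"\n",
"# filelist_line_to_npy('wavs/0001.wav|konnichiwa.') returns 'wavs/0001.npy|konnichiwa.'\n",
"# mel_path_for('wavs/0001.wav') returns 'wavs/0001.npy'\n",
"```\n",
"\n",
"Both results name the same file, which is why `hparams.load_mel_from_disk = True` can read the mels straight from the rewritten filelist."
]
},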
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2022-08-12T04:24:09.055998Z",
"iopub.status.busy": "2022-08-12T04:24:09.055604Z",
"iopub.status.idle": "2022-08-12T04:24:09.263224Z",
"shell.execute_reply": "2022-08-12T04:24:09.261774Z",
"shell.execute_reply.started": "2022-08-12T04:24:09.055966Z"
},
"id": "oJXxqs6kkgLw",
"trusted": true
},
"outputs": [],
"source": [
"#@title Check the dataset\n",
"#@markdown If no errors or [WARNING] lines are printed, the dataset is fine ([info] lines are only hints).\n",
"check_dataset(hparams)"
]
},
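{
"cell_type": "markdown",
"metadata": {},
"source": [
"For each filelist line, `check_dataset` warns about wrong extensions and missing files, and prints hints for very short transcripts or a transcript that does not end in one of `!?,.;:`. The pure-Python sketch below reproduces only the text-level checks for a single line (file existence and mel length are left out). `describe_problems` is a hypothetical name, and it uses `endswith` where the notebook uses a substring test. Because only ASCII marks are accepted, a Japanese transcript ending in `。` is reported as having no ending punctuation. That message is only an `[info]` hint.\n",
"\n",
"```python\n",
"ENDING_PUNCTUATION = '!?,.;:'\n",
"\n",
"\n",
"def describe_problems(path, transcript, load_mel_from_disk=True):\n",
"    # Hypothetical re-implementation of the text checks in check_dataset().\n",
"    problems = []\n",
"    if load_mel_from_disk and path.endswith('.wav'):\n",
"        problems.append('expected a .npy path')\n",
"    if not load_mel_from_disk and path.endswith('.npy'):\n",
"        problems.append('expected a .wav path')\n",
"    if len(transcript) < 3:\n",
"        problems.append('no/very little text')\n",
"    stripped = transcript.strip()\n",
"    # Guarding against empty text avoids the IndexError that stripped[-1] would raise.\n",
"    if not stripped or stripped[-1] not in ENDING_PUNCTUATION:\n",
"        problems.append('no ending punctuation')\n",
"    return problems\n",
"\n",
"\n",
"# describe_problems('wavs/0001.npy', 'konnichiwa.') returns []\n",
"# describe_problems('wavs/0001.npy', 'こんにちは。') returns ['no ending punctuation']\n",
"```"
]
},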
{
"cell_type": "markdown",
"metadata": {
"id": "62-cfyIubje_"
},
"source": [
"# Training"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {
"iopub.execute_input": "2022-08-12T04:25:58.143443Z",
"iopub.status.busy": "2022-08-12T04:25:58.143025Z",
"iopub.status.idle": "2022-08-12T04:26:55.002866Z",
"shell.execute_reply": "2022-08-12T04:26:55.001342Z",
"shell.execute_reply.started": "2022-08-12T04:25:58.143406Z"
},
"id": "qJTrZhShk8ZR",
"trusted": true
},
"outputs": [],
"source": [
"#@title Start training\n",
"#@markdown The lower the validation loss, the better the fit is likely to be.\n",
"print('FP16 Run:', hparams.fp16_run)\n",
"print('Dynamic Loss Scaling:', hparams.dynamic_loss_scaling)\n",
"print('Distributed Run:', hparams.distributed_run)\n",
"print('cuDNN Enabled:', hparams.cudnn_enabled)\n",
"print('cuDNN Benchmark:', hparams.cudnn_benchmark)\n",
"train(output_directory, log_directory, checkpoint_path,\n",
" warm_start, n_gpus, rank, group_name, hparams, log_directory2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jDGVcS77b25R"
},
"source": [
"# Speech Synthesis"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "V6pX7t0cVlj9"
},
"source": [
"## Convert with HiFi-GAN"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mwsEA9fP4qfZ",
"cellView": "form"
},
"outputs": [],
"source": [
"#@markdown Configuration:\n",
"\n",
"#@markdown Re-run this cell to apply configuration changes.\n",
"\n",
"# Universal HiFi-GAN model (sounds a little robotic): 1qpgI41wNXFcH-iKq1Y42JlBC9j0je8PW\n",
"#@markdown Enter the path to your trained Tacotron 2 model in `Tacotron2_Model`.\n",
"Tacotron2_Model = '/content/drive/MyDrive/YOURMODEL'#@param {type:\"string\"}\n",
"TACOTRON2_ID = Tacotron2_Model\n",
"HIFIGAN_ID = \"1qpgI41wNXFcH-iKq1Y42JlBC9j0je8PW\"\n",
"#@markdown Choose the cleaner used to preprocess the text.\n",
"text_cleaner = 'japanese_phrase_cleaners'#@param {type:\"string\"}\n",
"\n",
"# Check if initialized\n",
"try:\n",
" initilized\n",
"except NameError:\n",
" print(\"Setting up, please wait.\\n\")\n",
" !pip install tqdm -q\n",
" from tqdm.notebook import tqdm\n",
" with tqdm(total=5, leave=False) as pbar:\n",
" %tensorflow_version 1.x\n",
" import os\n",
" from os.path import exists, join, basename, splitext\n",
" !pip install gdown\n",
" git_repo_url = 'https://github.com/CjangCjengh/tacotron2-japanese.git'\n",
" project_name = splitext(basename(git_repo_url))[0]\n",
" if not exists(project_name):\n",
" # clone and install\n",
" !git clone -q --recursive {git_repo_url}\n",
" !git clone -q --recursive https://github.com/SortAnon/hifi-gan\n",
" !pip install -q librosa unidecode\n",
" pbar.update(1) # downloaded TT2 and HiFi-GAN\n",
" import sys\n",
" sys.path.append('hifi-gan')\n",
" sys.path.append(project_name)\n",
" import time\n",
" import matplotlib\n",
" import matplotlib.pylab as plt\n",
" import gdown\n",
" d = 'https://drive.google.com/uc?id='\n",
"\n",
" %matplotlib inline\n",
" import IPython.display as ipd\n",
" import numpy as np\n",
" import torch\n",
" import json\n",
" from hparams import create_hparams\n",
" from model import Tacotron2\n",
" from layers import TacotronSTFT\n",
" from audio_processing import griffin_lim\n",
" from text import text_to_sequence\n",
" from env import AttrDict\n",
" from meldataset import MAX_WAV_VALUE\n",
" from models import Generator\n",
"\n",
" pbar.update(1) # initialized Dependancies\n",
"\n",
" graph_width = 900\n",
" graph_height = 360\n",
" def plot_data(data, figsize=(int(graph_width/100), int(graph_height/100))):\n",
" %matplotlib inline\n",
" fig, axes = plt.subplots(1, len(data), figsize=figsize)\n",
" for i in range(len(data)):\n",
"                axes[i].imshow(data[i], aspect='auto', origin='lower', \n",
" interpolation='none', cmap='inferno')\n",
" fig.canvas.draw()\n",
" plt.show()\n",
"\n",
"    # Set up pronunciation dictionary\n",
" !gdown --id '1E12g_sREdcH5vuZb44EZYX8JjGWQ9rRp'\n",
" thisdict = {}\n",
" for line in reversed((open('merged.dict.txt', \"r\").read()).splitlines()):\n",
" thisdict[(line.split(\" \",1))[0]] = (line.split(\" \",1))[1].strip()\n",
"\n",
" pbar.update(1) # Downloaded and Set up Pronounciation Dictionary\n",
"\n",
" def ARPA(text, punctuation=r\"!?,.;\", EOS_Token=True):\n",
" out = ''\n",
" for word_ in text.split(\" \"):\n",
" word=word_; end_chars = ''\n",
" while any(elem in word for elem in punctuation) and len(word) > 1:\n",
" if word[-1] in punctuation: end_chars = word[-1] + end_chars; word = word[:-1]\n",
" else: break\n",
" try:\n",
" word_arpa = thisdict[word.upper()]\n",
" word = \"{\" + str(word_arpa) + \"}\"\n",
" except KeyError: pass\n",
" out = (out + \" \" + word + end_chars).strip()\n",
" if EOS_Token and not out.endswith(\";\"): out += \";\"\n",
" return out\n",
"\n",
" def get_hifigan(MODEL_ID):\n",
" # Download HiFi-GAN\n",
" hifigan_pretrained_model = 'hifimodel'\n",
" gdown.download(d+MODEL_ID, hifigan_pretrained_model, quiet=False)\n",
" if not exists(hifigan_pretrained_model):\n",
" raise Exception(\"HiFi-GAN model failed to download!\")\n",
"\n",
" # Load HiFi-GAN\n",
" conf = os.path.join(\"hifi-gan\", \"config_v1.json\")\n",
" with open(conf) as f:\n",
" json_config = json.loads(f.read())\n",
" h = AttrDict(json_config)\n",
" torch.manual_seed(h.seed)\n",
" hifigan = Generator(h).to(torch.device(\"cuda\"))\n",
" state_dict_g = torch.load(hifigan_pretrained_model, map_location=torch.device(\"cuda\"))\n",
" hifigan.load_state_dict(state_dict_g[\"generator\"])\n",
" hifigan.eval()\n",
" hifigan.remove_weight_norm()\n",
" return hifigan, h\n",
"\n",
" hifigan, h = get_hifigan(HIFIGAN_ID)\n",
" pbar.update(1) # Downloaded and Set up HiFi-GAN\n",
"\n",
" def has_MMI(STATE_DICT):\n",
" return any(True for x in STATE_DICT.keys() if \"mi.\" in x)\n",
"\n",
" def get_Tactron2(MODEL_ID):\n",
" # Tacotron2 checkpoint: a local file path (nothing is downloaded here)\n",
" tacotron2_pretrained_model = MODEL_ID\n",
" if not exists(tacotron2_pretrained_model):\n",
" raise Exception(\"Tacotron2 model not found at the given path!\")\n",
" # Load Tacotron2 and Config\n",
" hparams = create_hparams()\n",
" hparams.sampling_rate = 22050\n",
" hparams.max_decoder_steps = 3000 # Max duration in decoder steps (overridden by the max_decoder_steps field below)\n",
" hparams.gate_threshold = 0.25 # Stop once the model is more than 25% sure the clip is over (overridden by stop_threshold below)\n",
" model = Tacotron2(hparams)\n",
" state_dict = torch.load(tacotron2_pretrained_model)['state_dict']\n",
" if has_MMI(state_dict):\n",
" raise Exception(\"ERROR: This notebook does not currently support MMI models.\")\n",
" model.load_state_dict(state_dict)\n",
" _ = model.cuda().eval().half()\n",
" return model, hparams\n",
"\n",
" model, hparams = get_Tactron2(TACOTRON2_ID)\n",
" previous_tt2_id = TACOTRON2_ID\n",
"\n",
" pbar.update(1) # Downloaded and Set up Tacotron2\n",
"\n",
" # Synthesis: Tacotron2 turns text into a mel spectrogram, HiFi-GAN turns the mel spectrogram into audio\n",
" def end_to_end_infer(text, pronounciation_dictionary, show_graphs):\n",
" for i in [x for x in text.split(\"\\n\") if len(x)]:\n",
" if not pronounciation_dictionary:\n",
" if i[-1] != \";\": i=i+\";\" \n",
" else: i = ARPA(i)\n",
" with torch.no_grad(): # save VRAM by not including gradients\n",
" sequence = np.array(text_to_sequence(i, [text_cleaner]))[None, :]\n",
" sequence = torch.from_numpy(sequence).cuda().long()\n",
" mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)\n",
" if show_graphs:\n",
" plot_data((mel_outputs_postnet.float().data.cpu().numpy()[0],\n",
" alignments.float().data.cpu().numpy()[0].T))\n",
" y_g_hat = hifigan(mel_outputs_postnet.float())\n",
" audio = y_g_hat.squeeze()\n",
" audio = (audio * MAX_WAV_VALUE).clamp(-MAX_WAV_VALUE, MAX_WAV_VALUE - 1) # avoid int16 wrap-around at full scale\n",
" print(\"\")\n",
" ipd.display(ipd.Audio(audio.cpu().numpy().astype(\"int16\"), rate=hparams.sampling_rate))\n",
" from IPython.display import clear_output\n",
" clear_output()\n",
" initilized = \"Ready\"\n",
"\n",
"if previous_tt2_id != TACOTRON2_ID:\n",
" print(\"Updating Models\")\n",
" model, hparams = get_Tactron2(TACOTRON2_ID)\n",
" hifigan, h = get_hifigan(HIFIGAN_ID)\n",
" previous_tt2_id = TACOTRON2_ID\n",
"\n",
"pronounciation_dictionary = False #@param {type:\"boolean\"}\n",
"# When unchecked, automatic ARPAbet conversion is skipped: useful for entering your own ARPAbet pronunciations or just for testing; leave it unchecked for non-English text\n",
"show_graphs = True #@param {type:\"boolean\"}\n",
"max_duration = 25 # unused: output length is set by max_decoder_steps below\n",
"model.decoder.max_decoder_steps = 1000 #@param {type:\"integer\"}\n",
"stop_threshold = 0.324 #@param {type:\"number\"}\n",
"model.decoder.gate_threshold = stop_threshold\n",
"\n",
"#@markdown ---\n",
"\n",
"print(f\"Current Config:\\npronounciation_dictionary: {pronounciation_dictionary}\\nshow_graphs: {show_graphs}\\nmax_decoder_steps: {model.decoder.max_decoder_steps}\\nstop_threshold: {stop_threshold}\\n\\n\")\n",
"\n",
"time.sleep(1)\n",
"print(\"Enter the text to convert to speech.\")\n",
"contents = []\n",
"while True:\n",
" try:\n",
" print(\"-\"*50)\n",
" line = input()\n",
" if line == \"\":\n",
" continue\n",
" end_to_end_infer(line, pronounciation_dictionary, show_graphs)\n",
" except EOFError:\n",
" break\n",
" except KeyboardInterrupt:\n",
" print(\"Program terminated...\")\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Uitul995V0Jw"
},
"source": [
"## Using WaveGlow\n",
"(Not recommended, in my opinion.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "EXR9p7AeSuL6"
},
"outputs": [],
"source": [
"#@title Install Tacotron and WaveGlow\n",
"!pip install -U tensorflow==1.15.2\n",
"import os\n",
"from os.path import exists, join, basename, splitext\n",
"!pip install gdown\n",
"git_repo_url = 'https://github.com/CjangCjengh/tacotron2-japanese.git'\n",
"project_name = splitext(basename(git_repo_url))[0]\n",
"if not exists(project_name):\n",
" # clone and install\n",
" !git clone -q --recursive {git_repo_url}\n",
" !cd {project_name}/waveglow && git checkout 2fd4e63\n",
" !pip install -q librosa unidecode\n",
" \n",
"import sys\n",
"sys.path.append(join(project_name, 'waveglow/'))\n",
"sys.path.append(project_name)\n",
"import time\n",
"import matplotlib\n",
"import matplotlib.pylab as plt\n",
"import gdown\n",
"d = 'https://drive.google.com/uc?id='"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "liDdgta-SyPP"
},
"outputs": [],
"source": [
"#@title Load pretrained models\n",
"force_download_TT2 = True\n",
"tacotron2_pretrained_model = '/PATH/Your Tacotron2 Model'#@param {type:\"string\"}\n",
"waveglow_pretrained_model = '/PATH/waveglow_256channels_ljs_v3.pt'#@param {type:\"string\"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Q0BWBVdCS9ty",
"cellView": "form"
},
"outputs": [],
"source": [
"#@title Import Tacotron and WaveGlow modules\n",
"%matplotlib inline\n",
"import IPython.display as ipd\n",
"import numpy as np\n",
"import torch\n",
"\n",
"from hparams import create_hparams\n",
"from model import Tacotron2\n",
"from layers import TacotronSTFT\n",
"from audio_processing import griffin_lim\n",
"from text import text_to_sequence\n",
"from denoiser import Denoiser\n",
"\n",
"graph_width = 900\n",
"graph_height = 360\n",
"def plot_data(data, figsize=(int(graph_width/100), int(graph_height/100))):\n",
" %matplotlib inline\n",
" fig, axes = plt.subplots(1, len(data), figsize=figsize)\n",
" for i in range(len(data)):\n",
" axes[i].imshow(data[i], aspect='auto', origin='lower',\n",
" interpolation='none', cmap='inferno')\n",
" fig.canvas.draw()\n",
" plt.show()\n",
"\n",
"!gdown '1E12g_sREdcH5vuZb44EZYX8JjGWQ9rRp'\n",
"thisdict = {}\n",
"for line in reversed((open('merged.dict.txt', \"r\").read()).splitlines()):\n",
" thisdict[(line.split(\" \",1))[0]] = (line.split(\" \",1))[1].strip()\n",
"def ARPA(text, punctuation=r\"!?,.;\"):\n",
"    out = ''\n",
"    for word in text.split(\" \"):\n",
"        end_chars = ''\n",
"        # move trailing punctuation into end_chars so the bare word can be looked up\n",
"        while len(word) > 1 and word[-1] in punctuation:\n",
"            end_chars = word[-1] + end_chars; word = word[:-1]\n",
"        # dictionary words become {ARPAbet}; unknown words are kept as typed\n",
"        word_arpa = thisdict.get(word.upper(), '')\n",
"        if word_arpa: word = \"{\" + word_arpa + \"}\"\n",
"        out = (out + \" \" + word + end_chars).strip()\n",
"    # append the ';' end-of-sentence token if it is missing\n",
"    if not out.endswith(\";\"): out = out + \";\"\n",
"    return out\n",
"\n",
"#torch.set_grad_enabled(False)\n",
"\n",
"# initialize Tacotron2 with the pretrained model\n",
"hparams = create_hparams()"
]
},
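{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `ARPA()` helper strips trailing punctuation from each word, replaces words found in `merged.dict.txt` with their ARPAbet transcription in braces, and appends `;` as an end-of-sentence token. The cell below is a minimal, hedged sketch of that behaviour that runs without downloading anything. `toy_dict` and `arpa_sketch` are hypothetical names, and the two dictionary entries are illustrative rather than copied from `merged.dict.txt`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: toy_dict and arpa_sketch are hypothetical; the entries are illustrative only\n",
"toy_dict = {\n",
"    'HELLO': 'HH AH0 L OW1',\n",
"    'WORLD': 'W ER1 L D',\n",
"}\n",
"\n",
"def arpa_sketch(text, lookup, punctuation='!?,.;'):\n",
"    out = ''\n",
"    for word in text.split(' '):\n",
"        end_chars = ''\n",
"        # Peel trailing punctuation off so the bare word can be looked up\n",
"        while len(word) > 1 and word[-1] in punctuation:\n",
"            end_chars = word[-1] + end_chars\n",
"            word = word[:-1]\n",
"        # Known words become {ARPAbet}; unknown words pass through unchanged\n",
"        if word.upper() in lookup:\n",
"            word = '{' + lookup[word.upper()] + '}'\n",
"        out = (out + ' ' + word + end_chars).strip()\n",
"    # Append the ';' end-of-sentence token if it is missing\n",
"    if not out.endswith(';'):\n",
"        out += ';'\n",
"    return out\n",
"\n",
"print(arpa_sketch('Hello, world!', toy_dict))      # {HH AH0 L OW1}, {W ER1 L D}!;\n",
"print(arpa_sketch('Hello unknownword', toy_dict))  # {HH AH0 L OW1} unknownword;"
]
},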
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "Sqe-jWarTCw6"
},
"outputs": [],
"source": [
"#@title Parameters\n",
"# Load Tacotron2 (run this cell every time you change the model)\n",
"hparams.sampling_rate = 22050 # Don't change this\n",
"hparams.max_decoder_steps = 1000 # How long the audio will be before it cuts off (1000 is about 11 seconds)\n",
"hparams.gate_threshold = 0.1 # Generation stops once the model is more than 10% sure the clip is over (the higher this number, the more likely the AI keeps generating until it reaches max_decoder_steps)\n",
"model = Tacotron2(hparams)\n",
"model.load_state_dict(torch.load(tacotron2_pretrained_model)['state_dict'])\n",
"_ = model.cuda().eval().half()\n",
"\n",
"# Load WaveGlow\n",
"waveglow = torch.load(waveglow_pretrained_model)['model']\n",
"waveglow.cuda().eval().half()\n",
"for k in waveglow.convinv:\n",
" k.float()\n",
"denoiser = Denoiser(waveglow)"
]
},
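{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comments on `max_decoder_steps` and `gate_threshold` can be checked with a little arithmetic. The cell below is a hedged sketch, not part of the original workflow. It assumes `hop_length = 256` and one mel frame per decoder step, which are the defaults in NVIDIA's tacotron2 `hparams.py`; verify them against the `hparams.py` of the repo you cloned. The helper names `max_audio_seconds` and `decoder_stops` are hypothetical. Under those assumptions, 1000 steps at 22050 Hz caps a clip at about 11.6 seconds, and a lower gate threshold ends generation sooner."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch; assumes hop_length = 256 and n_frames_per_step = 1 (NVIDIA tacotron2 defaults)\n",
"HOP_LENGTH = 256       # audio samples per mel frame (assumption)\n",
"SAMPLING_RATE = 22050  # matches hparams.sampling_rate used in this notebook\n",
"\n",
"def max_audio_seconds(max_decoder_steps, hop_length=HOP_LENGTH, sampling_rate=SAMPLING_RATE):\n",
"    # Each decoder step emits one mel frame, i.e. hop_length samples of audio\n",
"    return max_decoder_steps * hop_length / sampling_rate\n",
"\n",
"def decoder_stops(stop_probability, gate_threshold):\n",
"    # Inference ends once sigmoid(gate output) exceeds gate_threshold, so a lower\n",
"    # threshold stops earlier and a higher one keeps generating for longer\n",
"    return stop_probability > gate_threshold\n",
"\n",
"for steps in (1000, 3000):\n",
"    print(steps, 'decoder steps ->', round(max_audio_seconds(steps), 2), 'seconds')\n",
"\n",
"print(decoder_stops(0.15, 0.1))    # True: 15% > 10%, generation stops\n",
"print(decoder_stops(0.15, 0.324))  # False: below the threshold, generation continues"
]
},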
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "FiRVI-J3TNnc"
},
"outputs": [],
"source": [
"#@title Start synthesis!\n",
"text = 'Your Text Here'#@param {type:\"string\"}\n",
"sigma = 0.8\n",
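"denoise_strength = 0.324 # not applied below; to use it, pass the WaveGlow output through denoiser(audio, strength=denoise_strength)\n",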
"raw_input = True # skips automatic ARPAbet conversion; useful for entering your own ARPAbet pronunciations or just for testing.\n",
" # Should be True when synthesizing a non-English language\n",
"\n",
"for i in text.split(\"\\n\"):\n",
" if len(i) < 1: continue\n",
" print(i)\n",
" if raw_input:\n",
" if i[-1] != \";\": i=i+\";\" \n",
" else: i = ARPA(i)\n",
" print(i)\n",
" with torch.no_grad(): # save VRAM by not including gradients\n",
" sequence = np.array(text_to_sequence(i, ['english_cleaners']))[None, :]\n",
" sequence = torch.from_numpy(sequence).cuda().long()\n",
" mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)\n",
" plot_data((mel_outputs_postnet.float().data.cpu().numpy()[0],\n",
" alignments.float().data.cpu().numpy()[0].T))\n",
" audio = waveglow.infer(mel_outputs_postnet, sigma=sigma)\n",
" print(\"\")\n",
" ipd.display(ipd.Audio(audio[0].data.cpu().numpy(), rate=hparams.sampling_rate))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.12"
},
"colab": {
"name": "tacotron2训练和语音合成.ipynb",
"provenance": [],
"collapsed_sections": []
},
"accelerator": "GPU",
"gpuClass": "standard"
},
"nbformat": 4,
"nbformat_minor": 0
}