{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "oxl2bna8ztl2" }, "source": [ "**2022/08/12: Updated and localized into Chinese. The TensorFlow version problem is solved.**\n", "**The cleaners from https://github.com/CjangCjengh/tacotron2-japanese are used to romanize the text automatically, so the manual romaji step is no longer needed.**\n", "
\n", "\n", "**Updated 2022/03/14: the unpickling error is solved. The training part works as of 2022/03/14.**\n", "\n", "**2022/03/15: Speech synthesis with HiFi-GAN works** \n", "\n", "**2022/03/16: Speech synthesis with WaveGlow should work again now (tested)** \n", "\n", "\n" ] }, { "cell_type": "markdown", "source": [ "**Tacotron 2 Training and Synthesis Notebook**\n", "originally based on the following notebooks\n", "https://github.com/NVIDIA/tacotron2,\n", "https://bit.ly/3F4DkH2\n", "and those presented on the Adam is cool and stuff channel (https://youtu.be/LQAOCXdU8p8 and https://youtu.be/XLt_K_692Mc)\n", "\n", "
\n", "\n", "Also based on\n", "https://colab.research.google.com/drive/1VAuIqEAnrmCig3Edt5zFgQdckY9TDi3N\n", "\n", "Thanks to CjangCjengh for the cleaner support and for the video that inspired this notebook (https://www.bilibili.com/video/BV1rV4y177Z7)" ], "metadata": { "id": "2CuJ5rIAIv1H" } }, { "cell_type": "markdown", "metadata": { "id": "M5rkhiBCXbMY" }, "source": [ "**A WaveGlow model with weights pretrained on the LJ Speech dataset can be downloaded from https://catalog.ngc.nvidia.com/orgs/nvidia/models/waveglow_ljs_256channels**\n" ] }, { "cell_type": "markdown", "metadata": { "id": "r2D2Mt80bamF" }, "source": [ "# Prepare the Data " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2022-08-12T04:13:05.594388Z", "iopub.status.busy": "2022-08-12T04:13:05.593773Z", "iopub.status.idle": "2022-08-12T04:13:11.760149Z", "shell.execute_reply": "2022-08-12T04:13:11.758823Z", "shell.execute_reply.started": "2022-08-12T04:13:05.594273Z" }, "id": "67u1nnaJcyPt", "trusted": true }, "outputs": [], "source": [ "#@title Download Tacotron 2\n", "!git clone https://github.com/CjangCjengh/tacotron2-japanese tacotron2\n", "!git -C tacotron2 submodule init\n", "!git -C tacotron2 submodule update" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-08-12T04:13:20.371238Z", "iopub.status.busy": "2022-08-12T04:13:20.370604Z", "iopub.status.idle": "2022-08-12T04:18:18.664088Z", "shell.execute_reply": "2022-08-12T04:18:18.662898Z", "shell.execute_reply.started": "2022-08-12T04:13:20.371200Z" }, "trusted": true, "cellView": "form", "id": "ujgAmihAIZg1" }, "outputs": [], "source": [ "#@title Install dependencies\n", "!pip install -U tensorflow==1.15.2\n", "!pip install -q unidecode tensorboardX\n", "!pip install librosa==0.8.0\n", "!pip install pysoundfile==0.9.0.post1\n", "!pip install unidecode==1.3.4\n", "!pip install 
pyopenjtalk==0.2.0\n", "!pip install inflect==5.6.2\n", "!pip install janome==0.4.2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "trusted": true, "cellView": "form", "id": "TAouEAY7IZg7" }, "outputs": [], "source": [ "#@title Mount Google Drive\n", "from google.colab import drive\n", "drive.mount('drive')" ] }, { "cell_type": "code", "source": [ "#@title Create folders and download the pretrained model\n", "import os\n", "if os.getcwd() != '/content/tacotron2':\n", " os.chdir('/content/tacotron2')\n", "! gdown --id 1c5ZTuT7J08wLUoVZ2KkUs_VdZuJ86ZqA\n", "if not os.path.isdir(\"wavs\"):\n", " os.mkdir('wavs')\n", "if not os.path.isdir(\"outdir\"):\n", " os.mkdir(\"outdir\")" ], "metadata": { "cellView": "form", "id": "-tH9g-WrJ0Ro" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "pMt_D2p7c7Dl" }, "source": [ "### Upload the Data\n", "\n", "`text file` is the list of audio files\n", "\n", "`audio files` are the audio files themselves" ] }, { "cell_type": "markdown", "source": [ "![Tacotron2InstructionImages.jpg]()" ], "metadata": { "id": "6tKrpYzpg8t0" } }, { "cell_type": "markdown", "metadata": { "id": "IOETvfJdbfYO" }, "source": [ "# Prepare the Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-08-12T04:22:26.512807Z", "iopub.status.busy": "2022-08-12T04:22:26.512405Z", "iopub.status.idle": "2022-08-12T04:22:48.851514Z", "shell.execute_reply": "2022-08-12T04:22:48.850448Z", "shell.execute_reply.started": "2022-08-12T04:22:26.512767Z" }, "trusted": true, "cellView": "form", "id": "VKFIe_5pIZg9" }, "outputs": [], "source": [ "#@title Training code\n", "%matplotlib inline\n", "\n", "import time\n", "import argparse\n", "import math\n", "from numpy import finfo\n", "\n", "import torch\n", "from distributed import apply_gradient_allreduce\n", "import torch.distributed as dist\n", "from torch.utils.data.distributed import DistributedSampler\n", "from torch.utils.data import DataLoader\n", "\n", "from model import Tacotron2\n", "from data_utils import 
TextMelLoader, TextMelCollate\n", "from loss_function import Tacotron2Loss\n", "from logger import Tacotron2Logger\n", "from hparams import create_hparams\n", " \n", "import random\n", "import numpy as np\n", "\n", "import layers\n", "from utils import load_wav_to_torch, load_filepaths_and_text\n", "from text import text_to_sequence\n", "from math import e\n", "#from tqdm import tqdm # Terminal\n", "#from tqdm import tqdm_notebook as tqdm # Legacy Notebook TQDM\n", "from tqdm.notebook import tqdm # Modern Notebook TQDM\n", "from distutils.dir_util import copy_tree\n", "import matplotlib.pylab as plt\n", "\n", "def download_from_google_drive(file_id, file_name):\n", " # download a file from the Google Drive link\n", " !rm -f ./cookie\n", " !curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id={file_id}\" > /dev/null\n", " confirm_text = !awk '/download/ {print $NF}' ./cookie\n", " confirm_text = confirm_text[0]\n", " !curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm={confirm_text}&id={file_id}\" -o {file_name}\n", "\n", "def create_mels():\n", " print(\"Generating Mels\")\n", " stft = layers.TacotronSTFT(\n", " hparams.filter_length, hparams.hop_length, hparams.win_length,\n", " hparams.n_mel_channels, hparams.sampling_rate, hparams.mel_fmin,\n", " hparams.mel_fmax)\n", " def save_mel(filename):\n", " audio, sampling_rate = load_wav_to_torch(filename)\n", " if sampling_rate != stft.sampling_rate:\n", " raise ValueError(\"{} {} SR doesn't match target {} SR\".format(filename, \n", " sampling_rate, stft.sampling_rate))\n", " audio_norm = audio / hparams.max_wav_value\n", " audio_norm = audio_norm.unsqueeze(0)\n", " audio_norm = torch.autograd.Variable(audio_norm, requires_grad=False)\n", " melspec = stft.mel_spectrogram(audio_norm)\n", " melspec = torch.squeeze(melspec, 0).cpu().numpy()\n", " np.save(filename.replace('.wav', ''), melspec)\n", "\n", " import glob\n", " wavs = glob.glob('wavs/*.wav')\n", " for i in 
tqdm(wavs):\n", " save_mel(i)\n", "\n", "\n", "def reduce_tensor(tensor, n_gpus):\n", " rt = tensor.clone()\n", " dist.all_reduce(rt, op=dist.reduce_op.SUM)\n", " rt /= n_gpus\n", " return rt\n", "\n", "\n", "def init_distributed(hparams, n_gpus, rank, group_name):\n", " assert torch.cuda.is_available(), \"Distributed mode requires CUDA.\"\n", " print(\"Initializing Distributed\")\n", "\n", " # Set cuda device so everything is done on the right GPU.\n", " torch.cuda.set_device(rank % torch.cuda.device_count())\n", "\n", " # Initialize distributed communication\n", " dist.init_process_group(\n", " backend=hparams.dist_backend, init_method=hparams.dist_url,\n", " world_size=n_gpus, rank=rank, group_name=group_name)\n", "\n", " print(\"Done initializing distributed\")\n", "\n", "\n", "def prepare_dataloaders(hparams):\n", " # Get data, data loaders and collate function ready\n", " trainset = TextMelLoader(hparams.training_files, hparams)\n", " valset = TextMelLoader(hparams.validation_files, hparams)\n", " collate_fn = TextMelCollate(hparams.n_frames_per_step)\n", "\n", " if hparams.distributed_run:\n", " train_sampler = DistributedSampler(trainset)\n", " shuffle = False\n", " else:\n", " train_sampler = None\n", " shuffle = True\n", "\n", " train_loader = DataLoader(trainset, num_workers=1, shuffle=shuffle,\n", " sampler=train_sampler,\n", " batch_size=hparams.batch_size, pin_memory=False,\n", " drop_last=True, collate_fn=collate_fn)\n", " return train_loader, valset, collate_fn\n", "\n", "\n", "def prepare_directories_and_logger(output_directory, log_directory, rank):\n", " if rank == 0:\n", " if not os.path.isdir(output_directory):\n", " os.makedirs(output_directory)\n", " os.chmod(output_directory, 0o775)\n", " logger = Tacotron2Logger(os.path.join(output_directory, log_directory))\n", " else:\n", " logger = None\n", " return logger\n", "\n", "\n", "def load_model(hparams):\n", " model = Tacotron2(hparams).cuda()\n", " if hparams.fp16_run:\n", " 
model.decoder.attention_layer.score_mask_value = finfo('float16').min\n", "\n", " if hparams.distributed_run:\n", " model = apply_gradient_allreduce(model)\n", "\n", " return model\n", "\n", "\n", "def warm_start_model(checkpoint_path, model, ignore_layers):\n", " assert os.path.isfile(checkpoint_path)\n", " print(\"Warm starting model from checkpoint '{}'\".format(checkpoint_path))\n", " checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')\n", " model_dict = checkpoint_dict['state_dict']\n", " if len(ignore_layers) > 0:\n", " model_dict = {k: v for k, v in model_dict.items()\n", " if k not in ignore_layers}\n", " dummy_dict = model.state_dict()\n", " dummy_dict.update(model_dict)\n", " model_dict = dummy_dict\n", " model.load_state_dict(model_dict)\n", " return model\n", "\n", "\n", "def load_checkpoint(checkpoint_path, model, optimizer):\n", " assert os.path.isfile(checkpoint_path)\n", " print(\"Loading checkpoint '{}'\".format(checkpoint_path))\n", " checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')\n", " model.load_state_dict(checkpoint_dict['state_dict'])\n", " optimizer.load_state_dict(checkpoint_dict['optimizer'])\n", " learning_rate = checkpoint_dict['learning_rate']\n", " iteration = checkpoint_dict['iteration']\n", " print(\"Loaded checkpoint '{}' from iteration {}\" .format(\n", " checkpoint_path, iteration))\n", " return model, optimizer, learning_rate, iteration\n", "\n", "\n", "def save_checkpoint(model, optimizer, learning_rate, iteration, filepath):\n", " print(\"Saving model and optimizer state at iteration {} to {}\".format(\n", " iteration, filepath))\n", " try:\n", " torch.save({'iteration': iteration,\n", " 'state_dict': model.state_dict(),\n", " 'optimizer': optimizer.state_dict(),\n", " 'learning_rate': learning_rate}, filepath)\n", " except KeyboardInterrupt:\n", " print(\"interrupt received while saving, waiting for save to complete.\")\n", " torch.save({'iteration': iteration,'state_dict': 
model.state_dict(),'optimizer': optimizer.state_dict(),'learning_rate': learning_rate}, filepath)\n", " print(\"Model Saved\")\n", "\n", "def plot_alignment(alignment, info=None):\n", " %matplotlib inline\n", " fig, ax = plt.subplots(figsize=(int(alignment_graph_width/100), int(alignment_graph_height/100)))\n", " im = ax.imshow(alignment, cmap='inferno', aspect='auto', origin='lower',\n", " interpolation='none')\n", " ax.autoscale(enable=True, axis=\"y\", tight=True)\n", " fig.colorbar(im, ax=ax)\n", " xlabel = 'Decoder timestep'\n", " if info is not None:\n", " xlabel += '\\n\\n' + info\n", " plt.xlabel(xlabel)\n", " plt.ylabel('Encoder timestep')\n", " plt.tight_layout()\n", " fig.canvas.draw()\n", " plt.show()\n", "\n", "def validate(model, criterion, valset, iteration, batch_size, n_gpus,\n", " collate_fn, logger, distributed_run, rank, epoch, start_eposh, learning_rate):\n", " \"\"\"Handles all the validation scoring and printing\"\"\"\n", " model.eval()\n", " with torch.no_grad():\n", " val_sampler = DistributedSampler(valset) if distributed_run else None\n", " val_loader = DataLoader(valset, sampler=val_sampler, num_workers=1,\n", " shuffle=False, batch_size=batch_size,\n", " pin_memory=False, collate_fn=collate_fn)\n", "\n", " val_loss = 0.0\n", " for i, batch in enumerate(val_loader):\n", " x, y = model.parse_batch(batch)\n", " y_pred = model(x)\n", " loss = criterion(y_pred, y)\n", " if distributed_run:\n", " reduced_val_loss = reduce_tensor(loss.data, n_gpus).item()\n", " else:\n", " reduced_val_loss = loss.item()\n", " val_loss += reduced_val_loss\n", " val_loss = val_loss / (i + 1)\n", "\n", " model.train()\n", " if rank == 0:\n", " print(\"Epoch: {} Validation loss {}: {:9f} Time: {:.1f}m LR: {:.6f}\".format(epoch, iteration, val_loss,(time.perf_counter()-start_eposh)/60, learning_rate))\n", " logger.log_validation(val_loss, model, y, y_pred, iteration)\n", " if hparams.show_alignments:\n", " %matplotlib inline\n", " _, mel_outputs, gate_outputs, 
alignments = y_pred\n", " idx = random.randint(0, alignments.size(0) - 1)\n", " plot_alignment(alignments[idx].data.cpu().numpy().T)\n", "\n", "def train(output_directory, log_directory, checkpoint_path, warm_start, n_gpus,\n", " rank, group_name, hparams, log_directory2):\n", " \"\"\"Training and validation logging results to tensorboard and stdout\n", "\n", " Params\n", " ------\n", " output_directory (string): directory to save checkpoints\n", " log_directory (string) directory to save tensorboard logs\n", " checkpoint_path(string): checkpoint path\n", " n_gpus (int): number of gpus\n", " rank (int): rank of current gpu\n", " hparams (object): comma separated list of \"name=value\" pairs.\n", " \"\"\"\n", " if hparams.distributed_run:\n", " init_distributed(hparams, n_gpus, rank, group_name)\n", "\n", " torch.manual_seed(hparams.seed)\n", " torch.cuda.manual_seed(hparams.seed)\n", "\n", " model = load_model(hparams)\n", " learning_rate = hparams.learning_rate\n", " optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate,\n", " weight_decay=hparams.weight_decay)\n", "\n", " if hparams.fp16_run:\n", " from apex import amp\n", " model, optimizer = amp.initialize(\n", " model, optimizer, opt_level='O2')\n", "\n", " if hparams.distributed_run:\n", " model = apply_gradient_allreduce(model)\n", "\n", " criterion = Tacotron2Loss()\n", "\n", " logger = prepare_directories_and_logger(\n", " output_directory, log_directory, rank)\n", "\n", " train_loader, valset, collate_fn = prepare_dataloaders(hparams)\n", "\n", " # Load checkpoint if one exists\n", " iteration = 0\n", " epoch_offset = 0\n", " if checkpoint_path is not None and os.path.isfile(checkpoint_path):\n", " if warm_start:\n", " model = warm_start_model(\n", " checkpoint_path, model, hparams.ignore_layers)\n", " else:\n", " model, optimizer, _learning_rate, iteration = load_checkpoint(\n", " checkpoint_path, model, optimizer)\n", " if hparams.use_saved_learning_rate:\n", " learning_rate = 
_learning_rate\n", " iteration += 1 # next iteration is iteration + 1\n", " epoch_offset = max(0, int(iteration / len(train_loader)))\n", " else:\n", " os.path.isfile(\"tacotron2_statedict.pt\")\n", " model = warm_start_model(\"tacotron2_statedict.pt\", model, hparams.ignore_layers)\n", " # download LJSpeech pretrained model if no checkpoint already exists\n", " \n", " start_eposh = time.perf_counter()\n", " learning_rate = 0.0\n", " model.train()\n", " is_overflow = False\n", " # ================ MAIN TRAINNIG LOOP! ===================\n", " for epoch in tqdm(range(epoch_offset, hparams.epochs)):\n", " print(\"\\nStarting Epoch: {} Iteration: {}\".format(epoch, iteration))\n", " start_eposh = time.perf_counter() # eposh is russian, not a typo\n", " for i, batch in tqdm(enumerate(train_loader), total=len(train_loader)):\n", " start = time.perf_counter()\n", " if iteration < hparams.decay_start: learning_rate = hparams.A_\n", " else: iteration_adjusted = iteration - hparams.decay_start; learning_rate = (hparams.A_*(e**(-iteration_adjusted/hparams.B_))) + hparams.C_\n", " learning_rate = max(hparams.min_learning_rate, learning_rate) # output the largest number\n", " for param_group in optimizer.param_groups:\n", " param_group['lr'] = learning_rate\n", "\n", " model.zero_grad()\n", " x, y = model.parse_batch(batch)\n", " y_pred = model(x)\n", "\n", " loss = criterion(y_pred, y)\n", " if hparams.distributed_run:\n", " reduced_loss = reduce_tensor(loss.data, n_gpus).item()\n", " else:\n", " reduced_loss = loss.item()\n", " if hparams.fp16_run:\n", " with amp.scale_loss(loss, optimizer) as scaled_loss:\n", " scaled_loss.backward()\n", " else:\n", " loss.backward()\n", "\n", " if hparams.fp16_run:\n", " grad_norm = torch.nn.utils.clip_grad_norm_(\n", " amp.master_params(optimizer), hparams.grad_clip_thresh)\n", " is_overflow = math.isnan(grad_norm)\n", " else:\n", " grad_norm = torch.nn.utils.clip_grad_norm_(\n", " model.parameters(), hparams.grad_clip_thresh)\n", "\n", " 
optimizer.step()\n", "\n", " if not is_overflow and rank == 0:\n", " duration = time.perf_counter() - start\n", " logger.log_training(\n", " reduced_loss, grad_norm, learning_rate, duration, iteration)\n", " #print(\"Batch {} loss {:.6f} Grad Norm {:.6f} Time {:.6f}\".format(iteration, reduced_loss, grad_norm, duration), end='\\r', flush=True)\n", "\n", " iteration += 1\n", " validate(model, criterion, valset, iteration,\n", " hparams.batch_size, n_gpus, collate_fn, logger,\n", " hparams.distributed_run, rank, epoch, start_eposh, learning_rate)\n", " save_checkpoint(model, optimizer, learning_rate, iteration, checkpoint_path)\n", " if log_directory2 != None:\n", " copy_tree(log_directory, log_directory2)\n", "def check_dataset(hparams):\n", " from utils import load_wav_to_torch, load_filepaths_and_text\n", " import os\n", " import numpy as np\n", " def check_arr(filelist_arr):\n", " for i, file in enumerate(filelist_arr):\n", " if len(file) > 2:\n", " print(\"|\".join(file), \"\\nhas multiple '|', this may not be an error.\")\n", " if hparams.load_mel_from_disk and '.wav' in file[0]:\n", " print(\"[WARNING]\", file[0], \" in filelist while expecting .npy .\")\n", " else:\n", " if not hparams.load_mel_from_disk and '.npy' in file[0]:\n", " print(\"[WARNING]\", file[0], \" in filelist while expecting .wav .\")\n", " if (not os.path.exists(file[0])):\n", " print(\"|\".join(file), \"\\n[WARNING] does not exist.\")\n", " if len(file[1]) < 3:\n", " print(\"|\".join(file), \"\\n[info] has no/very little text.\")\n", " if not ((file[1].strip())[-1] in r\"!?,.;:\"):\n", " print(\"|\".join(file), \"\\n[info] has no ending punctuation.\")\n", " mel_length = 1\n", " if hparams.load_mel_from_disk and '.npy' in file[0]:\n", " melspec = torch.from_numpy(np.load(file[0], allow_pickle=True))\n", " mel_length = melspec.shape[1]\n", " if mel_length == 0:\n", " print(\"|\".join(file), \"\\n[WARNING] has 0 duration.\")\n", " print(\"Checking Training Files\")\n", " audiopaths_and_text 
= load_filepaths_and_text(hparams.training_files) # get split lines from training_files text file.\n", " check_arr(audiopaths_and_text)\n", " print(\"Checking Validation Files\")\n", " audiopaths_and_text = load_filepaths_and_text(hparams.validation_files) # get split lines from validation_files text file.\n", " check_arr(audiopaths_and_text)\n", " print(\"Finished Checking\")\n", "\n", "warm_start=False #sorry about that\n", "n_gpus=1\n", "rank=0\n", "group_name=None\n", "\n", "# ---- Default parameters; these can be left alone ----\n", "hparams = create_hparams()\n", "model_filename = 'current_model'\n", "hparams.training_files = \"filelists/clipper_train_filelist.txt\"\n", "hparams.validation_files = \"filelists/clipper_val_filelist.txt\"\n", "#hparams.use_mmi=True, # not used in this notebook\n", "#hparams.use_gaf=True, # not used in this notebook\n", "#hparams.max_gaf=0.5, # not used in this notebook\n", "#hparams.drop_frame_rate = 0.2 # not used in this notebook\n", "hparams.p_attention_dropout=0.1\n", "hparams.p_decoder_dropout=0.1\n", "hparams.decay_start = 15000\n", "hparams.A_ = 5e-4\n", "hparams.B_ = 8000\n", "hparams.C_ = 0\n", "hparams.min_learning_rate = 1e-5\n", "generate_mels = True\n", "hparams.show_alignments = True\n", "alignment_graph_height = 600\n", "alignment_graph_width = 1000\n", "hparams.batch_size = 32\n", "hparams.load_mel_from_disk = True\n", "hparams.ignore_layers = []\n", "hparams.epochs = 10000\n", "\n", "torch.backends.cudnn.enabled = hparams.cudnn_enabled\n", "torch.backends.cudnn.benchmark = hparams.cudnn_benchmark\n", "output_directory = '/content/drive/MyDrive/colab/outdir' # Location to save Checkpoints\n", "log_directory = '/content/tacotron2/logs' # Location to save Log files locally\n", "log_directory2 = '/content/drive/MyDrive/colab/logs' # Location to copy log files (done at the end of each epoch to cut down on I/O)\n", "checkpoint_path = output_directory+(r'/')+model_filename\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { 
"cellView": "form", "execution": { "iopub.execute_input": "2022-08-12T04:23:09.697053Z", "iopub.status.busy": "2022-08-12T04:23:09.696148Z", "iopub.status.idle": "2022-08-12T04:23:09.701991Z", "shell.execute_reply": "2022-08-12T04:23:09.700529Z", "shell.execute_reply.started": "2022-08-12T04:23:09.697015Z" }, "id": "GKRvQ1EWiVhn", "trusted": true }, "outputs": [], "source": [ "#@title Name your model (letters and digits only)\n", "model_filename = \"test\" #@param {type:\"string\"}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-08-12T04:23:14.426817Z", "iopub.status.busy": "2022-08-12T04:23:14.426455Z", "iopub.status.idle": "2022-08-12T04:23:14.433170Z", "shell.execute_reply": "2022-08-12T04:23:14.432204Z", "shell.execute_reply.started": "2022-08-12T04:23:14.426786Z" }, "trusted": true, "cellView": "form", "id": "ew2HzQl2IZhH" }, "outputs": [], "source": [ "#@title Add a previously trained model to the output folder\n", "\n", "#@markdown If you have already trained a model with the **same name**: share it on Google Drive, set it to be visible to anyone with the link, then put the share ID here (see the gdown usage)\n", "\n", "#@markdown If this is your first training run: skip this step\n", "os.chdir(\"outdir\")\n", "# ! 
gdown --id \n", "os.chdir(\"..\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2022-08-12T04:23:21.495804Z", "iopub.status.busy": "2022-08-12T04:23:21.494800Z", "iopub.status.idle": "2022-08-12T04:23:21.504471Z", "shell.execute_reply": "2022-08-12T04:23:21.503429Z", "shell.execute_reply.started": "2022-08-12T04:23:21.495754Z" }, "id": "JzevuoJnkIsi", "trusted": true }, "outputs": [], "source": [ "#@title Set parameters\n", "\n", "#@markdown **These two parameters are the most important.**\n", "\n", "#@markdown This parameter controls how fast the model trains. **Do not set it too high, or the GPU will run out of memory.** For a larger dataset, a value around 30 works well.\n", "\n", "#@markdown If the number of audio files in the dataset is close to this value, training will fail.\n", "\n", "hparams.batch_size = 8 #@param {type:\"integer\"}\n", "\n", "#@markdown This parameter controls the number of training epochs\n", "hparams.epochs = 1000 #@param {type:\"integer\"}\n", "\n", "#The rest aren't that important\n", "hparams.p_attention_dropout=0.1\n", "hparams.p_decoder_dropout=0.1\n", "hparams.decay_start = 15000 # wait till decay_start to start decaying learning rate\n", "hparams.A_ = 5e-4 # Start/Max Learning Rate\n", "hparams.B_ = 8000 # Decay Rate\n", "hparams.C_ = 0 # Shift learning rate equation by this value\n", "hparams.min_learning_rate = 1e-5 # Min Learning Rate\n", "generate_mels = True # Don't change\n", "hparams.show_alignments = True\n", "alignment_graph_height = 600\n", "alignment_graph_width = 1000\n", "hparams.load_mel_from_disk = True\n", "hparams.ignore_layers = [] # Layers to reset (None by default, other than foreign languages this param can be ignored)\n", "\n", "torch.backends.cudnn.enabled = hparams.cudnn_enabled\n", "torch.backends.cudnn.benchmark = hparams.cudnn_benchmark\n", "output_directory = '/content/drive/MyDrive/colab/outdir' # Location to save Checkpoints\n", "log_directory = '/content/tacotron2/logs' # Location to save Log files locally\n", "log_directory2 = '/content/drive/MyDrive/colab/logs' # Location to copy log files (done at the end of each epoch to cut down on I/O)\n", "checkpoint_path = 
output_directory+(r'/')+model_filename" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2022-08-12T04:23:25.281471Z", "iopub.status.busy": "2022-08-12T04:23:25.280625Z", "iopub.status.idle": "2022-08-12T04:23:26.307536Z", "shell.execute_reply": "2022-08-12T04:23:26.305920Z", "shell.execute_reply.started": "2022-08-12T04:23:25.281432Z" }, "id": "yehA2fOliyUI", "trusted": true }, "outputs": [], "source": [ "#@title Dataset filelists\n", "#@markdown If you are not too demanding, the same file can be used for both lists\n", "\n", "#@markdown Training set filelist\n", "training_files_name = \"list.txt\" #@param {type:\"string\"}\n", "#@markdown Validation set filelist\n", "validation_files_name = \"list.txt\" #@param {type:\"string\"}\n", "#@markdown Cleaner used to preprocess the text\n", "\n", "hparams_prefix = \"/content/tacotron2/filelists/\"\n", "text_cleaner='japanese_phrase_cleaners' #@param {type:\"string\"}\n", "text_cleaners=[text_cleaner]\n", "#@markdown ### Example output of each cleaner\n", "#@markdown ### 1. 'japanese_cleaners'\n", "#@markdown #### Before\n", "#@markdown 何かあったらいつでも話して下さい。学院のことじゃなく、私事に関することでも何でも\n", "#@markdown #### After\n", "#@markdown nanikaacltaraitsudemohanashItekudasai.gakuiNnokotojanaku,shijinikaNsurukotodemonanidemo.\n", "#@markdown ### 2. 'japanese_tokenization_cleaners'\n", "#@markdown #### Before\n", "#@markdown 何かあったらいつでも話して下さい。学院のことじゃなく、私事に関することでも何でも\n", "#@markdown #### After\n", "#@markdown nani ka acl tara itsu demo hanashi te kudasai. gakuiN no koto ja naku, shiji nikaNsuru koto de mo naNdemo.\n", "#@markdown ### 3. 'japanese_accent_cleaners'\n", "#@markdown #### Before\n", "#@markdown 何かあったらいつでも話して下さい。学院のことじゃなく、私事に関することでも何でも\n", "#@markdown #### After\n", "#@markdown :na)nika a)cltara i)tsudemo ha(na)shIte ku(dasa)i.:ga(kuiNno ko(to)janaku,:shi)jini ka(Nsu)ru ko(to)demo na)nidemo.\n", "#@markdown ### 4. 
'japanese_phrase_cleaners'\n", "#@markdown #### Before\n", "#@markdown 何かあったらいつでも話して下さい。学院のことじゃなく、私事に関することでも何でも\n", "#@markdown #### After\n", "#@markdown nanika acltara itsudemo hanashIte kudasai. gakuiNno kotojanaku, shijini kaNsuru kotodemo nanidemo.\n", "\n", "training_files = hparams_prefix + training_files_name\n", "validation_files = hparams_prefix + validation_files_name\n", "\n", "hparams.training_files = training_files\n", "hparams.validation_files = validation_files\n", "hparams.text_cleaners = text_cleaners" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2022-08-12T04:23:32.743603Z", "iopub.status.busy": "2022-08-12T04:23:32.743060Z", "iopub.status.idle": "2022-08-12T04:24:06.307565Z", "shell.execute_reply": "2022-08-12T04:24:06.306350Z", "shell.execute_reply.started": "2022-08-12T04:23:32.743559Z" }, "id": "b_xMcYMfkc9L", "trusted": true }, "outputs": [], "source": [ "#@title Generate mel spectrograms\n", "# ---- Replace .wav with .npy in filelists ----\n", "!sed -i -- 's,.wav|,.npy|,g' filelists/*.txt\n", "# ---- Replace .wav with .npy in filelists ----\n", "if generate_mels:\n", " create_mels()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2022-08-12T04:24:09.055998Z", "iopub.status.busy": "2022-08-12T04:24:09.055604Z", "iopub.status.idle": "2022-08-12T04:24:09.263224Z", "shell.execute_reply": "2022-08-12T04:24:09.261774Z", "shell.execute_reply.started": "2022-08-12T04:24:09.055966Z" }, "id": "oJXxqs6kkgLw", "trusted": true }, "outputs": [], "source": [ "#@title Check the dataset\n", "#@markdown Success if no errors are reported\n", "check_dataset(hparams)" ] }, { "cell_type": "markdown", "metadata": { "id": "62-cfyIubje_" }, "source": [ "# Training" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": { "iopub.execute_input": "2022-08-12T04:25:58.143443Z", "iopub.status.busy": "2022-08-12T04:25:58.143025Z", 
"iopub.status.idle": "2022-08-12T04:26:55.002866Z", "shell.execute_reply": "2022-08-12T04:26:55.001342Z", "shell.execute_reply.started": "2022-08-12T04:25:58.143406Z" }, "id": "qJTrZhShk8ZR", "trusted": true }, "outputs": [], "source": [ "#@title Start training\n", "#@markdown The lower the validation loss, the better the fit is likely to be\n", "print('FP16 Run:', hparams.fp16_run)\n", "print('Dynamic Loss Scaling:', hparams.dynamic_loss_scaling)\n", "print('Distributed Run:', hparams.distributed_run)\n", "print('cuDNN Enabled:', hparams.cudnn_enabled)\n", "print('cuDNN Benchmark:', hparams.cudnn_benchmark)\n", "train(output_directory, log_directory, checkpoint_path,\n", " warm_start, n_gpus, rank, group_name, hparams, log_directory2)" ] }, { "cell_type": "markdown", "metadata": { "id": "jDGVcS77b25R" }, "source": [ "# Speech Synthesis" ] }, { "cell_type": "markdown", "metadata": { "id": "V6pX7t0cVlj9" }, "source": [ "## Synthesize with HiFi-GAN" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mwsEA9fP4qfZ", "cellView": "form" }, "outputs": [], "source": [ "#@markdown Configuration:\n", "\n", "#@markdown Re-run this cell to apply configuration changes\n", "\n", "# Universal HiFi-GAN model (sounds slightly robotic): 1qpgI41wNXFcH-iKq1Y42JlBC9j0je8PW\n", "#@markdown Enter the path to your trained Tacotron 2 model in `Tacotron2_Model`\n", "Tacotron2_Model = '/content/drive/MyDrive/YOURMODEL'#@param {type:\"string\"}\n", "TACOTRON2_ID = Tacotron2_Model\n", "HIFIGAN_ID = \"1qpgI41wNXFcH-iKq1Y42JlBC9j0je8PW\"\n", "#@markdown Choose the cleaner used to preprocess the text\n", "text_cleaner = 'japanese_phrase_cleaners'#@param {type:\"string\"}\n", "\n", "# Check if initialized\n", "try:\n", " initilized\n", "except NameError:\n", " print(\"Setting up, please wait.\\n\")\n", " !pip install tqdm -q\n", " from tqdm.notebook import tqdm\n", " with tqdm(total=5, leave=False) as pbar:\n", " %tensorflow_version 1.x\n", " import os\n", " from os.path import exists, join, basename, splitext\n", " !pip install gdown\n", " git_repo_url = 'https://github.com/CjangCjengh/tacotron2-japanese.git'\n", " project_name = splitext(basename(git_repo_url))[0]\n", " if not 
exists(project_name):\n", " # clone and install\n", " !git clone -q --recursive {git_repo_url}\n", " !git clone -q --recursive https://github.com/SortAnon/hifi-gan\n", " !pip install -q librosa unidecode\n", " pbar.update(1) # downloaded TT2 and HiFi-GAN\n", " import sys\n", " sys.path.append('hifi-gan')\n", " sys.path.append(project_name)\n", " import time\n", " import matplotlib\n", " import matplotlib.pylab as plt\n", " import gdown\n", " d = 'https://drive.google.com/uc?id='\n", "\n", " %matplotlib inline\n", " import IPython.display as ipd\n", " import numpy as np\n", " import torch\n", " import json\n", " from hparams import create_hparams\n", " from model import Tacotron2\n", " from layers import TacotronSTFT\n", " from audio_processing import griffin_lim\n", " from text import text_to_sequence\n", " from env import AttrDict\n", " from meldataset import MAX_WAV_VALUE\n", " from models import Generator\n", "\n", " pbar.update(1) # initialized dependencies\n", "\n", " graph_width = 900\n", " graph_height = 360\n", " def plot_data(data, figsize=(int(graph_width/100), int(graph_height/100))):\n", " %matplotlib inline\n", " fig, axes = plt.subplots(1, len(data), figsize=figsize)\n", " for i in range(len(data)):\n", " axes[i].imshow(data[i], aspect='auto', origin='lower',\n", " interpolation='none', cmap='inferno')\n", " fig.canvas.draw()\n", " plt.show()\n", "\n", " # Set up the pronunciation dictionary\n", " !gdown --id '1E12g_sREdcH5vuZb44EZYX8JjGWQ9rRp'\n", " thisdict = {}\n", " for line in reversed((open('merged.dict.txt', \"r\").read()).splitlines()):\n", " thisdict[(line.split(\" \",1))[0]] = (line.split(\" \",1))[1].strip()\n", "\n", " pbar.update(1) # Downloaded and set up the pronunciation dictionary\n", "\n", " def ARPA(text, punctuation=r\"!?,.;\", EOS_Token=True):\n", " out = ''\n", " for word_ in text.split(\" \"):\n", " word=word_; end_chars = ''\n", " while any(elem in word for elem in punctuation) and len(word) > 1:\n", " if word[-1] in punctuation: 
end_chars = word[-1] + end_chars; word = word[:-1]\n", " else: break\n", " try:\n", " word_arpa = thisdict[word.upper()]\n", " word = \"{\" + str(word_arpa) + \"}\"\n", " except KeyError: pass\n", " out = (out + \" \" + word + end_chars).strip()\n", " if EOS_Token and out[-1] != \";\": out += \";\"\n", " return out\n", "\n", " def get_hifigan(MODEL_ID):\n", " # Download HiFi-GAN\n", " hifigan_pretrained_model = 'hifimodel'\n", " gdown.download(d+MODEL_ID, hifigan_pretrained_model, quiet=False)\n", " if not exists(hifigan_pretrained_model):\n", " raise Exception(\"HiFI-GAN model failed to download!\")\n", "\n", " # Load HiFi-GAN\n", " conf = os.path.join(\"hifi-gan\", \"config_v1.json\")\n", " with open(conf) as f:\n", " json_config = json.loads(f.read())\n", " h = AttrDict(json_config)\n", " torch.manual_seed(h.seed)\n", " hifigan = Generator(h).to(torch.device(\"cuda\"))\n", " state_dict_g = torch.load(hifigan_pretrained_model, map_location=torch.device(\"cuda\"))\n", " hifigan.load_state_dict(state_dict_g[\"generator\"])\n", " hifigan.eval()\n", " hifigan.remove_weight_norm()\n", " return hifigan, h\n", "\n", " hifigan, h = get_hifigan(HIFIGAN_ID)\n", " pbar.update(1) # Downloaded and Set up HiFi-GAN\n", "\n", " def has_MMI(STATE_DICT):\n", " return any(True for x in STATE_DICT.keys() if \"mi.\" in x)\n", "\n", " def get_Tactron2(MODEL_ID):\n", " # Download Tacotron2\n", " tacotron2_pretrained_model = TACOTRON2_ID\n", " if not exists(tacotron2_pretrained_model):\n", " raise Exception(\"Tacotron2 model failed to download!\")\n", " # Load Tacotron2 and Config\n", " hparams = create_hparams()\n", " hparams.sampling_rate = 22050\n", " hparams.max_decoder_steps = 3000 # Max Duration\n", " hparams.gate_threshold = 0.25 # Model must be 25% sure the clip is over before ending generation\n", " model = Tacotron2(hparams)\n", " state_dict = torch.load(tacotron2_pretrained_model)['state_dict']\n", " if has_MMI(state_dict):\n", " raise Exception(\"ERROR: This notebook does 
not currently support MMI models.\")\n", " model.load_state_dict(state_dict)\n", " _ = model.cuda().eval().half()\n", " return model, hparams\n", "\n", " model, hparams = get_Tactron2(TACOTRON2_ID)\n", " previous_tt2_id = TACOTRON2_ID\n", "\n", " pbar.update(1) # Downloaded and Set up Tacotron2\n", "\n", " # Extra Info\n", " def end_to_end_infer(text, pronounciation_dictionary, show_graphs):\n", " for i in [x for x in text.split(\"\\n\") if len(x)]:\n", " if not pronounciation_dictionary:\n", " if i[-1] != \";\": i=i+\";\" \n", " else: i = ARPA(i)\n", " with torch.no_grad(): # save VRAM by not including gradients\n", " sequence = np.array(text_to_sequence(i, [text_cleaner]))[None, :]\n", " sequence = torch.autograd.Variable(torch.from_numpy(sequence)).cuda().long()\n", " mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)\n", " if show_graphs:\n", " plot_data((mel_outputs_postnet.float().data.cpu().numpy()[0],\n", " alignments.float().data.cpu().numpy()[0].T))\n", " y_g_hat = hifigan(mel_outputs_postnet.float())\n", " audio = y_g_hat.squeeze()\n", " audio = audio * MAX_WAV_VALUE\n", " print(\"\")\n", " ipd.display(ipd.Audio(audio.cpu().numpy().astype(\"int16\"), rate=hparams.sampling_rate))\n", " from IPython.display import clear_output\n", " clear_output()\n", " initilized = \"Ready\"\n", "\n", "if previous_tt2_id != TACOTRON2_ID:\n", " print(\"Updating Models\")\n", " model, hparams = get_Tactron2(TACOTRON2_ID)\n", " hifigan, h = get_hifigan(HIFIGAN_ID)\n", " previous_tt2_id = TACOTRON2_ID\n", "\n", "pronounciation_dictionary = False #@param {type:\"boolean\"}\n", "# when False, automatic ARPAbet conversion is disabled; useful for inputting your own ARPAbet pronunciations or just for testing\n", "show_graphs = True #@param {type:\"boolean\"}\n", "max_duration = 25 #this does nothing\n", "model.decoder.max_decoder_steps = 1000 #@param {type:\"integer\"}\n", "stop_threshold = 0.324 #@param {type:\"number\"}\n", "model.decoder.gate_threshold = 
stop_threshold\n", "\n", "#@markdown ---\n", "\n", "print(f\"Current Config:\\npronounciation_dictionary: {pronounciation_dictionary}\\nshow_graphs: {show_graphs}\\nmax_duration (in seconds): {max_duration}\\nstop_threshold: {stop_threshold}\\n\\n\")\n", "\n", "time.sleep(1)\n", "print(\"Enter the text to convert to speech.\")\n", "contents = []\n", "while True:\n", " try:\n", " print(\"-\"*50)\n", " line = input()\n", " if line == \"\":\n", " continue\n", " end_to_end_infer(line, pronounciation_dictionary, show_graphs)\n", " except EOFError:\n", " break\n", " except KeyboardInterrupt:\n", " print(\"Program terminated...\")\n", " break" ] }, { "cell_type": "markdown", "metadata": { "id": "Uitul995V0Jw" }, "source": [ "##Using Waveglow##\n", "(I personally do not recommend it)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "EXR9p7AeSuL6" }, "outputs": [], "source": [ "#@title Install Tacotron and Waveglow\n", "!pip install -U tensorflow==1.15.2\n", "import os\n", "from os.path import exists, join, basename, splitext\n", "!pip install gdown\n", "git_repo_url = 'https://github.com/CjangCjengh/tacotron2-japanese.git'\n", "project_name = splitext(basename(git_repo_url))[0]\n", "if not exists(project_name):\n", " # clone and install\n", " !git clone -q --recursive {git_repo_url}\n", " !cd {project_name}/waveglow && git checkout 2fd4e63\n", " !pip install -q librosa unidecode\n", " \n", "import sys\n", "sys.path.append(join(project_name, 'waveglow/'))\n", "sys.path.append(project_name)\n", "import time\n", "import matplotlib\n", "import matplotlib.pylab as plt\n", "import gdown\n", "d = 'https://drive.google.com/uc?id='" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "liDdgta-SyPP" }, "outputs": [], "source": [ "#@title Load pretrained models\n", "force_download_TT2 = True\n", "tacotron2_pretrained_model = '/PATH/Your Tacotron2 Model'#@param {type:\"string\"}\n", "waveglow_pretrained_model = '/PATH/waveglow_256channels_ljs_v3.pt'#@param {type:\"string\"}" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": { "id": "Q0BWBVdCS9ty", "cellView": "form" }, "outputs": [], "source": [ "#@title Install Tacotron and Waveglow \n", "%matplotlib inline\n", "import IPython.display as ipd\n", "import numpy as np\n", "import torch\n", "\n", "from hparams import create_hparams\n", "from model import Tacotron2\n", "from layers import TacotronSTFT\n", "from audio_processing import griffin_lim\n", "from text import text_to_sequence\n", "from denoiser import Denoiser\n", "\n", "graph_width = 900\n", "graph_height = 360\n", "def plot_data(data, figsize=(int(graph_width/100), int(graph_height/100))):\n", " %matplotlib inline\n", " fig, axes = plt.subplots(1, len(data), figsize=figsize)\n", " for i in range(len(data)):\n", " axes[i].imshow(data[i], aspect='auto', origin='lower', \n", " interpolation='none', cmap='inferno')\n", " fig.canvas.draw()\n", " plt.show()\n", "\n", "!gdown --id '1E12g_sREdcH5vuZb44EZYX8JjGWQ9rRp'\n", "thisdict = {}\n", "for line in reversed((open('merged.dict.txt', \"r\").read()).splitlines()):\n", " thisdict[(line.split(\" \",1))[0]] = (line.split(\" \",1))[1].strip()\n", "def ARPA(text):\n", " out = ''\n", " for word_ in text.split(\" \"):\n", " word=word_; end_chars = ''\n", " while any(elem in word for elem in r\"!?,.;\") and len(word) > 1:\n", " # strip one trailing punctuation mark per pass\n", " if word[-1] in r\"!?,.;\": end_chars = word[-1] 
+ end_chars; word = word[:-1]\n", " else: break\n", " try: word_arpa = thisdict[word.upper()]\n", " except KeyError: word_arpa = ''\n", " if len(word_arpa)!=0: word = \"{\" + str(word_arpa) + \"}\"\n", " out = (out + \" \" + word + end_chars).strip()\n", " if out[-1] != \";\": out = out + \";\"\n", " return out\n", "\n", "#torch.set_grad_enabled(False)\n", "\n", "# initialize Tacotron2 with the pretrained model\n", "hparams = create_hparams()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "Sqe-jWarTCw6" }, "outputs": [], "source": [ "#@title Parameters\n", "# Load Tacotron2 (run this cell every time you change the model)\n", "hparams.sampling_rate = 22050 # Don't change this\n", "hparams.max_decoder_steps = 1000 # How long the audio will be before it cuts off (1000 is about 11 seconds)\n", "hparams.gate_threshold = 0.1 # Model must be 10% sure the clip is over before ending generation (the higher this number is, the more likely that the AI will keep generating until it reaches the Max Decoder Steps)\n", "model = Tacotron2(hparams)\n", "model.load_state_dict(torch.load(tacotron2_pretrained_model)['state_dict'])\n", "_ = model.cuda().eval().half()\n", "\n", "# Load WaveGlow\n", "waveglow = torch.load(waveglow_pretrained_model)['model']\n", "waveglow.cuda().eval().half()\n", "for k in waveglow.convinv:\n", " k.float()\n", "denoiser = Denoiser(waveglow)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "FiRVI-J3TNnc" }, "outputs": [], "source": [ "#@title Start synthesis!\n", "text = 'Your Text Here'#@param {type:\"string\"}\n", "sigma = 0.8\n", "denoise_strength = 0.324\n", "raw_input = True # disables automatic ARPAbet conversion, useful for inputting your own ARPAbet pronunciations or just for testing.\n", " # should be True if synthesizing a non-English language\n", "\n", "for i in text.split(\"\\n\"):\n", " if len(i) < 1: 
continue\n", " print(i)\n", " if raw_input:\n", " if i[-1] != \";\": i=i+\";\" \n", " else: i = ARPA(i)\n", " print(i)\n", " with torch.no_grad(): # save VRAM by not including gradients\n", " sequence = np.array(text_to_sequence(i, ['english_cleaners']))[None, :]\n", " sequence = torch.autograd.Variable(torch.from_numpy(sequence)).cuda().long()\n", " mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)\n", " plot_data((mel_outputs_postnet.float().data.cpu().numpy()[0],\n", " alignments.float().data.cpu().numpy()[0].T))\n", " audio = waveglow.infer(mel_outputs_postnet, sigma=sigma)\n", " print(\"\")\n", " ipd.display(ipd.Audio(audio[0].data.cpu().numpy(), rate=hparams.sampling_rate))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" }, "colab": { "name": "tacotron2训练和语音合成.ipynb", "provenance": [], "collapsed_sections": [] }, "accelerator": "GPU", "gpuClass": "standard" }, "nbformat": 4, "nbformat_minor": 0 }