{
"(1)MDX-Net(onnx_dereverb):对于双通道混响是最好的选择,不能去除单通道混响;": "(1)MDX-Net(onnx_dereverb): Best choice for dual-channel reverberation, cannot remove single-channel reverberation;",
"(234)DeEcho:去除延迟效果。Aggressive比Normal去除得更彻底,DeReverb额外去除混响,可去除单声道混响,但是对高频重的板式混响去不干净。": "(234)DeEcho: Removes delay effects. Aggressive mode removes more thoroughly than Normal mode. DeReverb additionally removes reverberation, can remove mono reverberation, but does not clean heavily high-frequency plate reverberation.",
"*GPT模型列表": "*GPT models list",
"*SoVITS模型列表": "*SoVITS models list",
"*实验/模型名": "*Experiment/model name",
"*文本标注文件": "*Text labelling file",
"*训练集音频文件目录": "*Audio dataset folder",
"*请上传并填写参考信息": "*Please upload and fill reference information",
"*请填写需要合成的目标文本和语种模式": "*Please fill in the target text and language mode for synthesis",
".list标注文件的路径": ".list annotation file path",
"0-前置数据集获取工具": "0-Fetch dataset",
"0a-UVR5人声伴奏分离&去混响去延迟工具": "0a-UVR5 webui (for vocal separation, deecho, dereverb and denoise)",
"0b-语音切分工具": "0b-Audio slicer",
"0bb-语音降噪工具": "0bb-Voice denoiser",
"0c-中文批量离线ASR工具": "0c-Chinese ASR tool",
"0d-语音文本校对标注工具": "0d-Speech to text proofreading tool",
"1-GPT-SoVITS-TTS": "1-GPT-SOVITS-TTS",
"1A-训练集格式化工具": "1A-Dataset formatting",
"1Aa-文本内容": "1Aa-Text",
"1Aabc-训练集格式化一键三连": "1Aabc-One-click formatting",
"1Ab-SSL自监督特征提取": "1Ab-SSL self-supervised feature extraction",
"1Ac-语义token提取": "1Ac-semantics token extraction",
"1B-微调训练": "1B-Fine-tuned training",
"1Ba-SoVITS训练。用于分享的模型文件输出在SoVITS_weights下。": "1Ba-SoVITS training. The model is located in SoVITS_weights.",
"1Bb-GPT训练。用于分享的模型文件输出在GPT_weights下。": "1Bb-GPT training. The model is located in GPT_weights.",
"1C-推理": "1C-inference",
"1、DeEcho-DeReverb模型的耗时是另外2个DeEcho模型的接近2倍;": "1. The DeEcho-DeReverb model's processing time is nearly twice that of the other two DeEcho models.",
"1、保留人声:不带和声的音频选这个,对主人声保留比HP5更好。内置HP2和HP3两个模型,HP3可能轻微漏伴奏但对主人声保留比HP2稍微好一丁点;": "1. Preserve Vocals: Choose this option for audio without harmonies, as it better retains the main vocal compared to the HP5 model. This option includes two built-in models, HP2 and HP3. HP3 may slightly let through some accompaniment but retains the main vocal slightly better than HP2.",
"2-GPT-SoVITS-变声": "2-GPT-SoVITS-Voice Changer",
"2、MDX-Net-Dereverb模型挺慢的;": "2、MDX-Net-Dereverb Model is slow;",
"2、仅保留主人声:带和声的音频选这个,对主人声可能有削弱。内置HP5一个模型;": "2. Keep Only Main Vocal: Choose this option for audio with harmonies, as it may slightly reduce the main vocal. Includes one built-in HP5 model;",
"3、个人推荐的最干净的配置是先MDX-Net再DeEcho-Aggressive。": "3. Personal Recommendation for the cleanest configuration: First use MDX-Net followed by DeEcho-Aggressive",
"3、去混响、去延迟模型(by FoxJoy):": "3. Reverberation and delay removal model(by FoxJoy):",
"ASR 模型": "ASR model",
"ASR 模型尺寸": "ASR model size",
"数据类型精度": "Computing precision",
"ASR 语言设置": "ASR language",
"ASR进程输出信息": "ASR output log",
"GPT模型列表": "GPT weight list",
"GPT训练进程输出信息": "GPT training output log",
"GPU卡号,只能填1个整数": "GPU number, can only input ONE integer",
"GPU卡号以-分割,每个卡号一个进程": "GPU number is separated by -, each GPU will run one process ",
"SSL进程输出信息": "SSL output log",
"SoVITS模型列表": "SoVITS weight list",
"SoVITS训练进程输出信息": "SoVITS training output log",
"TTS推理WebUI进程输出信息": "TTS inference webui output log",
"TTS推理进程已关闭": "TTS inference process closed",
"TTS推理进程已开启": "TTS inference process is opened",
"UVR5已关闭": "UVR5 closed",
"UVR5已开启": "UVR5 opened ",
"UVR5进程输出信息": "UVR5 process output log",
"alpha_mix:混多少比例归一化后音频进来": "alpha_mix: proportion of normalized audio merged into dataset",
"gpt采样参数(无参考文本时不要太低。不懂就用默认):": "GPT sampling parameters (not too low when there's no reference text. Use default if unsure):",
"hop_size:怎么算音量曲线,越小精度越大计算量越高(不是精度越大效果越好)": "hop_size: FO hop size, the smaller the value, the higher the accuracy)",
"max:归一化后最大值多少": "Loudness multiplier after normalized",
"max_sil_kept:切完后静音最多留多长": "Maximum length for silence to be kept",
"min_interval:最短切割间隔": "Minumum interval for audio cutting",
"min_length:每段最小多长,如果第一段太短一直和后面段连起来直到超过这个值": "min_length: the minimum length of each segment. If the first segment is too short, it will be concatenated with the next segment until it exceeds this value",
"temperature": "temperature",
"threshold:音量小于这个值视作静音的备选切割点": "Noise gate threshold (loudness below this value will be treated as noise",
"top_k": "top_k",
"top_p": "top_p",
"一键三连进程输出信息": "One-click formatting output",
"不切": "No slice",
"中文": "Chinese",
"中文教程文档:https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e": "Chinese Tutorial:https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e",
"中英混合": "Chinese-English Mixed",
"也可批量输入音频文件, 二选一, 优先读文件夹": "Multiple audio files can also be imported. If a folder path exists, this input is ignored.",
"人声伴奏分离批量处理, 使用UVR5模型。": "Batch processing for vocal and instrumental separation, using the UVR5 model.",
"人声提取激进程度": "Vocal extraction aggressiveness",
"伴奏人声分离&去混响&去回声": "Vocals/Accompaniment Separation & Reverberation Removal",
"使用无参考文本模式时建议使用微调的GPT,听不清参考音频说的啥(不晓得写啥)可以开,开启后无视填写的参考文本。": "When using the no-reference text mode, it is recommended to use a fine-tuned GPT. If the reference audio is unclear and you don't know what to write, you can enable this feature, which will ignore the reference text you've entered.",
"保存频率save_every_epoch": "Save frequency (save_every_epoch):",
"凑50字一切": "Cut per 50 characters",
"凑四句一切": "Slice once every 4 sentences",
"切分后文本": "Text after sliced",
"切分后的子音频的输出根目录": "Audio slicer output folder",
"切割使用的进程数": "CPU threads used for audio slicing",
"刷新模型路径": "refreshing model paths",
"前端处理后的文本(每句):": "Processed text from the frontend (per sentence):",
"去混响/去延迟,附:": "Dereverberation/Delay Removal, including:",
"参考音频在3~10秒范围外,请更换!": "Reference audio is outside the 3-10 second range, please choose another one!",
"参考音频的文本": "Text for reference audio",
"参考音频的语种": "Language for reference audio",
"合成语音": "Start inference",
"合格的文件夹路径格式举例: E:\\codes\\py39\\vits_vc_gpu\\白鹭霜华测试样例(去文件管理器地址栏拷就行了)。": "An example of a valid folder path format: E:\\codes\\py39\\vits_vc_gpu\\白鹭霜华测试样例 (simply copy the address from the file manager's address bar).",
"后续将支持转音素、手工修改音素、语音合成分步执行。": " Step-to-step phoneme transformation and modification coming soon!",
"填切割后音频所在目录!读取的音频文件完整路径=该目录-拼接-list文件里波形对应的文件名(不是全路径)。如果留空则使用.list文件里的绝对全路径。": "Please fill in the segmented audio files' directory! The full path of the audio file = the directory concatenated with the filename corresponding to the waveform in the list file (not the full path). If left blank, the absolute full path in the .list file will be used.",
"多语种混合": "Multilingual Mixed",
"实际输入的参考文本:": "Actual Input Reference Text:",
"实际输入的目标文本(切句后):": "Actual Input Target Text (after sentence segmentation):",
"实际输入的目标文本(每句):": "Actual Input Target Text (per sentence):",
"实际输入的目标文本:": "Actual Input Target Text:",
"导出文件格式": "Export file format",
"开启GPT训练": "Start GPT training",
"开启SSL提取": "Start SSL extracting",
"开启SoVITS训练": "Start SoVITS training",
"开启一键三连": "Start one-click formatting",
"开启文本获取": "Start speech-to-text",
"开启无参考文本模式。不填参考文本亦相当于开启。": "Enable no reference mode. If you don't fill 'Text for reference audio', no reference mode will be enabled.",
"开启离线批量ASR": "Start batch ASR",
"开启语义token提取": "Start semantics token extraction",
"开启语音切割": "Start audio slicer",
"开启语音降噪": "Start voice denoiser",
"怎么切": "How to slice the sentence",
"总训练轮数total_epoch": "Total training epochs (total_epoch):",
"总训练轮数total_epoch,不建议太高": "Total epochs, do not increase to a value that is too high",
"打标工具WebUI已关闭": "proofreading tool webui is closed",
"打标工具WebUI已开启": "proofreading tool webui is opened",
"打标工具进程输出信息": "Proofreading tool output log",
"指定输出主人声文件夹": "Specify the output folder for vocals:",
"指定输出非主人声文件夹": "Specify the output folder for accompaniment:",
"按中文句号。切": "Slice by Chinese punct",
"按标点符号切": "Slice by every punct",
"按英文句号.切": "Slice by English punct",
"文本切分工具。太长的文本合成出来效果不一定好,所以太长建议先切。合成会根据文本的换行分开合成再拼起来。": "Text slicer tool, since there will be issues when infering long texts, so it is advised to cut first. When infering, it will infer respectively then combined together.",
"文本模块学习率权重": "Text model learning rate weighting",
"文本进程输出信息": "Text processing output",
"施工中,请静候佳音": "In construction, please wait",
"日文": "Japanese",
"日英混合": "Japanese-English Mixed",
"是否仅保存最新的ckpt文件以节省硬盘空间": "Save only the latest '.ckpt' file to save disk space:",
"是否在每次保存时间点将最终小模型保存至weights文件夹": "Save a small final model to the 'weights' folder at each save point:",
"是否开启TTS推理WebUI": "Open TTS inference WEBUI",
"是否开启UVR5-WebUI": "Open UVR5-WebUI",
"是否开启dpo训练选项(实验性)": "Enable DPO training (experimental feature)",
"是否开启打标WebUI": "Open labelling WebUI",
"是否直接对上次合成结果调整语速。防止随机性。": "Whether to directly adjust the speech rate of the last synthesis result to prevent randomness.",
"显卡信息": "GPU Information",
"本软件以MIT协议开源, 作者不对软件具备任何控制力, 使用软件者、传播软件导出的声音者自负全责. <br>如不认可该条款, 则不能使用或引用软件包内任何代码和文件. 详见根目录<b>LICENSE</b>.": "This software is open source under the MIT license. The author does not have any control over the software. Users who use the software and distribute the sounds exported by the software are solely responsible. <br>If you do not agree with this clause, you cannot use or reference any codes and files within the software package. See the root directory <b>Agreement-LICENSE</b> for details.",
"模型": "Model",
"模型分为三类:": "Models are categorized into three types:",
"模型切换": "Model switch",
"每张显卡的batch_size": "Batch size per GPU:",
"终止ASR进程": "Stop ASR task",
"终止GPT训练": "Stop GPT training",
"终止SSL提取进程": "Stop SSL extraction",
"终止SoVITS训练": "Stop SoVITS training",
"终止一键三连": "Stop one-click formatting",
"终止文本获取进程": "Stop speech-to-text",
"终止语义token提取进程": "Stop semantics token extraction",
"终止语音切割": "Stop audio cutting",
"终止语音降噪进程": "Stop voice denoising",
"英文": "English",
"语义token提取进程输出信息": "Sematics token extraction output log",
"语速": "Speech rate",
"语速调整,高为更快": "Adjust speech rate, higher for faster",
"语音切割进程输出信息": "Audio slicer output log",
"语音降噪进程输出信息": "Voice Denoiser Process Output Information",
"请上传3~10秒内参考音频,超过会报错!": "Please upload a reference audio within the 3-10 second range; if it exceeds this duration, it will raise errors.",
"请输入有效文本": "Please enter valid text.",
"转换": "Convert",
"输入待处理音频文件夹路径": "Enter the path of the audio folder to be processed:",
"输入文件夹路径": "Input folder path",
"输出logs/实验名目录下应有23456开头的文件和文件夹": "output folder (logs/{experiment name}) should have files and folders starts with 23456.",
"输出信息": "Output information",
"输出文件夹路径": "Output folder path",
"输出的语音": "Inference Result",
"选择训练完存放在SoVITS_weights和GPT_weights下的模型。默认的一个是底模,体验5秒Zero Shot TTS用。": "Choose the models from SoVITS_weights and GPT_weights. The default one is a pretrain, so you can experience zero shot TTS.",
"降噪结果输出文件夹": "Denoised Results Output Folder",
"降噪音频文件输入文件夹": "Denoising Audio File Input Folder",
"需要合成的切分前文本": "Inference text that needs to be sliced",
"需要合成的文本": "Inference text",
"需要合成的语种": "Inference text language",
"音频自动切分输入路径,可文件可文件夹": "Audio slicer input (file or folder)",
"预训练的GPT模型路径": "Pretrained GPT model path",
"预训练的SSL模型路径": "Pretrained SSL model path",
"预训练的SoVITS-D模型路径": "Pretrained SoVITS-D model path",
"预训练的SoVITS-G模型路径": "Pretrained SoVITS-G model path",
"预训练的中文BERT模型路径": " Pretrained BERT model path"
}