First model version


Files changed (12)

.gitattributes +33 -0
.gitignore +151 -0
README.md +51 -0
checkpoints/FastDiff/config.yaml +149 -0
checkpoints/FastDiff/model_ckpt_steps_500000.ckpt +3 -0
checkpoints/ProDiff/config.yaml +205 -0
checkpoints/ProDiff/model_ckpt_steps_200000.ckpt +3 -0
checkpoints/ProDiff_Teacher/config.yaml +205 -0
checkpoints/ProDiff_Teacher/model_ckpt_steps_188000.ckpt +3 -0
data/binary/LJSpeech/phone_set.json +1 -0
data/binary/LJSpeech/spk_map.json +1 -0
data/binary/LJSpeech/train_f0s_mean_std.npy +3 -0

.gitattributes ADDED

	@@ -0,0 +1,33 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
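
The patterns above route matching files through Git LFS instead of storing them as ordinary blobs. As a rough illustration, the sketch below checks which paths from this commit would match. It compares the base name of each path with Python's `fnmatchcase`, which only approximates gitattributes matching for slash-free patterns; `LFS_PATTERNS` is an abbreviated, hand-copied subset of the list, and `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Abbreviated subset of the patterns in .gitattributes. Assumption: slash-free
# patterns are compared against the base name of the path.
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.npy", "*.pt", "*.pth", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the file's base name matches any of the LFS patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for path in [
    "checkpoints/FastDiff/model_ckpt_steps_500000.ckpt",
    "data/binary/LJSpeech/train_f0s_mean_std.npy",
    "checkpoints/FastDiff/config.yaml",
]:
    print(path, is_lfs_tracked(path))
```

Under these assumptions the `.ckpt` and `.npy` files match and the YAML config does not, which agrees with the three-line pointer files shown for the checkpoints in this commit.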

.gitignore ADDED

	@@ -0,0 +1,151 @@

+### Project ignore
+/ParallelWaveGAN
+/wavegan_pretrained*
+/pretrained_models
+rsync
+.idea
+.DS_Store
+bak
+tmp
+*.tar.gz
+# mfa and kaldi
+kaldi_align/exp
+mfa
+montreal-forced-aligner
+mos
+nbs
+/configs_usr/*
+!/configs_usr/.gitkeep
+/fast_transformers
+/rnnoise
+/usr/*
+!/usr/.gitkeep
+# Created by .ignore support plugin (hsz.mobi)
+### Python template
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+.python-version
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+# celery beat schedule file
+celerybeat-schedule
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+# Pyre type checker
+.pyre/
+Would remove datasets/remi/test/

README.md ADDED

	@@ -0,0 +1,51 @@

+---
+license: other
+tags:
+- text-to-speech
+- neural-vocoder
+inference: false
+extra_gated_prompt: |-
+  One more step before getting this model.
+  This model is open access and available to all, with a license further specifying rights and usage.
+  Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.
+  By clicking on "Access repository" below, you accept that your *contact information* (email address and username) can be shared with the model authors as well.
+extra_gated_fields:
+ I have read the License and agree with its terms: checkbox
+---
+# ProDiff and FastDiff Model Card
+## Key Features
+  - **Extremely fast** diffusion text-to-speech synthesis pipeline for potential **industrial deployment**.
+  - **Tutorial and code base** for speech diffusion models.
+  - More **supported diffusion mechanisms** (e.g., guided diffusion) will be available.
+## Model Details
+- **Developed by:** Rongjie Huang et al. (see the citations below)
+- **Model type:** Diffusion-based text-to-speech generation model
+- **Language(s):** English
+- **License:**
+- **Model Description:** A conditional diffusion probabilistic model capable of generating high-fidelity speech efficiently.
+- **Resources for more information:** [FastDiff GitHub Repository](https://github.com/Rongjiehuang/FastDiff), [FastDiff Paper](https://arxiv.org/abs/2204.09934), [ProDiff GitHub Repository](https://github.com/Rongjiehuang/ProDiff), [ProDiff Paper](https://arxiv.org/abs/2207.06389).
+- **Cite as:**
+      @inproceedings{huang2022prodiff,
+         title={ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech},
+         author={Huang, Rongjie and Zhao, Zhou and Liu, Huadai and Liu, Jinglin and Cui, Chenye and Ren, Yi},
+         booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
+         year={2022}
+      }
+      @inproceedings{huang2022fastdiff,
+         title={FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis},
+         author={Huang, Rongjie and Lam, Max WY and Wang, Jun and Su, Dan and Yu, Dong and Ren, Yi and Zhao, Zhou},
+         booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}},
+         year={2022}
+      }
+-
+*This model card was written based on the [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).*

checkpoints/FastDiff/config.yaml ADDED

	@@ -0,0 +1,149 @@

+N: ''
+T: 1000
+accumulate_grad_batches: 1
+amp: false
+audio_channels: 1
+audio_num_mel_bins: 80
+audio_sample_rate: 22050
+aux_context_window: 0
+beta_0: 1.0e-06
+beta_T: 0.01
+binarization_args:
+  reset_phone_dict: true
+  reset_word_dict: true
+  shuffle: false
+  trim_eos_bos: false
+  with_align: false
+  with_f0: false
+  with_f0cwt: false
+  with_linear: false
+  with_spk_embed: false
+  with_spk_id: true
+  with_txt: false
+  with_wav: true
+  with_word: false
+binarizer_cls: data_gen.tts.vocoder_binarizer.VocoderBinarizer
+binary_data_dir: data/binary/LJSpeech
+check_val_every_n_epoch: 10
+clip_grad_norm: 1
+clip_grad_value: 0
+cond_channels: 80
+debug: false
+dec_ffn_kernel_size: 9
+dec_layers: 4
+dict_dir: ''
+diffusion_step_embed_dim_in: 128
+diffusion_step_embed_dim_mid: 512
+diffusion_step_embed_dim_out: 512
+disc_start_steps: 40000
+discriminator_grad_norm: 1
+dropout: 0.0
+ds_workers: 1
+enc_ffn_kernel_size: 9
+enc_layers: 4
+endless_ds: true
+eval_max_batches: -1
+ffn_act: gelu
+ffn_padding: SAME
+fft_size: 1024
+fmax: 7600
+fmin: 80
+frames_multiple: 1
+gen_dir_name: ''
+generator_grad_norm: 10
+griffin_lim_iters: 60
+hidden_size: 256
+hop_size: 256
+infer: false
+inner_channels: 32
+kpnet_conv_size: 3
+kpnet_hidden_channels: 64
+load_ckpt: ''
+loud_norm: false
+lr: 2e-4
+lvc_kernel_size: 3
+lvc_layers_each_block: 4
+max_epochs: 1000
+max_frames: 1548
+max_input_tokens: 1550
+max_samples: 25600
+max_sentences: 20
+max_tokens: 30000
+max_updates: 1000000
+max_valid_sentences: 1
+max_valid_tokens: 60000
+mel_loss: l1
+mel_vmax: 1.5
+mel_vmin: -6
+mfa_version: 2
+min_frames: 0
+min_level_db: -100
+noise_schedule: ''
+num_ckpt_keep: 3
+num_heads: 2
+num_mels: 80
+num_sanity_val_steps: -1
+num_spk: 400
+num_test_samples: 0
+num_valid_plots: 10
+optimizer_adam_beta1: 0.9
+optimizer_adam_beta2: 0.98
+out_wav_norm: false
+pitch_extractor: parselmouth
+pre_align_args:
+  allow_no_txt: false
+  denoise: false
+  nsample_per_mfa_group: 1000
+  sox_resample: false
+  sox_to_wav: false
+  trim_sil: false
+  txt_processor: en
+  use_tone: true
+pre_align_cls: egs.datasets.audio.pre_align.PreAlign
+print_nan_grads: false
+processed_data_dir: data/processed/LJSpeech
+profile_infer: false
+raw_data_dir: data/raw/LJSpeech-1.1
+ref_level_db: 20
+rename_tmux: true
+resume_from_checkpoint: 0
+save_best: true
+save_codes: []
+save_f0: false
+save_gt: true
+scheduler: rsqrt
+seed: 1234
+sort_by_len: true
+task_cls: modules.FastDiff.task.FastDiff.FastDiffTask
+tb_log_interval: 100
+test_ids: []
+test_input_dir: ''
+test_mel_dir: ''
+test_num: 100
+test_set_name: test
+train_set_name: train
+train_sets: ''
+upsample_ratios:
+- 8
+- 8
+- 4
+use_pitch_embed: false
+use_spk_embed: false
+use_spk_id: false
+use_split_spk_id: false
+use_wav: true
+use_weight_norm: true
+use_word_input: false
+val_check_interval: 2000
+valid_infer_interval: 10000
+valid_monitor_key: val_loss
+valid_monitor_mode: min
+valid_set_name: valid
+vocoder_denoise_c: 0.0
+warmup_updates: 8000
+weight_decay: 0
+win_length: null
+win_size: 1024
+window: hann
+word_size: 30000
+work_dir: checkpoints/FastDiff
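
Several of the audio settings in this config are tied together. The sketch below copies four values from the file and derives the frame shift and the longest clip they imply. It assumes, without confirmation from this diff, that the vocoder upsamples each mel frame to exactly `hop_size` waveform samples, so the product of `upsample_ratios` should equal `hop_size`.

```python
import math

# Values copied from checkpoints/FastDiff/config.yaml.
audio_sample_rate = 22050
hop_size = 256
upsample_ratios = [8, 8, 4]
max_frames = 1548

# Assumption: one mel frame is upsampled to hop_size waveform samples,
# so the product of the ratios should match hop_size.
total_upsampling = math.prod(upsample_ratios)

# Time covered by one mel frame, in milliseconds.
frame_shift_ms = 1000.0 * hop_size / audio_sample_rate

# Longest clip (in seconds) that fits within max_frames mel frames.
max_clip_seconds = max_frames * hop_size / audio_sample_rate

print(total_upsampling == hop_size)
print(round(frame_shift_ms, 2), round(max_clip_seconds, 2))
```

With these numbers the ratios multiply to 256, one frame covers about 11.6 ms, and `max_frames` corresponds to roughly 18 seconds of audio.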

checkpoints/FastDiff/model_ckpt_steps_500000.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee7b6022e525c71a6025b41eeeafff9d6186b52cba76b580d6986bc8674902f3
+size 183951271
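
The three lines above are a Git LFS pointer, not the checkpoint itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer and checking a blob against it follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and `matches_pointer` loads the whole blob into memory, so a real checkpoint of this size would call for chunked hashing instead.

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ee7b6022e525c71a6025b41eeeafff9d6186b52cba76b580d6986bc8674902f3
size 183951271
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and unpack the 'algo:digest' oid field."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check size and digest of an in-memory blob against a parsed pointer."""
    digest = hashlib.new(pointer["algo"], data).hexdigest()
    return len(data) == pointer["size"] and digest == pointer["oid"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```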

checkpoints/ProDiff/config.yaml ADDED

	@@ -0,0 +1,205 @@

+accumulate_grad_batches: 1
+amp: false
+audio_num_mel_bins: 80
+audio_sample_rate: 22050
+base_config:
+- ./base.yaml
+binarization_args:
+  reset_phone_dict: true
+  reset_word_dict: true
+  shuffle: false
+  trim_eos_bos: false
+  trim_sil: false
+  with_align: true
+  with_f0: true
+  with_f0cwt: false
+  with_linear: false
+  with_spk_embed: false
+  with_spk_id: true
+  with_txt: true
+  with_wav: false
+  with_word: true
+binarizer_cls: data_gen.tts.base_binarizer.BaseBinarizer
+binary_data_dir: data/binary/LJSpeech
+check_val_every_n_epoch: 10
+clip_grad_norm: 1
+clip_grad_value: 0
+conv_use_pos: false
+cwt_add_f0_loss: false
+cwt_hidden_size: 128
+cwt_layers: 2
+cwt_loss: l1
+cwt_std_scale: 0.8
+debug: false
+dec_dilations:
+- 1
+- 1
+- 1
+- 1
+dec_ffn_kernel_size: 9
+dec_inp_add_noise: false
+dec_kernel_size: 5
+dec_layers: 4
+dec_num_heads: 2
+decoder_rnn_dim: 0
+decoder_type: fft
+dict_dir: ''
+diff_decoder_type: wavenet
+diff_loss_type: l1
+dilation_cycle_length: 1
+dropout: 0.1
+ds_workers: 2
+dur_enc_hidden_stride_kernel:
+- 0,2,3
+- 0,2,3
+- 0,1,3
+dur_loss: mse
+dur_predictor_kernel: 3
+dur_predictor_layers: 2
+enc_dec_norm: ln
+enc_dilations:
+- 1
+- 1
+- 1
+- 1
+enc_ffn_kernel_size: 9
+enc_kernel_size: 5
+enc_layers: 4
+encoder_K: 8
+encoder_type: fft
+endless_ds: true
+ffn_act: gelu
+ffn_hidden_size: 1024
+ffn_padding: SAME
+fft_size: 1024
+fmax: 7600
+fmin: 80
+frames_multiple: 1
+gen_dir_name: ''
+gen_tgt_spk_id: -1
+griffin_lim_iters: 60
+hidden_size: 256
+hop_size: 256
+infer: false
+keep_bins: 80
+lambda_commit: 0.25
+lambda_energy: 0.1
+lambda_f0: 1.0
+lambda_ph_dur: 0.1
+lambda_sent_dur: 1.0
+lambda_uv: 1.0
+lambda_word_dur: 1.0
+layers_in_block: 2
+load_ckpt: ''
+loud_norm: false
+lr: 1.0
+max_beta: 0.06
+max_epochs: 1000
+max_frames: 1548
+max_input_tokens: 1550
+max_sentences: 48
+max_tokens: 32000
+max_updates: 200000
+max_valid_sentences: 1
+max_valid_tokens: 60000
+mel_loss: ssim:0.5|l1:0.5
+mel_vmax: 1.5
+mel_vmin: -6
+min_frames: 0
+min_level_db: -100
+num_ckpt_keep: 3
+num_heads: 2
+num_sanity_val_steps: -1
+num_spk: 1
+num_test_samples: 0
+num_valid_plots: 10
+optimizer_adam_beta1: 0.9
+optimizer_adam_beta2: 0.98
+out_wav_norm: false
+pitch_ar: false
+pitch_embed_type: 0
+pitch_enc_hidden_stride_kernel:
+- 0,2,5
+- 0,2,5
+- 0,2,5
+pitch_extractor: parselmouth
+pitch_loss: l1
+pitch_norm: standard
+pitch_ssim_win: 11
+pitch_type: frame
+pre_align_args:
+  allow_no_txt: false
+  denoise: false
+  sox_resample: false
+  sox_to_wav: false
+  trim_sil: false
+  txt_processor: en
+  use_tone: true
+pre_align_cls: ''
+predictor_dropout: 0.5
+predictor_grad: 0.1
+predictor_hidden: -1
+predictor_kernel: 5
+predictor_layers: 2
+pretrain_fs_ckpt: ''
+print_nan_grads: false
+processed_data_dir: data/processed/LJSpeech
+profile_infer: false
+raw_data_dir: data/raw/LJSpeech
+ref_hidden_stride_kernel:
+- 0,3,5
+- 0,3,5
+- 0,2,5
+- 0,2,5
+- 0,2,5
+ref_level_db: 20
+ref_norm_layer: bn
+rename_tmux: true
+residual_channels: 256
+residual_layers: 20
+resume_from_checkpoint: 0
+save_best: true
+save_codes: []
+save_f0: false
+save_gt: true
+schedule_type: vpsde
+scheduler: rsqrt
+seed: 1234
+sil_add_noise: false
+sort_by_len: true
+spec_max: []
+spec_min: []
+task_cls: modules.ProDiff.task.ProDiff_task.ProDiff_Task
+tb_log_interval: 100
+teacher_ckpt: checkpoints/ProDiff_Teacher/model_ckpt_steps_188000.ckpt
+test_ids: []
+test_input_dir: ''
+test_num: 100
+test_set_name: test
+timesteps: 4
+train_set_name: train
+train_sets: ''
+use_cond_disc: true
+use_energy_embed: true
+use_gt_dur: true
+use_gt_f0: true
+use_pitch_embed: true
+use_pos_embed: true
+use_ref_enc: false
+use_spk_embed: false
+use_spk_id: false
+use_split_spk_id: false
+use_uv: true
+use_var_enc: false
+val_check_interval: 2000
+valid_infer_interval: 10000
+valid_monitor_key: val_loss
+valid_monitor_mode: min
+valid_set_name: valid
+var_enc_vq_codes: 64
+vocoder_denoise_c: 0.0
+warmup_updates: 2000
+weight_decay: 0
+win_size: 1024
+word_size: 30000
+work_dir: checkpoints/ProDiff
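
The value `lr: 1.0` is far too large to be a raw step size, so it most likely acts as a scale factor for the `rsqrt` scheduler. One common reading, assumed here and not confirmed by this diff, is a Noam-style inverse-square-root schedule that multiplies `lr` by a linear warm-up, an inverse-square-root decay, and `hidden_size ** -0.5`. The function `rsqrt_lr` below is a hypothetical sketch of that formula, not the code base's scheduler.

```python
def rsqrt_lr(step: int, base_lr: float = 1.0, warmup_updates: int = 2000,
             hidden_size: int = 256) -> float:
    """Noam-style schedule (assumed): linear warm-up, then 1/sqrt(step) decay."""
    # Ramps linearly from 0 to 1 over the warm-up period.
    warmup = min(step / warmup_updates, 1.0)
    # Constant during warm-up, then decays with the inverse square root of step.
    decay = max(step, warmup_updates) ** -0.5
    # Scaling by the model width keeps the peak rate small even when base_lr is 1.0.
    return base_lr * warmup * decay * hidden_size ** -0.5


for step in (0, 1000, 2000, 8000, 200000):
    print(step, rsqrt_lr(step))
```

Under this assumed formula the rate peaks at the end of warm-up (step 2000) at about 1.4e-3 and has halved again by step 8000.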

checkpoints/ProDiff/model_ckpt_steps_200000.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8cc8aad355c297b010e2c362341f736b3477744af76e02f6c9965409a7e9113a
+size 349055740

checkpoints/ProDiff_Teacher/config.yaml ADDED

	@@ -0,0 +1,205 @@

+accumulate_grad_batches: 1
+amp: false
+audio_num_mel_bins: 80
+audio_sample_rate: 22050
+base_config:
+- ./base.yaml
+binarization_args:
+  reset_phone_dict: true
+  reset_word_dict: true
+  shuffle: false
+  trim_eos_bos: false
+  trim_sil: false
+  with_align: true
+  with_f0: true
+  with_f0cwt: false
+  with_linear: false
+  with_spk_embed: false
+  with_spk_id: true
+  with_txt: true
+  with_wav: false
+  with_word: true
+binarizer_cls: data_gen.tts.base_binarizer.BaseBinarizer
+binary_data_dir: data/binary/LJSpeech
+check_val_every_n_epoch: 10
+clip_grad_norm: 1
+clip_grad_value: 0
+conv_use_pos: false
+cwt_add_f0_loss: false
+cwt_hidden_size: 128
+cwt_layers: 2
+cwt_loss: l1
+cwt_std_scale: 0.8
+debug: false
+dec_dilations:
+- 1
+- 1
+- 1
+- 1
+dec_ffn_kernel_size: 9
+dec_inp_add_noise: false
+dec_kernel_size: 5
+dec_layers: 4
+dec_num_heads: 2
+decoder_rnn_dim: 0
+decoder_type: fft
+dict_dir: ''
+diff_decoder_type: wavenet
+diff_loss_type: l1
+dilation_cycle_length: 1
+dropout: 0.1
+ds_workers: 2
+dur_enc_hidden_stride_kernel:
+- 0,2,3
+- 0,2,3
+- 0,1,3
+dur_loss: mse
+dur_predictor_kernel: 3
+dur_predictor_layers: 2
+enc_dec_norm: ln
+enc_dilations:
+- 1
+- 1
+- 1
+- 1
+enc_ffn_kernel_size: 9
+enc_kernel_size: 5
+enc_layers: 4
+encoder_K: 8
+encoder_type: fft
+endless_ds: true
+ffn_act: gelu
+ffn_hidden_size: 1024
+ffn_padding: SAME
+fft_size: 1024
+fmax: 7600
+fmin: 80
+frames_multiple: 1
+gen_dir_name: ''
+gen_tgt_spk_id: -1
+griffin_lim_iters: 60
+hidden_size: 256
+hop_size: 256
+infer: false
+keep_bins: 80
+lambda_commit: 0.25
+lambda_energy: 0.1
+lambda_f0: 1.0
+lambda_ph_dur: 0.1
+lambda_sent_dur: 1.0
+lambda_uv: 1.0
+lambda_word_dur: 1.0
+layers_in_block: 2
+load_ckpt: ''
+loud_norm: false
+lr: 1.0
+max_beta: 0.06
+max_epochs: 1000
+max_frames: 1548
+max_input_tokens: 1550
+max_sentences: 48
+max_tokens: 32000
+max_updates: 200000
+max_valid_sentences: 1
+max_valid_tokens: 60000
+mel_loss: ssim:0.5|l1:0.5
+mel_vmax: 1.5
+mel_vmin: -6
+min_frames: 0
+min_level_db: -100
+num_ckpt_keep: 3
+num_heads: 2
+num_sanity_val_steps: -1
+num_spk: 1
+num_test_samples: 20
+num_valid_plots: 10
+optimizer_adam_beta1: 0.9
+optimizer_adam_beta2: 0.98
+out_wav_norm: false
+pitch_ar: false
+pitch_embed_type: 0
+pitch_enc_hidden_stride_kernel:
+- 0,2,5
+- 0,2,5
+- 0,2,5
+pitch_extractor: parselmouth
+pitch_loss: l1
+pitch_norm: standard
+pitch_ssim_win: 11
+pitch_type: frame
+pre_align_args:
+  allow_no_txt: false
+  denoise: false
+  sox_resample: false
+  sox_to_wav: false
+  trim_sil: false
+  txt_processor: en
+  use_tone: true
+pre_align_cls: egs.datasets.audio.lj.pre_align.LJPreAlign
+predictor_dropout: 0.5
+predictor_grad: 0.1
+predictor_hidden: -1
+predictor_kernel: 5
+predictor_layers: 2
+pretrain_fs_ckpt: ''
+print_nan_grads: false
+processed_data_dir: data/processed/LJSpeech
+profile_infer: false
+raw_data_dir: data/raw/LJSpeech
+ref_hidden_stride_kernel:
+- 0,3,5
+- 0,3,5
+- 0,2,5
+- 0,2,5
+- 0,2,5
+ref_level_db: 20
+ref_norm_layer: bn
+rename_tmux: true
+residual_channels: 256
+residual_layers: 20
+resume_from_checkpoint: 0
+save_best: true
+save_codes: []
+save_f0: false
+save_gt: true
+schedule_type: vpsde
+scheduler: rsqrt
+seed: 1234
+sil_add_noise: false
+sort_by_len: true
+spec_max: []
+spec_min: []
+task_cls: modules.ProDiff.task.ProDiff_teacher_task.ProDiff_teacher_Task
+tb_log_interval: 100
+test_ids: []
+test_input_dir: ''
+test_num: 100
+test_set_name: test
+timescale: 1
+timesteps: 4
+train_set_name: train
+train_sets: ''
+use_cond_disc: true
+use_energy_embed: true
+use_gt_dur: true
+use_gt_f0: true
+use_pitch_embed: true
+use_pos_embed: true
+use_ref_enc: false
+use_spk_embed: false
+use_spk_id: false
+use_split_spk_id: false
+use_uv: true
+use_var_enc: false
+val_check_interval: 2000
+valid_infer_interval: 10000
+valid_monitor_key: val_loss
+valid_monitor_mode: min
+valid_set_name: valid
+var_enc_vq_codes: 64
+vocoder_denoise_c: 0.0
+warmup_updates: 2000
+weight_decay: 0
+win_size: 1024
+word_size: 30000
+work_dir: checkpoints/ProDiff_Teacher1
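
Both ProDiff configs set `mel_loss: ssim:0.5|l1:0.5`, while the FastDiff config uses the bare name `l1`. A plausible reading, assumed here, is a `|`-separated list of `name:weight` terms in which a missing weight defaults to 1.0. The hypothetical helper `parse_loss_spec` sketches that interpretation; the parsing in the actual code base may differ.

```python
def parse_loss_spec(spec: str) -> dict:
    """Parse 'ssim:0.5|l1:0.5' into {'ssim': 0.5, 'l1': 0.5}.

    Assumption: terms are separated by '|', and a term without ':weight'
    (such as the bare 'l1' in the FastDiff config) gets a weight of 1.0.
    """
    weights = {}
    for term in spec.split("|"):
        name, _, weight = term.partition(":")
        weights[name] = float(weight) if weight else 1.0
    return weights


print(parse_loss_spec("ssim:0.5|l1:0.5"))
print(parse_loss_spec("l1"))
```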

checkpoints/ProDiff_Teacher/model_ckpt_steps_188000.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5d3d02a215431c69dd54c1413b9a02cdc32795e2039ad9be857b12e85c470eea
+size 342252871

data/binary/LJSpeech/phone_set.json ADDED

	@@ -0,0 +1 @@

+ ["!", ",", ".", ":", ";", "<BOS>", "<EOS>", "?", "AA0", "AA1", "AA2", "AE0", "AE1", "AE2", "AH0", "AH1", "AH2", "AO0", "AO1", "AO2", "AW0", "AW1", "AW2", "AY0", "AY1", "AY2", "B", "CH", "D", "DH", "EH0", "EH1", "EH2", "ER0", "ER1", "ER2", "EY0", "EY1", "EY2", "F", "G", "HH", "IH0", "IH1", "IH2", "IY0", "IY1", "IY2", "JH", "K", "L", "M", "N", "NG", "OW0", "OW1", "OW2", "OY0", "OY1", "OY2", "P", "R", "S", "SH", "T", "TH", "UH0", "UH1", "UH2", "UW0", "UW1", "UW2", "V", "W", "Y", "Z", "ZH", "|"]
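
The phone set is a sorted list of punctuation marks, `<BOS>`/`<EOS>` markers, a `|` symbol, and ARPAbet phonemes whose vowels carry a stress digit (0 unstressed, 1 primary, 2 secondary). The sketch below builds a symbol-to-id table from the first few entries and splits the stress digit off a vowel. It assumes ids follow list order starting at 0; the real code base may reserve ids (for padding, for example), which would shift these values. `split_stress` is a hypothetical helper.

```python
import json

# Abbreviated copy of phone_set.json (first nine entries only).
PHONE_SET_HEAD = '["!", ",", ".", ":", ";", "<BOS>", "<EOS>", "?", "AA0"]'

phones = json.loads(PHONE_SET_HEAD)

# Assumption: ids are assigned in list order starting at 0.
phone_to_id = {phone: idx for idx, phone in enumerate(phones)}


def split_stress(phone: str):
    """Split an ARPAbet vowel like 'AA1' into ('AA', 1); others get None."""
    if phone[-1:].isdigit() and phone[:-1].isalpha():
        return phone[:-1], int(phone[-1])
    return phone, None


print(phone_to_id["<BOS>"], phone_to_id["AA0"])
print(split_stress("AA1"), split_stress("CH"))
```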

data/binary/LJSpeech/spk_map.json ADDED

	@@ -0,0 +1 @@


+ {"SPK1": 0}

data/binary/LJSpeech/train_f0s_mean_std.npy ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8790d5a84d77143690ae71a1f1e7fc81359e69ead263dc440366f2164c739efd
+size 144