pfb30 committed on
Commit
cdefbdd
1 Parent(s): 43310a7

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -1,3 +1,18 @@
  ---
  license: mit
  ---
+ # BigVGAN-L
+ The 24 kHz model was pretrained on the LibriTTS dataset with a full 100-band mel spectrogram as input (see `config.json` for the exact hyperparameter setup) using the [BigVGAN](https://github.com/NVIDIA/BigVGAN)
+ repository. Pretraining ran for 1,300,000 (1.3 million) steps with a batch size of 100 on 8 A100 40GB GPUs.
+
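As a quick sanity check on the setup described above, the hyperparameters can be read straight from `config.json`. The sketch below is a minimal example under stated assumptions: the key names `num_mels` and `sampling_rate` are assumed from the configs in the BigVGAN repository and may differ in this checkpoint's file, and `summarize_config` is a hypothetical helper, not part of any library.

```python
import json
from pathlib import Path

# Assumed key names and values (not confirmed by this README);
# adjust them to match the actual config.json.
EXPECTED = {"num_mels": 100, "sampling_rate": 24000}


def summarize_config(path, expected=EXPECTED):
    """Return {key: (found, expected, matches)} for each expected key."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    report = {}
    for key, want in expected.items():
        got = config.get(key)  # None if the key is absent
        report[key] = (got, want, got == want)
    return report
```

Calling `summarize_config("MODEL_PATH/BigVGAN-L/config.json")` returns one tuple per key, so a mismatch (for example a different sampling rate) shows up before any audio is generated.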
+ # Inference
+ Run inference with the example command below to generate audio from the model. It computes mel spectrograms from the wav files in `--input_wavs_dir` and saves the generated audio to `--output_dir`.
+ ```
+ python NEMO_PATH/inference.py \
+ --checkpoint_file MODEL_PATH/BigVGAN-L/g_01300000.pt \
+ --input_wavs_dir AUDIO_PATH/input_wav \
+ --output_dir AUDIO_PATH/output_wav
+ ```
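To script the command above, for instance over several input folders, one option is to assemble the argument list in Python. This is a minimal sketch that reuses the placeholder paths from the command and only the three flags shown in this README; `build_inference_command` is a hypothetical helper, not part of the repository.

```python
import sys
from pathlib import Path


def build_inference_command(nemo_path, model_path, audio_path, python=sys.executable):
    """Assemble the README's inference command as an argument list."""
    nemo_path = Path(nemo_path)
    model_path = Path(model_path)
    audio_path = Path(audio_path)
    return [
        python,
        str(nemo_path / "inference.py"),
        "--checkpoint_file", str(model_path / "BigVGAN-L" / "g_01300000.pt"),
        "--input_wavs_dir", str(audio_path / "input_wav"),
        "--output_dir", str(audio_path / "output_wav"),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if the script exits with a non-zero status.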
+
+ # Continual finetuning
+ The vocoder can be finetuned further with the `NEMO_PATH/train.py` script, since the checkpoints include the full optimizer state.
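Before resuming training, it can help to confirm which step a checkpoint corresponds to. The sketch below infers the step from the file name, assuming the `prefix_step.pt` pattern seen in `g_01300000.pt` holds for every checkpoint in the directory. Both helpers are hypothetical, and how `train.py` itself locates checkpoints is not covered here.

```python
import re
from pathlib import Path

# Matches names such as "g_01300000.pt": a prefix, an underscore,
# a zero-padded step count, and an optional file extension.
CHECKPOINT_RE = re.compile(r"^(?P<prefix>[A-Za-z]+)_(?P<step>\d+)(?:\.\w+)?$")


def checkpoint_step(filename):
    """Return the training step encoded in a checkpoint file name, or None."""
    match = CHECKPOINT_RE.match(Path(filename).name)
    return int(match.group("step")) if match else None


def latest_checkpoint(directory, prefix="g"):
    """Return the path with the highest step for the given prefix, or None."""
    candidates = []
    for path in Path(directory).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match and match.group("prefix") == prefix:
            candidates.append((int(match.group("step")), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

For the checkpoint named in this README, `checkpoint_step("MODEL_PATH/BigVGAN-L/g_01300000.pt")` returns `1300000`, which matches the stated number of pretraining steps.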