|
# Pretrained Models Dependency |
|
|
|
The models dependency of Amphion are as follows (sort alphabetically): |
|
|
|
- [Pretrained Models Dependency](#pretrained-models-dependency) |
|
- [Amphion Singing BigVGAN](#amphion-singing-bigvgan) |
|
- [Amphion Speech HiFi-GAN](#amphion-speech-hifi-gan) |
|
- [ContentVec](#contentvec) |
|
- [WeNet](#wenet) |
|
- [Whisper](#whisper) |
|
- [RawNet3](#rawnet3) |
|
|
|
|
|
The instructions about how to download them is displayed as follows. |
|
|
|
## Amphion Singing BigVGAN |
|
|
|
We fine-tune the official BigVGAN pretrained model with over 120 hours singing voice data. The fine-tuned checkpoint can be downloaded [here](https://huggingface.co/amphion/BigVGAN_singing_bigdata). You need to download the `400000.pt` and `args.json` files into `Amphion/pretrained/bigvgan`: |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ bivgan |
|
β β β£ 400000.pt |
|
β β β£ args.json |
|
``` |
|
|
|
## Amphion Speech HiFi-GAN |
|
|
|
We trained our HiFi-GAN pretrained model with 685 hours speech data. Which can be downloaded [here](https://huggingface.co/amphion/hifigan_speech_bigdata). You need to download the whole folder of `hifigan_speech` into `Amphion/pretrained/hifigan`. |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ hifigan |
|
β β β£ hifigan_speech |
|
β β β β£ log |
|
β β β β£ result |
|
β β β β£ checkpoint |
|
β β β β£ args.json |
|
``` |
|
|
|
## Amphion DiffWave |
|
|
|
We trained our DiffWave pretrained model with 125 hours speech data and around 80 hours of singing voice data. Which can be downloaded [here](https://huggingface.co/amphion/diffwave). You need to download the whole folder of `diffwave` into `Amphion/pretrained/diffwave`. |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ diffwave |
|
β β β£ diffwave_speech |
|
β β β β£ samples |
|
β β β β£ checkpoint |
|
β β β β£ args.json |
|
``` |
|
|
|
## ContentVec |
|
|
|
You can download the pretrained ContentVec model [here](https://github.com/auspicious3000/contentvec). Note that we use the `ContentVec_legacy-500 classes` checkpoint. Assume that you download the `checkpoint_best_legacy_500.pt` into the `Amphion/pretrained/contentvec`. |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ contentvec |
|
β β β£ checkpoint_best_legacy_500.pt |
|
``` |
|
|
|
## WeNet |
|
|
|
You can download the pretrained WeNet model [here](https://github.com/wenet-e2e/wenet/blob/main/docs/pretrained_models.md). Take the `wenetspeech` pretrained checkpoint as an example, assume you download the `wenetspeech_u2pp_conformer_exp.tar` into the `Amphion/pretrained/wenet`. Unzip it and modify its configuration file as follows: |
|
|
|
```sh |
|
cd Amphion/pretrained/wenet |
|
|
|
### Unzip the expt dir |
|
tar -xvf wenetspeech_u2pp_conformer_exp.tar.gz |
|
|
|
### Specify the updated path in train.yaml |
|
cd 20220506_u2pp_conformer_exp |
|
vim train.yaml |
|
# TODO: Change the value of "cmvn_file" (Line 2) to the absolute path of the `global_cmvn` file. (Eg: [YourPath]/Amphion/pretrained/wenet/20220506_u2pp_conformer_exp/global_cmvn) |
|
``` |
|
|
|
The final file struture tree is like: |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ wenet |
|
β β β£ 20220506_u2pp_conformer_exp |
|
β β β β£ final.pt |
|
β β β β£ global_cmvn |
|
β β β β£ train.yaml |
|
β β β β£ units.txt |
|
``` |
|
|
|
## Whisper |
|
|
|
The official pretrained whisper checkpoints can be available [here](https://github.com/openai/whisper/blob/e58f28804528831904c3b6f2c0e473f346223433/whisper/__init__.py#L17). In Amphion, we use the `medium` whisper model by default. You can download it as follows: |
|
|
|
```bash |
|
cd Amphion/pretrained |
|
mkdir whisper |
|
cd whisper |
|
|
|
wget https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt |
|
``` |
|
|
|
The final file structure tree is like: |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ whisper |
|
β β β£ medium.pt |
|
``` |
|
|
|
## RawNet3 |
|
|
|
The official pretrained RawNet3 checkpoints can be available [here](https://huggingface.co/jungjee/RawNet3). You need to download the `model.pt` file and put it in the folder. |
|
|
|
The final file structure tree is like: |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ rawnet3 |
|
β β β£ model.pt |
|
``` |
|
|
|
|
|
# (Optional) Model Dependencies for Evaluation |
|
When utilizing Amphion's Evaluation Pipelines, terminals without access to `huggingface.co` may encounter error messages such as "OSError: Can't load tokenizer for ...". To work around this, the dependant models for evaluation can be pre-prepared and stored here, at `Amphion/pretrained`, and follow [this README](../egs/metrics/README.md#troubleshooting) to configure your environment to load local models. |
|
|
|
The dependant models of Amphion's evaluation pipeline are as follows (sort alphabetically): |
|
|
|
- [Evaluation Pipeline Models Dependency](#optional-model-dependencies-for-evaluation) |
|
- [bert-base-uncased](#bert-base-uncased) |
|
- [facebook/bart-base](#facebookbart-base) |
|
- [roberta-base](#roberta-base) |
|
- [wavlm](#wavlm) |
|
|
|
The instructions about how to download them is displayed as follows. |
|
|
|
## bert-base-uncased |
|
|
|
To load `bert-base-uncased` locally, follow [this link](https://huggingface.co/bert-base-uncased) to download all files for `bert-base-uncased` model, and store them under `Amphion/pretrained/bert-base-uncased`, conforming to the following file structure tree: |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ bert-base-uncased |
|
β β β£ config.json |
|
β β β£ coreml |
|
β β β β£ fill-mask |
|
β β β β£ float32_model.mlpackage |
|
β β β β£ Data |
|
β β β β£ com.apple.CoreML |
|
β β β β£ model.mlmodel |
|
β β β£ flax_model.msgpack |
|
β β β£ LICENSE |
|
β β β£ model.onnx |
|
β β β£ model.safetensors |
|
β β β£ pytorch_model.bin |
|
β β β£ README.md |
|
β β β£ rust_model.ot |
|
β β β£ tf_model.h5 |
|
β β β£ tokenizer_config.json |
|
β β β£ tokenizer.json |
|
β β β£ vocab.txt |
|
``` |
|
|
|
## facebook/bart-base |
|
|
|
To load `facebook/bart-base` locally, follow [this link](https://huggingface.co/facebook/bart-base) to download all files for `facebook/bart-base` model, and store them under `Amphion/pretrained/facebook/bart-base`, conforming to the following file structure tree: |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ facebook |
|
β β β£ bart-base |
|
β β β β£ config.json |
|
β β β β£ flax_model.msgpack |
|
β β β β£ gitattributes.txt |
|
β β β β£ merges.txt |
|
β β β β£ model.safetensors |
|
β β β β£ pytorch_model.bin |
|
β β β β£ README.txt |
|
β β β β£ rust_model.ot |
|
β β β β£ tf_model.h5 |
|
β β β β£ tokenizer.json |
|
β β β β£ vocab.json |
|
``` |
|
|
|
## roberta-base |
|
|
|
To load `roberta-base` locally, follow [this link](https://huggingface.co/roberta-base) to download all files for `roberta-base` model, and store them under `Amphion/pretrained/roberta-base`, conforming to the following file structure tree: |
|
|
|
``` |
|
Amphion |
|
β£ pretrained |
|
β β£ roberta-base |
|
β β β£ config.json |
|
β β β£ dict.txt |
|
β β β£ flax_model.msgpack |
|
β β β£ gitattributes.txt |
|
β β β£ merges.txt |
|
β β β£ model.safetensors |
|
β β β£ pytorch_model.bin |
|
β β β£ README.txt |
|
β β β£ rust_model.ot |
|
β β β£ tf_model.h5 |
|
β β β£ tokenizer.json |
|
β β β£ vocab.json |
|
``` |
|
|
|
## wavlm |
|
|
|
The official pretrained wavlm checkpoints can be available [here](https://huggingface.co/microsoft/wavlm-base-plus-sv). The file structure tree is as follows: |
|
|
|
``` |
|
Amphion |
|
β£ wavlm |
|
β β£ config.json |
|
β β£ preprocessor_config.json |
|
β β£ pytorch_model.bin |
|
``` |