Update README.md
Browse files
README.md
CHANGED
@@ -37,7 +37,6 @@ Metis is fully compatible with MaskGCT and shares several key model components w
|
|
37 |
| [Semantic Codec](https://huggingface.co/amphion/MaskGCT/tree/main/semantic_codec) | Converting speech to semantic tokens. |
|
38 |
| [Acoustic Codec](https://huggingface.co/amphion/MaskGCT/tree/main/acoustic_codec) | Converting speech to acoustic tokens and reconstructing waveform from acoustic tokens. |
|
39 |
| [Semantic2Acoustic](https://huggingface.co/amphion/MaskGCT/tree/main/s2a_model) | Predicts acoustic tokens conditioned on semantic tokens.
|
40 |
-
<!-- | [MaskGCT-T2S](https://huggingface.co/amphion/MaskGCT/tree/main/t2s_model) | Predicting semantic tokens with text and prompt semantic tokens. | -->
|
41 |
|
42 |
We open-source the pretrained model checkpoint of the first stage of Metis (with masked generative pre-training), as well as the fine-tuned models for speech enhancement (SE), target speaker extraction (TSE), voice conversion (VC), lip-to-speech (L2S), and the unified multi-task (Omni) model.
|
43 |
|
@@ -46,9 +45,9 @@ For zero-shot text-to-speech, you can download the text2semantic model from Mask
|
|
46 |
| Model Name | Description |
|
47 |
| --- | --- |
|
48 |
| [Metis-Base](https://huggingface.co/amphion/metis/tree/main/metis_base) | The base model pre-trained with masked generative pre-training. |
|
49 |
-
| [Metis-TSE](https://huggingface.co/amphion/metis/tree/main/metis_tse) | Fine-tuned model for target speaker extraction. Available in both full-scale and LoRA (
|
50 |
| [Metis-VC](https://huggingface.co/amphion/metis/tree/main/metis_vc) | Fine-tuned model for voice conversion. Available in full-scale version. |
|
51 |
-
| [Metis-SE](https://huggingface.co/amphion/metis/tree/main/metis_se) | Fine-tuned model for speech enhancement. Available in both full-scale and LoRA (
|
52 |
| [Metis-L2S](https://huggingface.co/amphion/metis/tree/main/metis_l2s) | Fine-tuned model for lip-to-speech. Available in full-scale version. |
|
53 |
| [Metis-TTS](https://huggingface.co/amphion/MaskGCT/tree/main/t2s_model) | Zero-shot text-to-speech model (as same as the first stage of MaskGCT). |
|
54 |
| [Metis-Omni](https://huggingface.co/amphion/metis/tree/main/metis_omni) | Unified multi-task model supporting zero-shot TTS, VC, TSE, and SE. |
|
|
|
37 |
| [Semantic Codec](https://huggingface.co/amphion/MaskGCT/tree/main/semantic_codec) | Converting speech to semantic tokens. |
|
38 |
| [Acoustic Codec](https://huggingface.co/amphion/MaskGCT/tree/main/acoustic_codec) | Converting speech to acoustic tokens and reconstructing waveform from acoustic tokens. |
|
39 |
| [Semantic2Acoustic](https://huggingface.co/amphion/MaskGCT/tree/main/s2a_model) | Predicts acoustic tokens conditioned on semantic tokens.
|
|
|
40 |
|
41 |
We open-source the pretrained model checkpoint of the first stage of Metis (with masked generative pre-training), as well as the fine-tuned models for speech enhancement (SE), target speaker extraction (TSE), voice conversion (VC), lip-to-speech (L2S), and the unified multi-task (Omni) model.
|
42 |
|
|
|
45 |
| Model Name | Description |
|
46 |
| --- | --- |
|
47 |
| [Metis-Base](https://huggingface.co/amphion/metis/tree/main/metis_base) | The base model pre-trained with masked generative pre-training. |
|
48 |
+
| [Metis-TSE](https://huggingface.co/amphion/metis/tree/main/metis_tse) | Fine-tuned model for target speaker extraction. Available in both full-scale and LoRA (r = 32) versions. |
|
49 |
| [Metis-VC](https://huggingface.co/amphion/metis/tree/main/metis_vc) | Fine-tuned model for voice conversion. Available in full-scale version. |
|
50 |
+
| [Metis-SE](https://huggingface.co/amphion/metis/tree/main/metis_se) | Fine-tuned model for speech enhancement. Available in both full-scale and LoRA (r = 32) versions. |
|
51 |
| [Metis-L2S](https://huggingface.co/amphion/metis/tree/main/metis_l2s) | Fine-tuned model for lip-to-speech. Available in full-scale version. |
|
52 |
| [Metis-TTS](https://huggingface.co/amphion/MaskGCT/tree/main/t2s_model) | Zero-shot text-to-speech model (as same as the first stage of MaskGCT). |
|
53 |
| [Metis-Omni](https://huggingface.co/amphion/metis/tree/main/metis_omni) | Unified multi-task model supporting zero-shot TTS, VC, TSE, and SE. |
|