dualcodec / Commit 73faed6 (verified)
jiaqili3 committed · 1 Parent(s): cd3a69a

Update README.md

Files changed (1): README.md (+76 -9)
README.md CHANGED
@@ -1,9 +1,16 @@
- # DualCodec
+ # DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
+
+ ## About
+
  ## Installation
  ```bash
  pip install dualcodec
  ```

+ ## News
+ - 2025-01-22: Added training and finetuning instructions for DualCodec (v0.3.0).
+ - 2025-01-16: Finished the DualCodec inference code (v0.1.0).
+
  ## Available models
  <!-- - 12hz_v1: DualCodec model trained with 12Hz sampling rate.
  - 25hz_v1: DualCodec model trained with 25Hz sampling rate. -->
@@ -14,22 +21,23 @@ pip install dualcodec
  | 25hz_v1 | 25Hz | Any from 1-12 (maximum 12) | 16384 | 1024 | 100K hours Emilia |


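Reading this row with the columns taken as frame rate, number of VQ codebooks, semantic codebook size, and acoustic codebook size (an assumption, since the table header is outside this hunk, but one that matches the checkpoint names used later in the diff), a quick back-of-the-envelope sketch of the resulting token rate and bitrate:

```python
import math

# Assumed reading of the 25hz_v1 row above: 25 Hz frame rate, up to 12 VQ codebooks,
# a 16384-entry semantic codebook (VQ 1) and 1024-entry acoustic codebooks (VQs 2-12).
frame_rate = 25
n_quantizers = 12
semantic_codebook_size = 16384
acoustic_codebook_size = 1024

tokens_per_second = frame_rate * n_quantizers  # 300 tokens/s when all 12 VQs are used
bits_per_frame = math.log2(semantic_codebook_size) + (n_quantizers - 1) * math.log2(acoustic_codebook_size)
bitrate_bps = frame_rate * bits_per_frame      # 25 * (14 + 11 * 10) = 3100 bps, i.e. about 3.1 kbps
print(tokens_per_second, bitrate_bps)
```

Using fewer VQ codebooks (the table allows any number from 1 to 12) lowers both figures accordingly.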
- ## How to inference
+ ## How to run DualCodec inference

- Download checkpoints to local:
+ ### 1. Download the checkpoints to a local directory:
  ```
  # export HF_ENDPOINT=https://hf-mirror.com # uncomment this to use huggingface mirror if you're in China
  huggingface-cli download facebook/w2v-bert-2.0 --local-dir w2v-bert-2.0
- huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts
+ huggingface-cli download amphion/dualcodec dualcodec_12hz_16384_4096.safetensors dualcodec_25hz_16384_1024.safetensors w2vbert2_mean_var_stats_emilia.pt --local-dir dualcodec_ckpts
  ```
+ The second command downloads the two DualCodec model checkpoints (12hz_v1 and 25hz_v1) and the w2v-bert-2 mean and variance statistics to the local directory `dualcodec_ckpts`.
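If you prefer to fetch the checkpoints from Python rather than the CLI, a minimal equivalent sketch using `huggingface_hub` (same repositories and file names as the two commands above):

```python
from huggingface_hub import hf_hub_download, snapshot_download

# w2v-bert-2.0: mirror of the first CLI command above (downloads the whole repo).
snapshot_download(repo_id="facebook/w2v-bert-2.0", local_dir="w2v-bert-2.0")

# DualCodec: only the two codec checkpoints and the w2v-bert-2 statistics file.
for filename in [
    "dualcodec_12hz_16384_4096.safetensors",
    "dualcodec_25hz_16384_1024.safetensors",
    "w2vbert2_mean_var_stats_emilia.pt",
]:
    hf_hub_download(repo_id="amphion/dualcodec", filename=filename, local_dir="dualcodec_ckpts")
```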
 
- To inference an audio in a python script:
+ ### 2. Run inference on an audio file in a Python script:
  ```python
  import dualcodec

  w2v_path = "./w2v-bert-2.0" # your downloaded path
  dualcodec_model_path = "./dualcodec_ckpts" # your downloaded path
- model_id = "12hz_v1" # or "25hz_v1"
+ model_id = "12hz_v1" # select from the available model IDs, "12hz_v1" or "25hz_v1"

  dualcodec_model = dualcodec.get_model(model_id, dualcodec_model_path)
  inference = dualcodec.Inference(dualcodec_model=dualcodec_model, dualcodec_path=dualcodec_model_path, w2v_path=w2v_path, device="cuda")
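The unchanged middle of this Python example is not shown in the diff; the next hunk picks up again at `out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)`. A hedged sketch of what sits in that gap, assuming 24 kHz input (the rate used by `torchaudio.save` below) and a hypothetical `inference.encode(...)` call whose real name and signature should be checked against "example.ipynb":

```python
import torchaudio

# Load a waveform and resample it to 24 kHz, the rate used when saving the output below.
audio, sr = torchaudio.load("input.wav")
audio = torchaudio.functional.resample(audio, sr, 24000).to("cuda")

# Hypothetical encode step -- the actual method name and arguments may differ; see "example.ipynb".
semantic_codes, acoustic_codes = inference.encode(audio)

# These two lines are the ones visible in the surrounding hunks.
out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)
torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)
```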
@@ -52,7 +60,66 @@ out_audio = dualcodec_model.decode_from_codes(semantic_codes, acoustic_codes)
  torchaudio.save("out.wav", out_audio.cpu().squeeze(0), 24000)
  ```

- See "example.ipynb" for example inference scripts.
+ See "example.ipynb" for a running example.

- ## Training DualCodec
- Stay tuned for the training code release!
+ ## DualCodec-based TTS models
+ ### DualCodec-based TTS
+
+ ## Benchmark results
+ ### DualCodec audio quality
+ ### DualCodec-based TTS
+
+ ## Finetuning DualCodec
+ 1. Install the extra training dependencies:
+ ```bash
+ pip install "dualcodec[train]"
+ ```
+ 2. Clone this repository and `cd` to the project root folder.
+
+ 3. Download the discriminator checkpoints:
+ ```bash
+ huggingface-cli download amphion/dualcodec --local-dir dualcodec_ckpts
+ ```
+
+ 4. Run an example finetuning job on Emilia German data (streamed, so there is no need to download files; requires access to Hugging Face):
+ ```bash
+ accelerate launch train.py --config-name=dualcodec_ft_12hzv1 \
+ trainer.batch_size=3 \
+ data.segment_speech.segment_length=24000
+ ```
+ This finetunes the 12hz_v1 model with a training batch size of 3 (typically you will need a larger batch size).
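Since a batch size of 3 is noted above as smaller than you would normally use, here is a sketch of scaling the same finetuning run up; `--multi_gpu` and `--num_processes` are standard `accelerate launch` options, and only `trainer.batch_size` and `data.segment_speech.segment_length` are overrides confirmed by this README (the values 4 and 16 are purely illustrative):

```bash
# Illustrative only: the same finetuning config on 4 GPUs with a larger batch size.
accelerate launch --multi_gpu --num_processes=4 train.py \
    --config-name=dualcodec_ft_12hzv1 \
    trainer.batch_size=16 \
    data.segment_speech.segment_length=24000
```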
+
+ To finetune the 25hz_v1 model:
+ ```bash
+ accelerate launch train.py --config-name=dualcodec_ft_25hzv1 \
+ trainer.batch_size=3 \
+ data.segment_speech.segment_length=24000
+ ```
+
+ ## Training DualCodec from scratch
+ 1. Install the extra training dependencies:
+ ```bash
+ pip install "dualcodec[train]"
+ ```
+ 2. Clone this repository and `cd` to the project root folder.
+
+ 3. Run an example training job on the example Emilia German data:
+ ```bash
+ accelerate launch train.py --config-name=codec_train \
+ model=dualcodec_12hz_16384_4096_8vq \
+ trainer.batch_size=3 \
+ data.segment_speech.segment_length=24000
+ ```
+ This trains a dualcodec_12hz_16384_4096_8vq model from scratch with a training batch size of 3 (typically you will need a larger batch size).
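A note on the `data.segment_speech.segment_length=24000` override used in these commands: assuming it is a sample count at the 24 kHz rate used elsewhere in this README, it corresponds to one-second training segments:

```python
# Assumption: segment_length is in samples and the audio is 24 kHz, as in torchaudio.save above.
segment_length_samples = 24000
sample_rate_hz = 24000
print(segment_length_samples / sample_rate_hz)  # 1.0 second per training segment
```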
+
+ To train a 25Hz model:
+ ```bash
+ accelerate launch train.py --config-name=codec_train \
+ model=dualcodec_25hz_16384_1024_12vq \
+ trainer.batch_size=3 \
+ data.segment_speech.segment_length=24000
+ ```
+
+ ## Citation