reach-vb HF staff committed on
Commit
48f5bd3
1 Parent(s): 9929681

Update README.md

  1. README.md +95 -0
README.md CHANGED
@@ -1,3 +1,98 @@
  ---
  license: mit
  ---
+
+ # Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
+
+ [Audio samples](https://charactr-platform.github.io/vocos/) |
+ Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)
+
+ Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a Generative
+ Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike typical
+ GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral
+ coefficients, facilitating rapid audio reconstruction through the inverse Fourier transform.
+
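The idea that a waveform can be recovered from spectral coefficients can be sketched with a single-frame discrete Fourier transform round trip in pure Python. This is only an illustration, not how Vocos is implemented: the helper names `dft` and `inverse_dft` are hypothetical, the frame length of 16 is arbitrary, and a practical vocoder would apply an optimized inverse transform to many overlapping frames.

```python
import cmath
import math


def dft(frame):
    """Forward DFT: time-domain samples -> complex spectral coefficients."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse DFT: complex spectral coefficients -> real time-domain samples."""
    n = len(coeffs)
    return [
        (sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# A toy frame: three cycles of a sine wave over 16 samples (arbitrary choice).
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]

coeffs = dft(frame)                  # what a Fourier-based model would predict
reconstructed = inverse_dft(coeffs)  # deterministic step back to a waveform

# Largest sample-wise difference between the original and reconstructed frame.
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

Once the coefficients are known, the waveform follows from a fixed transform with no learned parameters, which is why predicting coefficients can be cheaper than generating every sample directly.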
+ ## Installation
+
+ To use Vocos only in inference mode, install it using:
+
+ ```bash
+ pip install vocos
+ ```
+
+ If you wish to train the model, install it with additional dependencies:
+
+ ```bash
+ pip install vocos[train]
+ ```
+
+ ## Usage
+
+ ### Reconstruct audio from mel-spectrogram
+
+ ```python
+ import torch
+
+ from vocos import Vocos
+
+ vocos = Vocos.from_pretrained("charactr/vocos-mel-24khz")
+
+ mel = torch.randn(1, 100, 256)  # B, C, T
+ audio = vocos.decode(mel)
+ ```
+
+ Copy-synthesis from a file:
+
+ ```python
+ import torchaudio
+
+ y, sr = torchaudio.load(YOUR_AUDIO_FILE)
+ if y.size(0) > 1:  # mix to mono
+     y = y.mean(dim=0, keepdim=True)
+ y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)
+ y_hat = vocos(y)
+ ```
+
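The "mix to mono" step in the snippet above averages the channels sample by sample. A minimal sketch of that operation on plain Python lists, with the hypothetical helper `mix_to_mono` standing in for `y.mean(dim=0, keepdim=True)` (which additionally keeps a leading channel dimension of size 1):

```python
def mix_to_mono(channels):
    """Average equal-length channels sample by sample.

    `channels` is a list of channels, each a list of float samples.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    n_samples = len(channels[0])
    if any(len(channel) != n_samples for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    return [sum(samples) / len(channels) for samples in zip(*channels)]


# Toy stereo signal: 2 channels, 3 samples each (values chosen for illustration).
stereo = [
    [0.5, -0.5, 1.0],
    [0.0, 0.5, 0.0],
]
mono = mix_to_mono(stereo)
print(mono)  # [0.25, 0.0, 0.5]
```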
+ ### Reconstruct audio from EnCodec tokens
+
+ You must also provide a `bandwidth_id`, which selects the bandwidth embedding by its index in the
+ list `[1.5, 3.0, 6.0, 12.0]` (kbps).
+
+ ```python
+ vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")
+
+ audio_tokens = torch.randint(low=0, high=1024, size=(8, 200))  # 8 codebooks, 200 frames
+ features = vocos.codes_to_features(audio_tokens)
+ bandwidth_id = torch.tensor([2])  # 6 kbps
+
+ audio = vocos.decode(features, bandwidth_id=bandwidth_id)
+ ```
+
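The relationship between the number of codebooks, the bandwidth, and `bandwidth_id` in the example above can be sketched in plain Python. The helper names are hypothetical, and the frame rate of 75 frames per second is an assumption about the 24 kHz EnCodec model that this README does not state; the codebook size of 1024 is taken from the `high=1024` argument above.

```python
import math

BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]  # list given in the README
FRAME_RATE_HZ = 75    # assumption: token frames per second for 24 kHz EnCodec
CODEBOOK_SIZE = 1024  # matches high=1024 in the example above


def bandwidth_to_id(bandwidth_kbps):
    """Return the index of a bandwidth in BANDWIDTHS_KBPS."""
    try:
        return BANDWIDTHS_KBPS.index(bandwidth_kbps)
    except ValueError:
        raise ValueError(
            f"unsupported bandwidth {bandwidth_kbps}; expected one of {BANDWIDTHS_KBPS}"
        ) from None


def codebooks_to_kbps(n_codebooks):
    """Bitrate in kbps implied by a number of codebooks (under the assumptions above)."""
    bits_per_code = math.log2(CODEBOOK_SIZE)  # 10 bits per token
    return n_codebooks * FRAME_RATE_HZ * bits_per_code / 1000


kbps = codebooks_to_kbps(8)           # 8 codebooks, as in the example above
bandwidth_id = bandwidth_to_id(kbps)  # index into BANDWIDTHS_KBPS
print(kbps, bandwidth_id)  # 6.0 2
```

Under these assumptions, 8 codebooks correspond to 6 kbps and therefore to index 2, which agrees with `torch.tensor([2])` in the example.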
+ Copy-synthesis from a file: this extracts and quantizes features with EnCodec, then reconstructs the audio with Vocos in a
+ single forward pass.
+
+ ```python
+ y, sr = torchaudio.load(YOUR_AUDIO_FILE)
+ if y.size(0) > 1:  # mix to mono
+     y = y.mean(dim=0, keepdim=True)
+ y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)
+
+ y_hat = vocos(y, bandwidth_id=bandwidth_id)
+ ```
+
+ ## Citation
+
+ If this code contributes to your research, please cite our work:
+
+ ```
+ @article{siuzdak2023vocos,
+   title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
+   author={Siuzdak, Hubert},
+   journal={arXiv preprint arXiv:2306.00814},
+   year={2023}
+ }
+ ```
+
+ ## License
+
+ The code in this repository is released under the MIT license.