hubertsiuzdak and reach-vb committed
Commit 5c06c49
1 Parent(s): 9929681

Update README.md (#1)

- Update README.md (48f5bd327f775912cfc79021721ac1e50b349509)
- Update README.md (8853c58dec981b99fa1ff39bd31243443939d351)


Co-authored-by: Vaibhav Srivastav <reach-vb@users.noreply.huggingface.co>

Files changed (1): README.md (+70 -0)
README.md CHANGED
@@ -1,3 +1,73 @@
---
license: mit
---

# Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

[Audio samples](https://charactr-platform.github.io/vocos/) |
Paper [[abs]](https://arxiv.org/abs/2306.00814) [[pdf]](https://arxiv.org/pdf/2306.00814.pdf)

Vocos is a fast neural vocoder designed to synthesize audio waveforms from acoustic features. Trained using a
Generative Adversarial Network (GAN) objective, Vocos can generate waveforms in a single forward pass. Unlike typical
GAN-based vocoders, Vocos does not model audio samples in the time domain. Instead, it generates spectral
coefficients, facilitating rapid audio reconstruction through the inverse Fourier transform.
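
For intuition, here is a minimal, self-contained sketch of the Fourier-based idea (illustrative only, not the actual Vocos architecture or API): a network predicts per-frame magnitude and phase, and the waveform is recovered with a single inverse STFT rather than stacks of transposed convolutions. All shapes and STFT parameters below are assumptions.

```python
import torch

n_fft, hop_length = 1024, 256
n_bins, n_frames = n_fft // 2 + 1, 200

# Stand-ins for what a model head would predict for each frame.
magnitude = torch.rand(1, n_bins, n_frames)
phase = (torch.rand(1, n_bins, n_frames) * 2 - 1) * torch.pi

# Combine into complex spectral coefficients and invert them in one call.
spec = torch.polar(magnitude, phase)  # magnitude * exp(i * phase)
waveform = torch.istft(spec, n_fft=n_fft, hop_length=hop_length,
                       window=torch.hann_window(n_fft))
print(waveform.shape)  # (1, num_samples)
```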

## Installation

To use Vocos only in inference mode, install it using:

```bash
pip install vocos
```

If you wish to train the model, install it with additional dependencies:

```bash
pip install vocos[train]
```
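
Either way, a quick import (a minimal check, not part of the original instructions) confirms that the package is available:

```python
# Sanity check: the main entry point should be importable after either install.
from vocos import Vocos

print(Vocos)
```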

## Usage

### Reconstruct audio from EnCodec tokens

In addition to the audio tokens, you need to provide a `bandwidth_id`, which corresponds to the embedding for the
bandwidth from the list `[1.5, 3.0, 6.0, 12.0]` (kbps).

```python
import torch
from vocos import Vocos

vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")

audio_tokens = torch.randint(low=0, high=1024, size=(8, 200))  # 8 codebooks, 200 frames
features = vocos.codes_to_features(audio_tokens)
bandwidth_id = torch.tensor([2])  # 6 kbps

audio = vocos.decode(features, bandwidth_id=bandwidth_id)
```
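
The `bandwidth_id` is simply the index into that list; a small helper like the one below (hypothetical, not part of the Vocos API) makes the mapping explicit.

```python
import torch

# Hypothetical helper: map a target EnCodec bitrate in kbps to the index of
# the corresponding bandwidth embedding.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]

def bandwidth_id_for(kbps: float) -> torch.Tensor:
    return torch.tensor([BANDWIDTHS_KBPS.index(kbps)])

bandwidth_id = bandwidth_id_for(6.0)  # tensor([2]), matching the example above
```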

Copy-synthesis from a file: the audio is encoded and quantized with EnCodec, and the features are then reconstructed
with Vocos in a single forward pass.

```python
import torchaudio

y, sr = torchaudio.load(YOUR_AUDIO_FILE)
if y.size(0) > 1:  # mix to mono
    y = y.mean(dim=0, keepdim=True)
y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)

y_hat = vocos(y, bandwidth_id=bandwidth_id)
```
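
To compare the reconstruction against the input, you can write it to disk; the sketch below assumes the 24 kHz rate implied by the `vocos-encodec-24khz` checkpoint and a hypothetical output path.

```python
# Assumption: the reconstruction is at 24 kHz, matching the checkpoint name.
torchaudio.save("reconstruction.wav", y_hat.cpu(), sample_rate=24000)
```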

## Citation

If this code contributes to your research, please cite our work:

```bibtex
@article{siuzdak2023vocos,
  title={Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis},
  author={Siuzdak, Hubert},
  journal={arXiv preprint arXiv:2306.00814},
  year={2023}
}
```

## License

The code in this repository is released under the MIT license.