---
license: mit
tags:
- audio
- deep-learning
- pytorch
- generative-adversarial-network
- codec
- gans
- compression-algorithm
- audio-compression
- RVQ
---


# Descript Audio Codec

👉 With Descript Audio Codec, you can compress **44.1 kHz audio** into discrete codes at a **low 8 kbps bitrate**.  <br>
🤌 That's approximately **90x compression** while maintaining exceptional fidelity and minimizing artifacts.  <br>
💪 Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio.  <br>
👌 It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.). <br>
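
As a rough sanity check on the "approximately 90x" figure, the ratio can be worked out from the two numbers quoted above. The sketch below assumes the uncompressed reference is 16-bit mono PCM; the bit depth and channel count are assumptions for illustration, not something this card states.

```python
# Back-of-the-envelope check of the compression ratio.
SAMPLE_RATE_HZ = 44_100    # from the card: 44.1 kHz audio
CODEC_BITRATE_BPS = 8_000  # from the card: 8 kbps
BIT_DEPTH = 16             # assumption: 16-bit PCM reference
CHANNELS = 1               # assumption: mono reference

# Bitrate of the uncompressed PCM stream in bits per second.
pcm_bitrate_bps = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS

# Ratio between the uncompressed stream and the codec's bitstream.
compression_ratio = pcm_bitrate_bps / CODEC_BITRATE_BPS

print(f"Uncompressed PCM: {pcm_bitrate_bps / 1000:.1f} kbps")
print(f"Compression ratio: {compression_ratio:.1f}x")
```

Under these assumptions the ratio comes out a little under 90, which is consistent with the rounded figure above.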

## Model Details

### Model Description

- **License:** MIT

### Model Sources

- **Repository:** [GitHub Repo](https://github.com/descriptinc/descript-audio-codec)
- **Paper:** [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN](http://arxiv.org/abs/2306.06546)
- **Demo:** [Demo Site](https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5)

## Uses

The model is intended for compressing audio files containing speech, music and environmental sounds.

### Out-of-Scope Use

It is not intended to be used for compressing other file formats such as text, images, etc.

## Bias, Risks, and Limitations
Our model has difficulty reconstructing some challenging audio. It
performs best on speech and has more trouble with environmental sounds. It
also does not perfectly model some musical instruments, such as the glockenspiel, or synthesizer sounds.


## How to Get Started with the Model
This model is meant to be used with our official repo linked above. We release the model here for redundancy.
Our code can pull the weights from their
[original location on GitHub](https://github.com/descriptinc/descript-audio-codec/releases/download/0.0.1/weights.pth).
Please refer to the official [README](https://github.com/descriptinc/descript-audio-codec#readme) for usage instructions.
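
If you only need the weights file on disk, a minimal sketch using the Python standard library might look like the following. The `download_weights` helper and its default destination path are hypothetical names introduced here for illustration; this is not the official repo's loading code, which the README documents.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Weights URL as given in this model card.
WEIGHTS_URL = (
    "https://github.com/descriptinc/descript-audio-codec"
    "/releases/download/0.0.1/weights.pth"
)


def download_weights(dest="weights.pth", url=WEIGHTS_URL):
    """Hypothetical helper: fetch the weights once and reuse the local copy."""
    dest = Path(dest)
    if not dest.exists():
        # Create the target directory if needed, then download the file.
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(dest))
    return dest
```

Calling `download_weights()` returns the local path and skips the network request when the file is already present.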

## Citation

**BibTeX:**

```
@misc{kumar2023highfidelity,
      title={High-Fidelity Audio Compression with Improved RVQGAN}, 
      author={Rithesh Kumar and Prem Seetharaman and Alejandro Luebs and Ishaan Kumar and Kundan Kumar},
      year={2023},
      eprint={2306.06546},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
```