---
license: mit
tags:
- audio
- deep-learning
- pytorch
- generative-adversarial-network
- codec
- gans
- compression-algorithm
- audio-compression
- RVQ
---

# Descript Audio Codec

With Descript Audio Codec, you can compress **44.1 kHz audio** into discrete codes at a **low 8 kbps bitrate**. <br>
That's approximately **90x compression** while maintaining exceptional fidelity and minimizing artifacts. <br>
Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio. <br>
It can be used as a drop-in replacement for EnCodec in audio language modeling applications (such as AudioLM, MusicLM, MusicGen, etc.). <br>

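The "approximately 90x" figure can be sanity-checked with a little arithmetic. The sketch below assumes the uncompressed reference is 16-bit mono PCM at 44.1 kHz; the card does not state the reference format, so the bit depth and channel count here are assumptions.

```python
# Back-of-the-envelope check of the compression ratio quoted above.
# Assumption (not stated in this card): the uncompressed reference is
# 16-bit mono PCM at the codec's 44.1 kHz sample rate.

SAMPLE_RATE_HZ = 44_100      # from the model card
CODEC_BITRATE_BPS = 8_000    # 8 kbps, from the model card
PCM_BITS_PER_SAMPLE = 16     # assumed reference bit depth
PCM_CHANNELS = 1             # assumed mono


def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(raw_bps, coded_bps):
    """How many times smaller the coded stream is than the raw one."""
    return raw_bps / coded_bps


raw_bps = pcm_bitrate_bps(SAMPLE_RATE_HZ, PCM_BITS_PER_SAMPLE, PCM_CHANNELS)
ratio = compression_ratio(raw_bps, CODEC_BITRATE_BPS)
print(f"raw PCM: {raw_bps / 1000:.1f} kbps, ratio: {ratio:.1f}x")
# prints "raw PCM: 705.6 kbps, ratio: 88.2x"
```

Under these assumptions the ratio comes out at about 88x, which is consistent with the rounded 90x figure. A stereo or higher bit-depth reference would give a proportionally larger ratio.
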
## Model Details

### Model Description

- **License:** MIT

### Model Sources

- **Repository:** [GitHub Repo](https://github.com/descriptinc/descript-audio-codec)
- **Paper:** [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN](http://arxiv.org/abs/2306.06546)
- **Demo:** [Demo Site](https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5)

## Uses

The model is intended for compressing audio containing speech, music, and environmental sounds.

### Out-of-Scope Use

It is not intended for compressing other kinds of data, such as text or images.

## Bias, Risks, and Limitations

Our model has difficulty reconstructing some challenging audio. It performs best on speech and has more issues with environmental sounds. It also does not perfectly model some musical sounds, such as glockenspiel or synthesizers.

## How to Get Started with the Model

This model is meant to be used with our official repo linked above. We release the weights here for redundancy.
Our code can pull the weights from their [original location on GitHub](https://github.com/descriptinc/descript-audio-codec/releases/download/0.0.1/weights.pth).
Please refer to the official [README](https://github.com/descriptinc/descript-audio-codec#readme) for usage instructions.

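If you only need a local copy of the weights file, a minimal standard-library sketch follows. It is not the official loader: `fetch_weights` and `cache_dir` are hypothetical names introduced for illustration, and the only detail taken from this card is the release URL.

```python
import urllib.request
from pathlib import Path

# Release URL taken from this model card.
WEIGHTS_URL = (
    "https://github.com/descriptinc/descript-audio-codec"
    "/releases/download/0.0.1/weights.pth"
)


def fetch_weights(url=WEIGHTS_URL, cache_dir="dac_weights"):
    """Download `url` into `cache_dir` unless the file is already there.

    Hypothetical helper for this sketch; not part of the official package.
    Returns the local path of the weights file.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / url.rsplit("/", 1)[-1]  # e.g. dac_weights/weights.pth
    if not target.exists():
        # Download under a temporary name first so an interrupted transfer
        # never leaves a truncated file under the final name.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)
    return target


# Usage (performs a network download on first call):
# weights_path = fetch_weights()
```

Loading the downloaded file into a model is covered by the official README; this sketch only handles fetching and caching.
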
## Citation

**BibTeX:**

```
@misc{kumar2023highfidelity,
      title={High-Fidelity Audio Compression with Improved RVQGAN},
      author={Rithesh Kumar and Prem Seetharaman and Alejandro Luebs and Ishaan Kumar and Kundan Kumar},
      year={2023},
      eprint={2306.06546},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
```