parler-tts/dac_44khZ_8kbps
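The checkpoint name encodes the codec's sample rate and target bitrate. The arithmetic sketch below shows where "8kbps" comes from. It assumes the 44.1 kHz configuration of the original DAC reference code: a hop length of 512 samples (encoder strides 2·4·8·8) and 9 residual codebooks of 1024 entries each. None of these numbers appear on this page. The frame-count helper also assumes that clips are right-padded to a multiple of the hop length before encoding.

```python
import math

# ASSUMED 44 kHz DAC configuration (from the DAC paper / reference code, not this page)
SAMPLE_RATE = 44_100   # Hz
HOP_LENGTH = 512       # audio samples per code frame (product of encoder strides 2*4*8*8)
N_CODEBOOKS = 9        # residual vector-quantization stages
CODEBOOK_SIZE = 1024   # entries per codebook, i.e. log2(1024) = 10 bits per code


def dac_bitrate(sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH,
                n_codebooks=N_CODEBOOKS, codebook_size=CODEBOOK_SIZE):
    """Return the token bitrate in bits per second."""
    frame_rate = sample_rate / hop_length                  # code frames per second
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frame_rate * bits_per_frame


def n_code_frames(n_samples, hop_length=HOP_LENGTH):
    """Code frames for a clip, ASSUMING right-padding to a multiple of hop_length."""
    return math.ceil(n_samples / hop_length)


if __name__ == "__main__":
    print(f"{dac_bitrate():.1f} bps")  # -> 7752.0 bps
    print(n_code_frames(SAMPLE_RATE))  # one second of audio -> 87
```

Under these assumptions, about 86 code frames per second at 90 bits each gives roughly 7.75 kbps, which the checkpoint name rounds to 8 kbps.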



ylacombe (HF staff) committed on Apr 10

Commit

b1c0224


1 Parent(s): 86ce25f

Update README.md

Files changed (1)

README.md +21 -3

README.md CHANGED

@@ -8,7 +8,27 @@ license: mit
 # Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN
-This repository contains training and inference scripts for the Descript Audio Codec (.dac), a high fidelity general neural audio codec, introduced in the paper titled **High-Fidelity Audio Compression with Improved RVQGAN**.
 [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN
 ](http://arxiv.org/abs/2306.06546) <br>
@@ -21,8 +41,6 @@ This repository contains training and inference scripts for the Descript Audio C
 👌 It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.) <br>
-## Original Usage
 ### Installation
 ```
 pip install descript-audio-codec

 # Descript Audio Codec (.dac): High-Fidelity Audio Compression with Improved RVQGAN
+This repository is a wrapper around the original **Descript Audio Codec** model, a high-fidelity, general-purpose neural audio codec introduced in the paper titled **High-Fidelity Audio Compression with Improved RVQGAN**.
+It is designed as a drop-in replacement for the [transformers implementation](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/encodec#overview) of [Encodec](https://github.com/facebookresearch/encodec), so that architectures built on Encodec can be trained with DAC instead.
+The [Parler-TTS library](https://github.com/huggingface/parler-tts) is an example of how to use DAC to train high-quality TTS models. We released [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_300M_v0.1), a first-iteration model trained on 10k hours of narrated audiobooks. It generates high-quality speech whose features (e.g. gender, background noise, speaking rate, pitch and reverberation) can be controlled with a simple text prompt.
+To use this checkpoint, first install the [Parler-TTS library](https://github.com/huggingface/parler-tts) (this only needs to be done once):
+```sh
+pip install git+https://github.com/huggingface/parler-tts.git
+```
+Then load the model:
+```python
+from parler_tts import DACModel
+dac_model = DACModel.from_pretrained("parler-tts/dac_44khZ_8kbps")
+```
+🚨 If you want to use the original DAC codebase, refer to the [original repository](https://github.com/descriptinc/descript-audio-codec/tree/main) or to the [Original Usage](#original-usage) section.
+## Original Usage
 [arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN
 ](http://arxiv.org/abs/2306.06546) <br>
 👌 It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.) <br>
 ### Installation
 ```
 pip install descript-audio-codec