Improve model card with pipeline tag, library name, and updated links
This PR improves the model card by:
- Adding the `pipeline_tag: audio-to-audio` to improve searchability on the Hugging Face Hub.
- Specifying the `library_name: torch`, enabling the "How to use" button and clarifying the framework used.
- Updating the links in the "Available models" table to point to the correct Hugging Face model URLs.
- Updating the news section with more recent updates from the GitHub README.
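As a quick sanity check on the metadata changes listed above, the sketch below parses a card's YAML front matter and reports whether `library_name` and `pipeline_tag` are present. The helper name `parse_front_matter` and the sample `CARD` string are hypothetical and written for illustration. The parser handles only flat `key: value` pairs and `- item` lists, so it is not a general YAML parser.

```python
# Minimal sketch, assuming the front matter uses only flat "key: value"
# pairs and "- item" lists (as the card in this PR does).

def parse_front_matter(card_text: str) -> dict:
    """Return the metadata found between the leading '---' fences."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing fence ends the metadata block
            break
        if stripped.startswith("- "):
            # A list item belongs to the most recent key with an empty value.
            if isinstance(metadata.get(current_key), list):
                metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            metadata[current_key] = value if value else []
    return metadata


# Hypothetical sample card; tags not visible in the diff are omitted.
CARD = """---
license: mit
library_name: torch
pipeline_tag: audio-to-audio
tags:
- audio-feature-extraction
- speech-language-models
- text-to-speech
- automatic-speech-recognition
---

# WavTokenizer
"""

meta = parse_front_matter(CARD)
missing = [key for key in ("library_name", "pipeline_tag") if key not in meta]
print("missing fields:", missing)
```

For real cards, a full YAML parser is the safer choice; this version only avoids an extra dependency.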
README.md
CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: torch
+pipeline_tag: audio-to-audio
 tags:
 - audio-feature-extraction
 - speech-language-models
@@ -9,9 +11,8 @@ tags:
 - text-to-speech
 - automatic-speech-recognition
 ---
-# WavTokenizer: SOTA Discrete Codec Models With Forty Tokens Per Second for Audio Language Modeling
-
 
+# WavTokenizer: SOTA Discrete Codec Models With Forty Tokens Per Second for Audio Language Modeling
 
 [](https://arxiv.org/abs/2408.16532)
 [](https://wavtokenizer.github.io/)
@@ -21,10 +22,13 @@ tags:
 
 ### 🎉🎉 with WavTokenizer, you can represent speech, music, and audio with only 40 tokens per second!
 ### 🎉🎉 with WavTokenizer, You can get strong reconstruction results.
-### 🎉🎉 WavTokenizer owns rich semantic information and is build for audio language models such as
+### 🎉🎉 WavTokenizer owns rich semantic information and is build for audio language models such as GPT-4o.
 
 # 🔥 News
-- *
+- *2025.02.25*: We update WavTokenizer camera ready version for ICLR 2025 and update WavTokenizer-large-v2 checkpoint on [huggingface](https://huggingface.co/novateur/WavTokenizer-large-speech-75token).
+- *2024.10.22*: We update WavTokenizer on arxiv and release WavTokenizer-Large checkpoint.
+- *2024.09.09*: We release WavTokenizer-medium checkpoint on [huggingface](https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0).
+- *2024.08.31*: We release WavTokenizer on arxiv.
 
 
 
@@ -112,10 +116,9 @@ audio_out = wavtokenizer.decode(features, bandwidth_id=bandwidth_id)
 |:--------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------:|:----------:|:------:|
 | WavTokenizer-small-600-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer/blob/main/WavTokenizer_small_600_24k_4096.ckpt) | LibriTTS | 40 | Speech | ✅ |
 | WavTokenizer-small-320-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer/blob/main/WavTokenizer_small_320_24k_4096.ckpt) | LibriTTS | 75 | Speech | ✅|
-| WavTokenizer-medium-
-| WavTokenizer-
-| WavTokenizer-large-
-| WavTokenizer-large-320-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 80000 Hours | 75 | Speech, Audio, Music | Coming Soon |
+| WavTokenizer-medium-320-24k-4096 | [🤗](https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0) | 10000 Hours | 75 | Speech, Audio, Music | ✅ |
+| WavTokenizer-large-600-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer-large-unify-40token) | 80000 Hours | 40 | Speech, Audio, Music | ✅|
+| WavTokenizer-large-320-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer-large-speech-75token) | 80000 Hours | 75 | Speech, Audio, Music | ✅ |
 
 
 
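The "Available models" rows link to a single checkpoint file (`/blob/main/...`), to a whole model repo, or to a collection page. The sketch below splits such a URL into a repo id, revision, and filename. The helper name `split_hub_url` is hypothetical, and the `/<owner>/<repo>/blob/<revision>/<path>` layout is an assumption inferred from the links in this card. Collection pages and non-Hub hosts are rejected rather than guessed at.

```python
from urllib.parse import urlparse


def split_hub_url(url: str):
    """Split a huggingface.co model URL into (repo_id, revision, filename).

    revision and filename are None when the URL points at a repo root.
    Assumes the path layout /<owner>/<repo>[/blob/<revision>/<path>].
    """
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        raise ValueError(f"not a Hugging Face Hub URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2 or parts[0] == "collections":
        # Collection pages group several repos and carry no single repo id.
        raise ValueError(f"not a model repo URL: {url}")
    repo_id = "/".join(parts[:2])
    if len(parts) >= 5 and parts[2] == "blob":
        return repo_id, parts[3], "/".join(parts[4:])
    return repo_id, None, None


file_link = split_hub_url(
    "https://huggingface.co/novateur/WavTokenizer/blob/main/"
    "WavTokenizer_small_600_24k_4096.ckpt"
)
repo_link = split_hub_url(
    "https://huggingface.co/novateur/WavTokenizer-large-unify-40token"
)
print(file_link)
print(repo_link)
```

If the `huggingface_hub` package is installed, the three values from a file link can be passed to `hf_hub_download(repo_id=..., filename=..., revision=...)` to fetch the checkpoint.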