nielsr (HF staff) committed · Commit fddb014 · verified · 1 parent: 917d513

Improve model card with pipeline tag, library name, and updated links

This PR improves the model card by:

- Adding `pipeline_tag: audio-to-audio` to improve searchability on the Hugging Face Hub.
- Specifying `library_name: torch`, which enables the "How to use" button and clarifies the framework used.
- Updating the links in the "Available models" table to point to the correct Hugging Face model URLs.
- Updating the news section with more recent entries from the GitHub README.
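
The two metadata additions above amount to inserting keys into the YAML front matter at the top of `README.md`. The following is a minimal, stdlib-only sketch of that edit, not the method used for this PR (the README was edited directly). The function name `add_card_metadata` and the choice to place new keys right after `license:` are assumptions made for illustration.

```python
from __future__ import annotations


def add_card_metadata(card_text: str, new_fields: dict[str, str]) -> str:
    """Insert top-level keys into the YAML front matter of a model card.

    Hypothetical helper: new keys go right after the `license:` line when
    present, otherwise at the top of the front matter. Keys that already
    exist are left untouched.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("model card has no YAML front matter")

    # Locate the closing '---' that ends the front matter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is not closed")

    front = lines[1:end]
    # Top-level keys only: skip list items and indented lines.
    existing = {
        ln.split(":", 1)[0].strip()
        for ln in front
        if ":" in ln and not ln.startswith((" ", "-"))
    }
    additions = [f"{k}: {v}" for k, v in new_fields.items() if k not in existing]

    insert_at = 1
    for i, ln in enumerate(front, start=1):
        if ln.startswith("license:"):
            insert_at = i + 1
            break

    lines[insert_at:insert_at] = additions
    result = "\n".join(lines)
    return result + ("\n" if card_text.endswith("\n") else "")


card = "---\nlicense: mit\ntags:\n- audio-feature-extraction\n---\n\n# WavTokenizer\n"
updated = add_card_metadata(
    card, {"library_name": "torch", "pipeline_tag": "audio-to-audio"}
)
print(updated)
```

Running the helper a second time on its own output leaves the card unchanged, because keys that are already present are skipped.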

Files changed (1)
  1. README.md +11 -8
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: torch
+pipeline_tag: audio-to-audio
 tags:
 - audio-feature-extraction
 - speech-language-models
@@ -9,9 +11,8 @@ tags:
 - text-to-speech
 - automatic-speech-recognition
 ---
-# WavTokenizer: SOTA Discrete Codec Models With Forty Tokens Per Second for Audio Language Modeling
-
 
+# WavTokenizer: SOTA Discrete Codec Models With Forty Tokens Per Second for Audio Language Modeling
 
 [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2408.16532)
 [![demo](https://img.shields.io/badge/WanTokenizer-Demo-red)](https://wavtokenizer.github.io/)
@@ -21,10 +22,13 @@ tags:
 
 ### 🎉🎉 with WavTokenizer, you can represent speech, music, and audio with only 40 tokens per second!
 ### 🎉🎉 with WavTokenizer, You can get strong reconstruction results.
-### 🎉🎉 WavTokenizer owns rich semantic information and is build for audio language models such as GPT4-o.
+### 🎉🎉 WavTokenizer owns rich semantic information and is build for audio language models such as GPT-4o.
 
 # 🔥 News
-- *2024.08*: We release WavTokenizer on arxiv.
+- *2025.02.25*: We update WavTokenizer camera ready version for ICLR 2025 and update WavTokenizer-large-v2 checkpoint on [huggingface](https://huggingface.co/novateur/WavTokenizer-large-speech-75token).
+- *2024.10.22*: We update WavTokenizer on arxiv and release WavTokenizer-Large checkpoint.
+- *2024.09.09*: We release WavTokenizer-medium checkpoint on [huggingface](https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0).
+- *2024.08.31*: We release WavTokenizer on arxiv.
 
 ![result](result.png)
 
@@ -112,10 +116,9 @@ audio_out = wavtokenizer.decode(features, bandwidth_id=bandwidth_id)
 |:--------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------:|:---------:|:----------:|:------:|
 | WavTokenizer-small-600-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer/blob/main/WavTokenizer_small_600_24k_4096.ckpt) | LibriTTS | 40 | Speech | √ |
 | WavTokenizer-small-320-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer/blob/main/WavTokenizer_small_320_24k_4096.ckpt) | LibriTTS | 75 | Speech | √|
-| WavTokenizer-medium-600-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 10000 Hours | 40 | Speech, Audio, Music | Coming Soon|
-| WavTokenizer-medium-320-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 10000 Hours | 75 | Speech, Audio, Music | Coming Soon|
-| WavTokenizer-large-600-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 80000 Hours | 40 | Speech, Audio, Music | Coming Soon|
-| WavTokenizer-large-320-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 80000 Hours | 75 | Speech, Audio, Music | Coming Soon |
+| WavTokenizer-medium-320-24k-4096 | [🤗](https://huggingface.co/collections/novateur/wavtokenizer-medium-large-66de94b6fd7d68a2933e4fc0) | 10000 Hours | 75 | Speech, Audio, Music | √ |
+| WavTokenizer-large-600-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer-large-unify-40token) | 80000 Hours | 40 | Speech, Audio, Music | √|
+| WavTokenizer-large-320-24k-4096 | [🤗](https://huggingface.co/novateur/WavTokenizer-large-speech-75token) | 80000 Hours | 75 | Speech, Audio, Music | √ |
 
 
 