eastwind hubertsiuzdak committed on
Commit
e43b797
•
0 Parent(s):

Duplicate from hubertsiuzdak/snac_32khz

Co-authored-by: Hubert Siuzdak <hubertsiuzdak@users.noreply.huggingface.co>

Files changed (4)
  1. .gitattributes +35 -0
  2. README.md +71 -0
  3. config.json +13 -0
  4. pytorch_model.bin +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
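
The `*.bin` rule is why `pytorch_model.bin` is recorded in this commit as a three-line Git LFS pointer rather than as binary content. A rough sketch of that matching follows. It assumes Python's `fnmatch` is a close enough stand-in for gitattributes globbing on slash-free patterns, which is an approximation and not a full implementation; `is_lfs_tracked` and the shortened pattern list are illustrative names, not part of any tool.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns added in this commit.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.onnx", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match slash-free patterns against the file's basename.

    ASSUMPTION: fnmatch-style globbing approximates gitattributes matching here;
    patterns containing '/' (such as saved_model/**/*) are not handled.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for path in ["pytorch_model.bin", "config.json", "README.md"]:
    print(path, is_lfs_tracked(path))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git itself resolves.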
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ license: mit
+ tags:
+ - audio
+ ---
+
+ # SNAC 🍿
+
+ Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses audio into discrete codes at a low bitrate.
+
+ 👉 This model was primarily trained on music data, and its recommended use case is music (and SFX) generation. See below for other pretrained models.
+
+ 🔗 GitHub repository: https://github.com/hubertsiuzdak/snac/
+
+ ## Overview
+
+ SNAC encodes audio into hierarchical tokens similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change where coarse tokens are sampled less frequently,
+ covering a broader time span.
+
+ This model compresses 32 kHz audio into discrete codes at a 1.9 kbps bitrate. It uses 4 RVQ levels with token rates of 10, 21, 42, and
+ 83 Hz.
+
+ ## Pretrained models
+
+ Currently, all models support only a single audio channel (mono).
+
+ | Model                                                                       | Bitrate   | Sample Rate | Params | Recommended use case     |
+ |-----------------------------------------------------------------------------|-----------|-------------|--------|--------------------------|
+ | [hubertsiuzdak/snac_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | 0.98 kbps | 24 kHz      | 19.8 M | 🗣️ Speech                |
+ | hubertsiuzdak/snac_32khz (this model)                                       | 1.9 kbps  | 32 kHz      | 54.5 M | 🎸 Music / Sound Effects |
+ | [hubertsiuzdak/snac_44khz](https://huggingface.co/hubertsiuzdak/snac_44khz) | 2.6 kbps  | 44 kHz      | 54.5 M | 🎸 Music / Sound Effects |
+
+ ## Usage
+
+ Install it using:
+
+ ```bash
+ pip install snac
+ ```
+ To encode (and decode) audio with SNAC in Python, use the following code:
+
+ ```python
+ import torch
+ from snac import SNAC
+
+ model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().cuda()
+ audio = torch.randn(1, 1, 32000).cuda() # B, 1, T
+
+ with torch.inference_mode():
+     codes = model.encode(audio)
+     audio_hat = model.decode(codes)
+ ```
+
+ You can also encode and reconstruct in a single call:
+
+ ```python
+ with torch.inference_mode():
+     audio_hat, codes = model(audio)
+ ```
+
+ ⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal
+ resolution.
+
+ ```
+ >>> [code.shape[1] for code in codes]
+ [12, 24, 48, 96]
+ ```
+
+ ## Acknowledgements
+
+ Module definitions are adapted from the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec).
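
The list of lengths shown in the README can be reproduced by hand from the values in this commit's `config.json`. The sketch below is not taken from the `snac` package: the helper name `expected_code_lengths` is hypothetical, and the padding rule (right-pad the input to a multiple of the hop length times the least common multiple of the coarsest stride and the attention window) is an assumption, chosen because it reproduces the `[12, 24, 48, 96]` output for a 32000-sample input.

```python
import math

# Values copied from config.json in this commit.
ENCODER_RATES = [2, 3, 8, 8]
VQ_STRIDES = [8, 4, 2, 1]
ATTN_WINDOW_SIZE = 32


def expected_code_lengths(num_samples: int) -> list:
    """Estimate the number of tokens per RVQ level for a mono input."""
    hop_length = math.prod(ENCODER_RATES)  # 2 * 3 * 8 * 8 = 384 samples per frame
    # ASSUMPTION: the input is right-padded to a multiple of
    # hop_length * lcm(coarsest stride, attention window).
    coarsest = VQ_STRIDES[0]
    lcm = coarsest * ATTN_WINDOW_SIZE // math.gcd(coarsest, ATTN_WINDOW_SIZE)
    pad_to = hop_length * lcm
    padded = math.ceil(num_samples / pad_to) * pad_to
    frames = padded // hop_length
    # Each level keeps one token per `stride` latent frames.
    return [frames // stride for stride in VQ_STRIDES]


print(expected_code_lengths(32000))  # [12, 24, 48, 96]
```

Under this assumption the coarsest level always has one eighth as many tokens as the finest, so code that flattens `codes` into a single sequence needs to account for the 1:2:4:8 ratio.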
config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "sampling_rate": 32000,
+   "encoder_dim": 64,
+   "encoder_rates": [2, 3, 8, 8],
+   "decoder_dim": 1536,
+   "decoder_rates": [8, 8, 3, 2],
+   "attn_window_size": 32,
+   "codebook_size": 4096,
+   "codebook_dim": 8,
+   "vq_strides": [8, 4, 2, 1],
+   "noise": true,
+   "depthwise": true
+ }
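
The token rates and bitrate quoted in the README follow from these config values. The sketch below works through that arithmetic under two assumptions that the source does not state explicitly: the latent frame rate is the sampling rate divided by the product of `encoder_rates`, and each token costs `log2(codebook_size)` bits.

```python
import json
import math

# The subset of config.json needed for the arithmetic.
config = json.loads("""
{
  "sampling_rate": 32000,
  "encoder_rates": [2, 3, 8, 8],
  "codebook_size": 4096,
  "vq_strides": [8, 4, 2, 1]
}
""")

hop_length = math.prod(config["encoder_rates"])      # 2 * 3 * 8 * 8 = 384
frame_rate = config["sampling_rate"] / hop_length    # about 83.33 frames per second
bits_per_code = math.log2(config["codebook_size"])   # 12 bits per token

# ASSUMPTION: level i emits one token every vq_strides[i] latent frames.
token_rates = [frame_rate / stride for stride in config["vq_strides"]]
bitrate_bps = sum(rate * bits_per_code for rate in token_rates)

print([round(rate) for rate in token_rates])  # [10, 21, 42, 83]
print(round(bitrate_bps))                     # 1875
```

The result, 1875 bits per second, rounds to the 1.9 kbps stated in the README, and the rounded per-level rates match the quoted 10, 21, 42, and 83 Hz.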
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bfee2f057c1e287443786bedab377b5176b430e911417683977b7af71ea3ba65
+ size 218308802
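
The three lines above are a Git LFS pointer, not the weights themselves: they record the hash algorithm, digest, and byte size of the real file. A minimal sketch of checking a downloaded `pytorch_model.bin` against this pointer is shown below; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and only the Python standard library is used.

```python
import hashlib
from pathlib import Path

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bfee2f057c1e287443786bedab377b5176b430e911417683977b7af71ea3ba65
size 218308802
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])  # sha256 218308802
```

As a sanity check, 218308802 bytes divided by 4 is about 54.6 million, which is consistent with the 54.5 M parameters listed in the README if the weights are stored as 32-bit floats (an assumption; the commit does not state the dtype).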