Commit 568e35d by j-csc (1 parent: a6af09a)

Add details from Suno

Files changed (1): README.md +39 -0
@@ -33,3 +33,42 @@ huggingface-cli download --local-dir-use-symlinks False --local-dir weights/ mlx
 
  # Run example (large model)
  python model.py --text="Hello world!" --path weights/ --model large
+ ```
+ The rest of the model card was copied from [the original Bark repository](https://huggingface.co/suno/bark).
+
+ ## Model Details
+
+ The following is additional information about the models released here.
+
+ Bark is a series of three transformer models that turn text into audio.
+
+ ### Text to semantic tokens
+ - Input: text, tokenized with the [BERT tokenizer from Hugging Face](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer)
+ - Output: semantic tokens that encode the audio to be generated
+
+ ### Semantic to coarse tokens
+ - Input: semantic tokens
+ - Output: tokens from the first two codebooks of the [EnCodec codec](https://github.com/facebookresearch/encodec) from Facebook
+
+ ### Coarse to fine tokens
+ - Input: the first two codebooks from EnCodec
+ - Output: 8 codebooks from EnCodec
+
+ ### Architecture
+ | Model                     | Parameters | Attention  | Output vocab size |
+ |:-------------------------:|:----------:|------------|:-----------------:|
+ | Text to semantic tokens   | 80/300 M   | Causal     | 10,000            |
+ | Semantic to coarse tokens | 80/300 M   | Causal     | 2 × 1,024         |
+ | Coarse to fine tokens     | 80/300 M   | Non-causal | 6 × 1,024         |
+
+
+ ### Release date
+ April 2023
+
+ ## Broader Implications
+ We anticipate that this model's text-to-audio capabilities can be used to improve accessibility tools in a variety of languages.
+
+ While we hope that this release will enable users to express their creativity and build applications that are a force
+ for good, we acknowledge that any text-to-audio model has the potential for dual use. While it is not straightforward
+ to clone the voices of known people with Bark, it can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark,
+ we also release a simple classifier to detect Bark-generated audio with high accuracy (see the notebooks section of the main repository).
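The three-stage token flow and the Architecture table in the model card above can be illustrated with a shape-level sketch. This is a minimal sketch under stated assumptions, not Bark or the MLX port's `model.py`. The stage functions `text_to_semantic`, `semantic_to_coarse`, and `coarse_to_fine` are hypothetical stand-ins that emit random token ids in the ranges given by the table. The sequence lengths `n_tokens` and `n_frames` are arbitrary illustrative values, not Bark's real token rates. `attention_mask` (also hypothetical) shows the difference between the causal attention of the first two stages and the non-causal attention of the fine stage.

```python
import random
from typing import Dict, List

# Sizes from the Architecture table: semantic vocab 10,000; each EnCodec
# codebook has 1,024 entries; coarse = first 2 codebooks, fine = all 8.
SEMANTIC_VOCAB = 10_000
CODEBOOK_SIZE = 1_024
N_COARSE_CODEBOOKS = 2
N_FINE_CODEBOOKS = 8

# Attention type per stage, also taken from the table.
STAGE_IS_CAUSAL: Dict[str, bool] = {
    "text_to_semantic": True,
    "semantic_to_coarse": True,
    "coarse_to_fine": False,
}


def attention_mask(n: int, causal: bool) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    # Causal: only the current and earlier positions; non-causal: all positions.
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


# HYPOTHETICAL stand-ins for the three models: they return random ids in the
# right ranges so the shapes can be checked. They do not generate audio.
def text_to_semantic(text_ids: List[int], n_tokens: int, rng: random.Random) -> List[int]:
    # A real model would condition on text_ids; this stub ignores them.
    return [rng.randrange(SEMANTIC_VOCAB) for _ in range(n_tokens)]


def semantic_to_coarse(semantic: List[int], n_frames: int, rng: random.Random) -> List[List[int]]:
    # One row per coarse codebook, one column per audio frame.
    return [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
            for _ in range(N_COARSE_CODEBOOKS)]


def coarse_to_fine(coarse: List[List[int]], rng: random.Random) -> List[List[int]]:
    # Keep the 2 coarse codebooks and add the remaining 6 (table's last row).
    n_frames = len(coarse[0])
    extra = [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
             for _ in range(N_FINE_CODEBOOKS - N_COARSE_CODEBOOKS)]
    return [row[:] for row in coarse] + extra


rng = random.Random(0)
semantic = text_to_semantic([1, 2, 3], n_tokens=50, rng=rng)  # placeholder ids
coarse = semantic_to_coarse(semantic, n_frames=75, rng=rng)
fine = coarse_to_fine(coarse, rng)
print(len(semantic), len(coarse), len(fine), len(fine[0]))  # 50 2 8 75
```

Keeping the two coarse codebooks unchanged in `coarse_to_fine` mirrors the card's description: the fine stage receives the first two codebooks and outputs all 8, so only 6 new codebooks (6 × 1,024 vocabulary) are predicted.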