ylacombe HF staff committed on
Commit
2a4cf88
1 Parent(s): a28135b

5d2ff9da9129ac77df392b9eea785b24c0b6fba741b3bfa7514283a870a21f55

Files changed (50)
  1. speaker_embeddings/pt_speaker_9.npz +3 -0
  2. speaker_embeddings/readme.md +30 -0
  3. speaker_embeddings/ru_speaker_0.npz +3 -0
  4. speaker_embeddings/ru_speaker_1.npz +3 -0
  5. speaker_embeddings/ru_speaker_2.npz +3 -0
  6. speaker_embeddings/ru_speaker_3.npz +3 -0
  7. speaker_embeddings/ru_speaker_4.npz +3 -0
  8. speaker_embeddings/ru_speaker_5.npz +3 -0
  9. speaker_embeddings/ru_speaker_6.npz +3 -0
  10. speaker_embeddings/ru_speaker_7.npz +3 -0
  11. speaker_embeddings/ru_speaker_8.npz +3 -0
  12. speaker_embeddings/ru_speaker_9.npz +3 -0
  13. speaker_embeddings/speaker_0.npz +3 -0
  14. speaker_embeddings/speaker_1.npz +3 -0
  15. speaker_embeddings/speaker_2.npz +3 -0
  16. speaker_embeddings/speaker_3.npz +3 -0
  17. speaker_embeddings/speaker_4.npz +3 -0
  18. speaker_embeddings/speaker_5.npz +3 -0
  19. speaker_embeddings/speaker_6.npz +3 -0
  20. speaker_embeddings/speaker_7.npz +3 -0
  21. speaker_embeddings/speaker_8.npz +3 -0
  22. speaker_embeddings/speaker_9.npz +3 -0
  23. speaker_embeddings/tr_speaker_0.npz +3 -0
  24. speaker_embeddings/tr_speaker_1.npz +3 -0
  25. speaker_embeddings/tr_speaker_2.npz +3 -0
  26. speaker_embeddings/tr_speaker_3.npz +3 -0
  27. speaker_embeddings/tr_speaker_4.npz +3 -0
  28. speaker_embeddings/tr_speaker_5.npz +3 -0
  29. speaker_embeddings/tr_speaker_6.npz +3 -0
  30. speaker_embeddings/tr_speaker_7.npz +3 -0
  31. speaker_embeddings/tr_speaker_8.npz +3 -0
  32. speaker_embeddings/tr_speaker_9.npz +3 -0
  33. speaker_embeddings/v2/de_speaker_0.npz +3 -0
  34. speaker_embeddings/v2/de_speaker_1.npz +3 -0
  35. speaker_embeddings/v2/de_speaker_2.npz +3 -0
  36. speaker_embeddings/v2/de_speaker_3.npz +3 -0
  37. speaker_embeddings/v2/de_speaker_4.npz +3 -0
  38. speaker_embeddings/v2/de_speaker_5.npz +3 -0
  39. speaker_embeddings/v2/de_speaker_6.npz +3 -0
  40. speaker_embeddings/v2/de_speaker_7.npz +3 -0
  41. speaker_embeddings/v2/de_speaker_8.npz +3 -0
  42. speaker_embeddings/v2/de_speaker_9.npz +3 -0
  43. speaker_embeddings/v2/en_speaker_0.npz +3 -0
  44. speaker_embeddings/v2/en_speaker_1.npz +3 -0
  45. speaker_embeddings/v2/en_speaker_2.npz +3 -0
  46. speaker_embeddings/v2/en_speaker_3.npz +3 -0
  47. speaker_embeddings/v2/en_speaker_4.npz +3 -0
  48. speaker_embeddings/v2/en_speaker_5.npz +3 -0
  49. speaker_embeddings/v2/en_speaker_6.npz +3 -0
  50. speaker_embeddings/v2/en_speaker_7.npz +3 -0
speaker_embeddings/pt_speaker_9.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:21724f6ac25a3aef785875a0cbd98c6d72fabd9e60aa982a8afa6608b59388ae
+ size 58652
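Each `.npz` entry in this commit is a Git LFS pointer rather than the array data itself: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. As a minimal sketch of how such a pointer could be read, the helper below (the name `parse_lfs_pointer` is hypothetical, not part of any Git LFS tooling) splits the lines into fields, using the pointer shown above as input.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is a key, a single space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:21724f6ac25a3aef785875a0cbd98c6d72fabd9e60aa982a8afa6608b59388ae\n"
    "size 58652\n"
)
print(pointer["hash_algo"], pointer["size"])
```

The `size` field is the size of the actual `.npz` file, so comparing it with the downloaded file's length is a cheap way to tell a resolved LFS object from an unresolved pointer.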
speaker_embeddings/readme.md ADDED
@@ -0,0 +1,30 @@
+ # Example Prompts Data
+
+ ## Version Two
+ The `v2` prompts are better engineered to follow text with a consistent voice.
+ To use them, prefix the prompt name with `v2/` in `history_prompt`. For example:
+ ```python
+ from bark import generate_audio
+ text_prompt = "madam I'm adam"
+ audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_1")
+ ```
+
+ ## Prompt Format
+ The provided data is in the `.npz` format, a NumPy archive that stores several named arrays in one file. Each file contains three arrays: `semantic_prompt`, `coarse_prompt`, and `fine_prompt`.
+
+ ```semantic_prompt```
+
+ The `semantic_prompt` array contains a sequence of semantic token IDs. The input text itself is tokenized with the BERT tokenizer from Hugging Face, and the model's first stage turns it into these semantic tokens, which later stages use to generate the audio output. The shape of this array is (n,), where n is the number of semantic tokens.
+
+ ```coarse_prompt```
+
+ The `coarse_prompt` array is an intermediate output of the text-to-speech pipeline and contains token IDs from the first two codebooks of the EnCodec codec from Facebook. This step converts the semantic tokens into a representation better suited to the subsequent step. The shape of this array is (2, m), where m is the number of tokens per codebook.
+
+ ```fine_prompt```
+
+ The `fine_prompt` array is a further processed output of the pipeline and contains token IDs from 8 EnCodec codebooks. These codebooks represent the final stage of tokenization, and the resulting tokens are used to generate the audio output. The shape of this array is (8, p), where p is the number of tokens per codebook.
+
+ Overall, these arrays represent successive stages of a text-to-speech pipeline that converts text input into synthesized audio output. The `semantic_prompt` array holds the first-stage representation derived from the text, while `coarse_prompt` and `fine_prompt` represent the intermediate and final stages of tokenization, respectively.
+
+
+
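The array layout described in the readme can be checked with NumPy alone. The sketch below is an illustration under stated assumptions, not part of the commit: `describe_prompt` is a hypothetical helper, and the archive is a synthetic stand-in built in memory, with arbitrary lengths and placeholder value ranges, so that the example runs without downloading the real LFS objects.

```python
import io

import numpy as np

PROMPT_KEYS = ("semantic_prompt", "coarse_prompt", "fine_prompt")


def describe_prompt(npz_file) -> dict:
    """Return the shape of each prompt array in an .npz archive (hypothetical helper)."""
    with np.load(npz_file) as data:
        return {name: data[name].shape for name in PROMPT_KEYS}


# Synthetic stand-in for a speaker prompt file. The lengths (120, 180) and
# the value ranges are placeholders chosen for illustration only.
rng = np.random.default_rng(0)
buffer = io.BytesIO()
np.savez(
    buffer,
    semantic_prompt=rng.integers(0, 10_000, size=120),    # shape (n,)
    coarse_prompt=rng.integers(0, 1024, size=(2, 180)),   # shape (2, m)
    fine_prompt=rng.integers(0, 1024, size=(8, 180)),     # shape (8, p)
)
buffer.seek(0)

shapes = describe_prompt(buffer)
print(shapes)
```

Passing a path such as a downloaded `en_speaker_1.npz` instead of the in-memory buffer should work the same way, since `np.load` accepts both file paths and file-like objects.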
speaker_embeddings/ru_speaker_0.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f832edfe62de54ab56cd09862af428927f8e82ddbc371365c6a19db3b4fc1ab6
+ size 57852
speaker_embeddings/ru_speaker_1.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c72519f5060c8896d8131671e047e347614f555471a5f30da76fbc77acb5e8ee
+ size 24260
speaker_embeddings/ru_speaker_2.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:87db3f55be72596b53afc7d2166ae38a1b6e5ba04880f4c392ff662ff14e41f4
+ size 51668
speaker_embeddings/ru_speaker_3.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b68d63b8ae2d46b68a67be76ef4cb7823c7640f2e855da05648bdea9a7c0871b
+ size 29164
speaker_embeddings/ru_speaker_4.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c15a3c0cb477b01ab4baecadc9781a398c9e82e1db6cc00f98c78d165af0e6b
+ size 27940
speaker_embeddings/ru_speaker_5.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3bf201c1ea1ea44c77c0264f33dfeeee99d27d02498e77f23d63a56de4ebdeeb
+ size 23356
speaker_embeddings/ru_speaker_6.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a048c4676d46fbc86492813145e018ecf8790f00153e69bf080926f2a5ba594e
+ size 45748
speaker_embeddings/ru_speaker_7.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16078f0e920479b090000cba9fe6cd47be53f8ced6441ad0452267dd5b170870
+ size 25380
speaker_embeddings/ru_speaker_8.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d4c8abbf2a202ccbce4f569233f13adad49cbec45dc9f5029c1e357882c4dbc7
+ size 42924
speaker_embeddings/ru_speaker_9.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:756000ceb9eea65fa8a257cdee25ba7ec03e2c653c3d5913e0082540811f791d
+ size 38500
speaker_embeddings/speaker_0.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:55bc30061b5c5928454e4c7a1d6206e359a25ca38fec3ca96de0a625fa96c572
+ size 19620
speaker_embeddings/speaker_1.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d5d5531998bd91684806eb64a2ac659d8c242f4112d6216697d3cae0b99b978
+ size 21380
speaker_embeddings/speaker_2.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c3001ff8a04e64e0687b0ad145c92684c8758ce7af68fb330dcfee4739fd896b
+ size 19460
speaker_embeddings/speaker_3.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:08b20f307ff4a1e5a947f4394ce2f2c3c5e0e6a9f78e0fd77604fb08359ab90d
+ size 32740
speaker_embeddings/speaker_4.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5b6acddfa41ce84e558e09e91fae5fbb01704bc1cef0f000bcc7f30d05e51afc
+ size 19676
speaker_embeddings/speaker_5.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:048c7362b237c43ceb0c3a4986b5c42c21ef013cadaf7c77b6348419f801dc93
+ size 54548
speaker_embeddings/speaker_6.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d7359be4a984930a81103043409b695e383d493f4edd6d4786537b1730a95c0
+ size 23516
speaker_embeddings/speaker_7.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:560ccbd20b16a2313cdc44ed578c8fb4dcbe51c2d1c57756dc242d185a6b88d3
+ size 22556
speaker_embeddings/speaker_8.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26eb3e2589f21f88aa963f052cc5134c6510b1cdb0033be277733bc7dc77157c
+ size 20580
speaker_embeddings/speaker_9.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15ab7bbb47bf326e454cc1d299f4069d0fa9ea8e934273dbed4cbf1116404322
+ size 18396
speaker_embeddings/tr_speaker_0.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:21c8f2c4e8b31b0a11c1565ba5ee104e11db4e3f83c6d8b44d52385692322d3b
+ size 26020
speaker_embeddings/tr_speaker_1.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:578ca20688ff603c6365a9f53076300cd17dec784532b4bb2e75de8a25f4781c
+ size 24156
speaker_embeddings/tr_speaker_2.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97013a34b28feb95881e5bcd4bea53e81acfcb5c4c896a6733e2a5e351242e6c
+ size 32740
speaker_embeddings/tr_speaker_3.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:117cd3cf2367f009d86849c75f85709cd227628b6b26ce7074b6196c2bb12132
+ size 20100
speaker_embeddings/tr_speaker_4.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:102a94642852e99a171875a53a3f219196407d75bbec62191dcd3bd542aa9c64
+ size 16100
speaker_embeddings/tr_speaker_5.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c61bf6adc04f81f1a5fbcbac1c0257fd76769f122ab083f16b3e29e2a7eeae7a
+ size 29220
speaker_embeddings/tr_speaker_6.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6250cb26f5b4563e9be8e5ae24f3c3af386c5b52ea21ad99237edc08296e3b6d
+ size 21596
speaker_embeddings/tr_speaker_7.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1eeca993d97dd24a1115494872c062c35297f462270ed4062f3158b0f8af08ac
+ size 21276
speaker_embeddings/tr_speaker_8.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cba4b642845725653e6d18b55b892208b33878e3914daeb1bd86e9c2d6383e33
+ size 35724
speaker_embeddings/tr_speaker_9.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d493f6328ba149b7680e55cd6a9b7419b88df871981040a0cc4a51493b210b6
+ size 19460
speaker_embeddings/v2/de_speaker_0.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:82c8d443f71a46bca90e9323e0fd14c8beaaa55dbc690eb14b75b6b14497005a
+ size 39620
speaker_embeddings/v2/de_speaker_1.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed1c4324e3f989d484d7ed433efa2082f87c845f8688e18417624210e979335d
+ size 27460
speaker_embeddings/v2/de_speaker_2.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:90f08869a1377c86ec525f0ad7aed10b4dcf7d75717b47a34d4b677d7e33e921
+ size 24740
speaker_embeddings/v2/de_speaker_3.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7da1e9457f8a5e082988652f202a5cc5320ac362f81ecfce5b1ce6edce2342d1
+ size 31300
speaker_embeddings/v2/de_speaker_4.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c53c565cedaa683bc2bf5577c0ad70c4d435b66055641234857dff5e743b2b5a
+ size 30660
speaker_embeddings/v2/de_speaker_5.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a798e48483c89702c316478336939a5c5a579cb0dd9e76943eca1ece914e3bdc
+ size 31300
speaker_embeddings/v2/de_speaker_6.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d668bf7735343ca059cfc35c0d796a422cb05ae6172a244dfd7320958943304
+ size 23196
speaker_embeddings/v2/de_speaker_7.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7aec16132be2de475b9d8889ff281cce60efa06f7910f8d2701aac75d119d9b4
+ size 40100
speaker_embeddings/v2/de_speaker_8.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a81d5f6c95d347cc269679bc41bf5dc50fe644e01b472985f6dd46c9b578937
+ size 28524
speaker_embeddings/v2/de_speaker_9.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73667c2a678d5264d583085772297aa7451c20d48286dba57ebc43d78767de38
+ size 51084
speaker_embeddings/v2/en_speaker_0.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:932f40d879ba8659f1ca26319ba64ea3b0647b2050fe24313bf42b0dff1fe241
+ size 28100
speaker_embeddings/v2/en_speaker_1.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e7f18015e1ab9b6302ded1e28a971af5306a72f193bb6c411f1948a083c8578
+ size 25220
speaker_embeddings/v2/en_speaker_2.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d218990680ece5f2d4fc18ea4783b016b3ae353ec413eaee2058f2d57263c9b3
+ size 26236
speaker_embeddings/v2/en_speaker_3.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92c2e2a29145c83738e9b63f082fd1c873d9422468a155463cb27f814aeaea66
+ size 34980
speaker_embeddings/v2/en_speaker_4.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:992f91991a9a5359d72f00b09a11a550e71bb8ebfc0cfd877e39d7d41f98b714
+ size 23780
speaker_embeddings/v2/en_speaker_5.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18831c3f6014e4a2ff60ad5169b1fae06e28ed07f43f8a3616aafb84515091bf
+ size 24740
speaker_embeddings/v2/en_speaker_6.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fab38dc6b6bc9226bcc414f4c5a9524bc1b2441865a586153fb620127a8faa4e
+ size 25540
speaker_embeddings/v2/en_speaker_7.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f4c4eb33f5994be8de5cfd1744ebce13da1618a6da3a7d244514178c61ef7db
+ size 22716