georg-suno ylacombe committed on
Commit
8877ba1
·
1 Parent(s): a3f055a

Add transformers usage (#7)

- Add transformers usage (d8ea9c90c084a1f46d532ac6238e5d1df177a78e)
- update README with examples (89d04b32c4f6f5abefa48e973a95a85ad9ce73ad)
- update with bark-small reference (ee9b2d6e28d1b5d5e7a150bf7bf70fae0dfdb05d)
- update Bark.generate_speech -> generate (0fb30c75ea4361fa4520550405ed6243360331f5)


Co-authored-by: Yoach Lacombe <ylacombe@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +88 -5
README.md CHANGED
@@ -36,9 +36,89 @@ This model is meant for research purposes only.
36
  The model output is not censored and the authors do not endorse the opinions in the generated content.
37
  Use at your own risk.
38
 
39
- The following is additional information about the models released here.
40
 
41
- ## Model Usage
42
 
43
  ```python
44
  from bark import SAMPLE_RATE, generate_audio, preload_models
@@ -52,10 +132,10 @@ text_prompt = """
52
  Hello, my name is Suno. And, uh — and I like pizza. [laughs]
53
  But I also have other interests such as playing tic tac toe.
54
  """
55
- audio_array = generate_audio(text_prompt)
56
 
57
  # play text in notebook
58
- Audio(audio_array, rate=SAMPLE_RATE)
59
  ```
60
 
61
  [pizza.webm](https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm)
@@ -71,6 +151,9 @@ write_wav("/path/to/audio.wav", SAMPLE_RATE, audio_array)
71
 
72
  ## Model Details
73
 
74
  Bark is a series of three transformer models that turn text into audio.
75
 
76
  ### Text to semantic tokens
@@ -102,4 +185,4 @@ We anticipate that this model's text to audio capabilities can be used to improv
102
  While we hope that this release will enable users to express their creativity and build applications that are a force
103
  for good, we acknowledge that any text to audio model has the potential for dual use. While it is not straightforward
104
  to voice clone known people with Bark, it can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark,
105
- we also release a simple classifier to detect Bark-generated audio with high accuracy (see notebooks section of the main repository).
 
36
  The model output is not censored and the authors do not endorse the opinions in the generated content.
37
  Use at your own risk.
38
 
39
+ Two checkpoints are released:
40
+ - [small](https://huggingface.co/suno/bark-small)
41
+ - [**large** (this checkpoint)](https://huggingface.co/suno/bark)
42
+
43
+
44
+ ## Example
45
+
46
+ Try out Bark yourself!
47
+
48
+ * Bark Colab:
49
+
50
+ <a target="_blank" href="https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing">
51
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
52
+ </a>
53
+
54
+ * Hugging Face Colab:
55
+
56
+ <a target="_blank" href="https://colab.research.google.com/drive/1dWWkZzvu7L9Bunq9zvD-W02RFUXoW-Pd?usp=sharing">
57
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
58
+ </a>
59
+
60
+ * Hugging Face Demo:
61
+
62
+ <a target="_blank" href="https://huggingface.co/spaces/suno/bark">
63
+ <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
64
+ </a>
65
+
66
+
67
+ ## 🤗 Transformers Usage
68
+
69
+
70
+ You can run Bark locally with the 🤗 Transformers library from version 4.31.0 onwards.
71
+
72
+ 1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main:
73
+
74
+ ```
75
+ pip install git+https://github.com/huggingface/transformers.git
76
+ ```
77
+
78
+ 2. Run the following Python code to generate speech samples:
79
+
80
+ ```python
81
+ from transformers import AutoProcessor, AutoModel
82
+
83
 
84
+ processor = AutoProcessor.from_pretrained("suno/bark-small")
85
+ model = AutoModel.from_pretrained("suno/bark-small")
86
+
87
+ inputs = processor(
88
+ text=["Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe."],
89
+ return_tensors="pt",
90
+ )
91
+
92
+ speech_values = model.generate(**inputs, do_sample=True)
93
+ ```
94
+
95
+ 3. Listen to the speech samples either in an ipynb notebook:
96
+
97
+ ```python
98
+ from IPython.display import Audio
99
+
100
+ sampling_rate = model.generation_config.sample_rate
101
+ Audio(speech_values.cpu().numpy().squeeze(), rate=sampling_rate)
102
+ ```
103
+
104
+ Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
105
+
106
+ ```python
107
+ import scipy.io.wavfile
108
+
109
+ sampling_rate = model.generation_config.sample_rate
110
+ scipy.io.wavfile.write("bark_out.wav", rate=sampling_rate, data=speech_values.cpu().numpy().squeeze())
111
+ ```
112
+
113
+ For more details on running inference with Bark in the 🤗 Transformers library, refer to the [Bark docs](https://huggingface.co/docs/transformers/model_doc/bark).
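As a dependency-free alternative to the `scipy` call above, the sketch below writes a mono waveform as 16-bit PCM using only the Python standard library. It is illustrative and not part of Bark or 🤗 Transformers: `write_wav_pcm16` is a hypothetical helper, the sine tone stands in for `speech_values.cpu().numpy().squeeze()`, and the 24 kHz rate is an assumed placeholder that should be replaced by the rate read from the model. It also assumes the samples are floats in [-1.0, 1.0]; anything outside that range is clipped.

```python
import math
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not a Bark or Transformers API.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to the signed 16-bit range.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone.
# 24_000 is an assumed placeholder; read the real rate from the model.
sample_rate = 24_000
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]
write_wav_pcm16("bark_out_pcm16.wav", tone, sample_rate)
```

With real model output, a call along the lines of `write_wav_pcm16("bark_out.wav", speech_values.cpu().numpy().squeeze().tolist(), sampling_rate)` should work. The per-sample Python loop will be slow for long clips, so treat this as a fallback and not a replacement for `scipy`.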
114
+
115
+ ## Suno Usage
116
+
117
+ You can also run Bark locally through the original [Bark library](https://github.com/suno-ai/bark):
118
+
119
+ 1. First install the [`bark` library](https://github.com/suno-ai/bark)
120
+
121
+ 2. Run the following Python code:
122
 
123
  ```python
124
  from bark import SAMPLE_RATE, generate_audio, preload_models
 
132
  Hello, my name is Suno. And, uh — and I like pizza. [laughs]
133
  But I also have other interests such as playing tic tac toe.
134
  """
135
+ speech_array = generate_audio(text_prompt)
136
 
137
  # play text in notebook
138
+ Audio(speech_array, rate=SAMPLE_RATE)
139
  ```
140
 
141
  [pizza.webm](https://user-images.githubusercontent.com/5068315/230490503-417e688d-5115-4eee-9550-b46a2b465ee3.webm)
 
151
 
152
  ## Model Details
153
 
154
+
155
+ The following is additional information about the models released here.
156
+
157
  Bark is a series of three transformer models that turn text into audio.
158
 
159
  ### Text to semantic tokens
 
185
  While we hope that this release will enable users to express their creativity and build applications that are a force
186
  for good, we acknowledge that any text to audio model has the potential for dual use. While it is not straightforward
187
  to voice clone known people with Bark, it can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark,
188
+ we also release a simple classifier to detect Bark-generated audio with high accuracy (see notebooks section of the main repository).