Commit 11bfc95 by sander-wood
1 Parent(s): a6d0216

Upload app.py

Files changed (1)
  1. app.py +20 -60
app.py CHANGED
@@ -1,5 +1,3 @@
- import subprocess
- import os
  import gradio as gr
  import json
  from utils import *
@@ -8,64 +6,27 @@ from transformers import AutoTokenizer
 
  description = """
  <div>
- <a style="display:inline-block" href='https://github.com/suno-ai/bark'><img src='https://img.shields.io/github/stars/suno-ai/bark?style=social' /></a>
- <a style='display:inline-block' href='https://discord.gg/J2B2vsjKuE'><img src='https://dcbadge.vercel.app/api/server/J2B2vsjKuE?compact=true&style=flat' /></a>
- <a style="display:inline-block; margin-left: 1em" href="https://huggingface.co/spaces/suno/bark?duplicate=true"><img src="https://img.shields.io/badge/-Duplicate%20Space%20to%20skip%20the%20queue-blue?labelColor=white&style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IArs4c6QAAAP5JREFUOE+lk7FqAkEURY+ltunEgFXS2sZGIbXfEPdLlnxJyDdYB62sbbUKpLbVNhyYFzbrrA74YJlh9r079973psed0cvUD4A+4HoCjsA85X0Dfn/RBLBgBDxnQPfAEJgBY+A9gALA4tcbamSzS4xq4FOQAJgCDwV2CPKV8tZAJcAjMMkUe1vX+U+SMhfAJEHasQIWmXNN3abzDwHUrgcRGmYcgKe0bxrblHEB4E/pndMazNpSZGcsZdBlYJcEL9Afo75molJyM2FxmPgmgPqlWNLGfwZGG6UiyEvLzHYDmoPkDDiNm9JR9uboiONcBXrpY1qmgs21x1QwyZcpvxt9NS09PlsPAAAAAElFTkSuQmCC&logoWidth=14" alt="Duplicate Space"></a>
  </div>
- Bark is a universal text-to-audio model created by [Suno](www.suno.ai), with code publicly available [here](https://github.com/suno-ai/bark). \
- Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. \
- This demo should be used for research purposes only. Commercial use is strictly prohibited. \
- The model output is not censored and the authors do not endorse the opinions in the generated content. \
- Use at your own risk.
- """
 
- article = """
- ## 🌎 Foreign Language
- Bark supports various languages out-of-the-box and automatically determines language from input text. \
- When prompted with code-switched text, Bark will even attempt to employ the native accent for the respective languages in the same voice.
- Try the prompt:
- ```
- Buenos días Miguel. Tu colega piensa que tu alemán es extremadamente malo. But I suppose your english isn't terrible.
- ```
- ## 🤭 Non-Speech Sounds
- Below is a list of some known non-speech sounds, but we are finding more every day. \
- Please let us know if you find patterns that work particularly well on Discord!
- * [laughter]
- * [laughs]
- * [sighs]
- * [music]
- * [gasps]
- * [clears throat]
- * — or ... for hesitations
- * ♪ for song lyrics
- * capitalization for emphasis of a word
- * MAN/WOMAN: for bias towards speaker
- Try the prompt:
- ```
- " [clears throat] Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as... ♪ singing ♪."
- ```
- ## 🎶 Music
- Bark can generate all types of audio, and, in principle, doesn't see a difference between speech and music. \
- Sometimes Bark chooses to generate text as music, but you can help it out by adding music notes around your lyrics.
- Try the prompt:
- ```
- ♪ In the jungle, the mighty jungle, the lion barks tonight ♪
- ```
- ## 🧬 Voice Cloning
- Bark has the capability to fully clone voices - including tone, pitch, emotion and prosody. \
- The model also attempts to preserve music, ambient noise, etc. from input audio. \
- However, to mitigate misuse of this technology, we limit the audio history prompts to a limited set of Suno-provided, fully synthetic options to choose from.
- ## 👥 Speaker Prompts
- You can provide certain speaker prompts such as NARRATOR, MAN, WOMAN, etc. \
- Please note that these are not always respected, especially if a conflicting audio history prompt is given.
- Try the prompt:
- ```
- WOMAN: I would like an oatmilk latte please.
- MAN: Wow, that's expensive!
- ```
- ## Details
- Bark model by [Suno](https://suno.ai/), including official [code](https://github.com/suno-ai/bark) and model weights. \
- Gradio demo supported by 🤗 Hugging Face. Bark is licensed under a non-commercial license: CC-BY 4.0 NC, see details on [GitHub](https://github.com/suno-ai/bark).
  """
  examples = [
  "Jazz standard in Minor key with a swing feel.",
@@ -190,7 +151,7 @@ def semantic_music_search(query):
  key_cache = torch.load(f)
 
  # encode query
- query_ids = encoding_data([query], QUERY_MODAL)
  query_feature = get_features(query_ids, QUERY_MODAL)
 
  key_filenames = key_cache["filenames"]
@@ -225,5 +186,4 @@ gr.Interface(
  outputs=[output_title, output_artist, output_genre, output_description, output_abc],
  title="🗜️ CLaMP: Semantic Music Search",
  description=description,
- article=article,
  examples=examples).launch()
 
 
 
  import gradio as gr
  import json
  from utils import *
 
 
  description = """
  <div>
+ <a style="display:inline-block" href='https://github.com/microsoft/muzic/tree/main/clamp'><img src='https://img.shields.io/github/stars/microsoft/muzic?style=social' /></a>
+ <a style='display:inline-block' href='https://ai-muzic.github.io/clamp/'><img src='https://img.shields.io/badge/website-CLaMP-ff69b4.svg' /></a>
+ <a style="display:inline-block" href="https://huggingface.co/datasets/sander-wood/wikimusictext"><img src="https://img.shields.io/badge/huggingface-dataset-ffcc66.svg"></a>
+ <a style="display:inline-block" href="https://arxiv.org/abs/2106.01955"><img src="https://img.shields.io/badge/arXiv-2106.01955-b31b1b.svg"></a>
  </div>
 
+ ## ℹ️ How to use this demo?
+ 1. Enter a query in the text box.
+ 2. Click "Submit" and wait for the result.
+ 3. It will return the most matching music score from the WikiMusictext dataset (1010 scores in total).
+
+ ## ❕Notice
+ - The text box is case-sensitive.
+ - You can enter longer text for the text box, but the demo will only use the first 128 tokens.
+ - The returned results include the title, artist, genre, description, and the score in ABC notation.
+ - The genre and description may not be accurate, as they are collected from the web.
+ - The demo is based on CLaMP-S/512, a CLaMP model with 6-layer Transformer text/music encoders and a sequence length of 512.
+
+ ## 🔠👉🎵 Semantic Music Search
+ Semantic search is a technique for retrieving music by open-domain queries, which differs from traditional keyword-based searches that depend on exact matches or meta-information. This involves two steps: 1) extracting music features from all scores in the library, and 2) transforming the query into a text feature. By calculating the similarities between the text feature and the music features, it can efficiently locate the score that best matches the user's query in the library.
+
  """
  examples = [
  "Jazz standard in Minor key with a swing feel.",
 
  key_cache = torch.load(f)
 
  # encode query
+ query_ids = encoding_data([unidecode(query)], QUERY_MODAL)
  query_feature = get_features(query_ids, QUERY_MODAL)
 
  key_filenames = key_cache["filenames"]
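
Note on the hunk above: the added `unidecode(query)` call converts the query to plain ASCII before it is encoded. The diff does not show where `unidecode` is imported (possibly through `from utils import *`), and the sketch below does not use that package. It is a standard-library approximation that only strips diacritics, with the hypothetical helper name `fold_to_ascii`, meant to illustrate the kind of normalization involved rather than the exact behavior of the demo.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding (hypothetical helper, not part of app.py).

    Characters are decomposed with NFKD so that accents become separate
    combining marks, then everything outside ASCII is dropped. Unlike the
    unidecode package, characters without an ASCII decomposition are
    removed instead of being transliterated.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(fold_to_ascii("Antonín Dvořák"))
```

With this kind of folding, a query typed with or without accents would presumably map to the same token sequence, which may be the motivation for the change.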
 
  outputs=[output_title, output_artist, output_genre, output_description, output_abc],
  title="🗜️ CLaMP: Semantic Music Search",
  description=description,
  examples=examples).launch()
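
Note on the "Semantic Music Search" text in the new description: the two-step retrieval it describes (precomputed music features, one text feature per query, then a similarity ranking) can be sketched as follows. This is a minimal illustration under stated assumptions: the diff does not show which similarity measure the demo uses, so cosine similarity is assumed here, the feature vectors are toy values, and `cosine_similarity`, `rank_scores` and `key_features` are hypothetical names. Only `query_feature` and `key_filenames` appear in the diff.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_scores(query_feature, key_features, key_filenames, top_k=1):
    """Return the top_k (similarity, filename) pairs, best match first."""
    scored = [
        (cosine_similarity(query_feature, feature), name)
        for feature, name in zip(key_features, key_filenames)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy stand-ins for the cached music features and the encoded text query.
key_filenames = ["waltz.abc", "swing.abc", "march.abc"]
key_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.2], [0.0, 0.0, 1.0]]
query_feature = [0.1, 0.9, 0.1]

best = rank_scores(query_feature, key_features, key_filenames)

if __name__ == "__main__":
    print(best[0][1])
```

Because the music features are computed once and cached, each query only costs one text encoding plus one similarity pass over the library, which is what makes the lookup cheap for a collection of about a thousand scores.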