Spaces · Build error

Commit f5a1e6e · Updates markdown
Jeremy Hummel committed
1 parent: ef3f526

app.py CHANGED
@@ -33,41 +33,41 @@ network_choices = [
 ]
 
 description = \
-
-
-
-
-
+"""
+Generate visualizations on an input audio file using [StyleGAN3](https://nvlabs.github.io/stylegan3/) (Karras, Tero, et al. "Alias-free generative adversarial networks." Advances in Neural Information Processing Systems 34 (2021): 852-863.).
+Inspired by [Deep Music Visualizer](https://github.com/msieg/deep-music-visualizer), which used BigGAN (Brock et al., 2018).
+Developed by Jeremy Hummel at [Lambda](https://lambdalabs.com/)
+"""
 
 article = \
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+"""
+## How does this work?
+The audio is transformed to a spectral representation using a short-time Fourier transform (STFT). [librosa]()
+Starting with an initial noise vector, we perform a random walk, adjusting the length of each step with the power gradient.
+This pushes the noise vector to move around more when the sound changes.
+
+## Parameter info:
+*Network*: various pre-trained models from NVIDIA; "afhqv2" is animals, "ffhq" is faces, "metfaces" is artwork.
+
+*Truncation*: controls how far the noise vector can be from the origin. `0.7` will generate more realistic but less diverse samples,
+while `1.2` can yield more interesting but less realistic images.
+
+*Tempo Sensitivity*: controls how the size of each step scales with the audio features.
+
+*Jitter*: prevents the exact same noise vectors from cycling repetitively; if set to `0`, the images will repeat during
+repetitive parts of the audio.
+
+*Frame Length*: controls the number of audio frames per video frame in the output.
+If you want a higher frame rate for visualizing very rapid music, lower the frame length.
+If you want a lower frame rate (which will complete the job faster), raise the frame length.
+
+*Max Duration*: controls the max length of the visualization, in seconds. Use a shorter value here to get output
+more quickly, especially for testing different combinations of parameters.
+"""
+# Media sources:
+# [Maple Leaf Rag - Scott Joplin (1916, public domain)](https://commons.wikimedia.org/wiki/File:Maple_leaf_rag_-_played_by_Scott_Joplin_1916_V2.ogg)
+# [Moonlight Sonata Opus 27 no. 2, movement 3 - Ludwig van Beethoven, played by Muriel Nguyen Xuan (2008, CC BY-SA 3.0)](https://commons.wikimedia.org/wiki/File:Muriel-Nguyen-Xuan-Beethovens-Moonlight-Sonata-mvt-3.oga)
+# """
 
 examples = [
     ["examples/Maple_leaf_rag_-_played_by_Scott_Joplin_1916_V2.ogg", network_choices[0], 1.0, 0.25, 0.5, 512, 600],
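The power-gradient random walk described in the added article text can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `power_per_frame` and `random_walk` are hypothetical helper names, and a plain numpy FFT stands in for the librosa-based STFT the app presumably uses.

```python
import numpy as np

def power_per_frame(audio, frame_length=512):
    """Mean spectral power of each audio frame (numpy stand-in for an STFT)."""
    n_frames = len(audio) // frame_length
    frames = audio[: n_frames * frame_length].reshape(n_frames, frame_length)
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_length), axis=1))
    return (spectra ** 2).mean(axis=1)

def random_walk(power, dim=512, tempo_sensitivity=0.25, jitter=0.5, seed=0):
    """Walk a noise vector; steps grow with the frame-to-frame power change."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)
    # Normalized power gradient: large when the sound changes, small when it repeats.
    grad = np.abs(np.diff(power, prepend=power[0]))
    grad = grad / (grad.max() + 1e-9)
    path = []
    for g in grad:
        # Jitter adds a small baseline motion so identical audio frames
        # do not keep producing the exact same latent vector.
        step = rng.standard_normal(dim) * (tempo_sensitivity * g + 0.01 * jitter)
        z = z + step
        path.append(z.copy())
    return np.stack(path)  # one latent vector per audio frame
```

Each row of the returned array would then be fed to the generator to render one video frame, so the visualization moves faster wherever the audio's power changes quickly.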
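The *Truncation* parameter in the article corresponds to the StyleGAN truncation trick: pulling latents toward the model's average latent. A minimal sketch, with `w_avg` standing in for that average (a placeholder here, not the app's variable name):

```python
import numpy as np

def truncate(w, w_avg, psi=0.7):
    # Truncation trick: interpolate the latent toward the average latent.
    # psi < 1 -> more typical, realistic outputs; psi > 1 extrapolates
    # away from the mean for more variety at the cost of realism.
    return w_avg + psi * (w - w_avg)
```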
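The *Frame Length* note is just the ratio of audio sample rate to frame length; for example, assuming a 22050 Hz sample rate (librosa's default, not stated in the diff):

```python
sample_rate = 22050    # assumed audio sample rate in Hz
frame_length = 512     # audio samples per video frame
fps = sample_rate / frame_length
# about 43 video frames per second; doubling frame_length halves the frame rate
```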