Jeremy Hummel committed on
Commit
f5a1e6e
·
1 Parent(s): ef3f526

Updates markdown

Files changed (1)
  1. app.py +33 -33
app.py CHANGED
@@ -33,41 +33,41 @@ network_choices = [
 ]
 
 description = \
-"""
-Generate visualizations on an input audio file using [StyleGAN3](https://nvlabs.github.io/stylegan3/) (Karras, Tero, et al. "Alias-free generative adversarial networks." Advances in Neural Information Processing Systems 34 (2021): 852-863).
-Inspired by [Deep Music Visualizer](https://github.com/msieg/deep-music-visualizer), which used BigGAN (Brock et al., 2018).
-Developed by Jeremy Hummel at [Lambda](https://lambdalabs.com/)
-"""
+"""
+Generate visualizations on an input audio file using [StyleGAN3](https://nvlabs.github.io/stylegan3/) (Karras, Tero, et al. "Alias-free generative adversarial networks." Advances in Neural Information Processing Systems 34 (2021): 852-863).
+Inspired by [Deep Music Visualizer](https://github.com/msieg/deep-music-visualizer), which used BigGAN (Brock et al., 2018).
+Developed by Jeremy Hummel at [Lambda](https://lambdalabs.com/)
+"""
 
 article = \
-"""
-## How does this work?
-The audio is transformed to a spectral representation using the short-time Fourier transform (STFT) via [librosa]().
-Starting with an initial noise vector, we perform a random walk, adjusting the length of each step with the power gradient.
-This pushes the noise vector to move around more when the sound changes.
-
-## Parameter info:
-*Network*: various pre-trained models from NVIDIA: "afhqv2" is animals, "ffhq" is faces, "metfaces" is artwork.
-
-*Truncation*: controls how far the noise vector can be from the origin. `0.7` will generate more realistic, but less diverse samples,
-while `1.2` can yield more interesting but less realistic images.
-
-*Tempo Sensitivity*: controls how the size of each step scales with the audio features.
-
-*Jitter*: prevents the exact same noise vectors from cycling repetitively; if set to `0`, the images will repeat during
-repetitive parts of the audio.
-
-*Frame Length*: controls the number of audio frames per video frame in the output.
-If you want a higher frame rate for visualizing very rapid music, lower the frame length.
-If you want a lower frame rate (which will complete the job faster), raise the frame length.
-
-*Max Duration*: controls the max length of the visualization, in seconds. Use a shorter value here to get output
-more quickly, especially for testing different combinations of parameters.
-
-Media sources:
-[Maple Leaf Rag - Scott Joplin (1916, public domain)](https://commons.wikimedia.org/wiki/File:Maple_leaf_rag_-_played_by_Scott_Joplin_1916_V2.ogg)
-[Moonlight Sonata, Op. 27 No. 2, movement 3 - Ludwig van Beethoven, played by Muriel Nguyen Xuan (2008, CC BY-SA 3.0)](https://commons.wikimedia.org/wiki/File:Muriel-Nguyen-Xuan-Beethovens-Moonlight-Sonata-mvt-3.oga)
-"""
+"""
+## How does this work?
+The audio is transformed to a spectral representation using the short-time Fourier transform (STFT) via [librosa]().
+Starting with an initial noise vector, we perform a random walk, adjusting the length of each step with the power gradient.
+This pushes the noise vector to move around more when the sound changes.
+
+## Parameter info:
+*Network*: various pre-trained models from NVIDIA: "afhqv2" is animals, "ffhq" is faces, "metfaces" is artwork.
+
+*Truncation*: controls how far the noise vector can be from the origin. `0.7` will generate more realistic, but less diverse samples,
+while `1.2` can yield more interesting but less realistic images.
+
+*Tempo Sensitivity*: controls how the size of each step scales with the audio features.
+
+*Jitter*: prevents the exact same noise vectors from cycling repetitively; if set to `0`, the images will repeat during
+repetitive parts of the audio.
+
+*Frame Length*: controls the number of audio frames per video frame in the output.
+If you want a higher frame rate for visualizing very rapid music, lower the frame length.
+If you want a lower frame rate (which will complete the job faster), raise the frame length.
+
+*Max Duration*: controls the max length of the visualization, in seconds. Use a shorter value here to get output
+more quickly, especially for testing different combinations of parameters.
+"""
+# Media sources:
+# [Maple Leaf Rag - Scott Joplin (1916, public domain)](https://commons.wikimedia.org/wiki/File:Maple_leaf_rag_-_played_by_Scott_Joplin_1916_V2.ogg)
+# [Moonlight Sonata, Op. 27 No. 2, movement 3 - Ludwig van Beethoven, played by Muriel Nguyen Xuan (2008, CC BY-SA 3.0)](https://commons.wikimedia.org/wiki/File:Muriel-Nguyen-Xuan-Beethovens-Moonlight-Sonata-mvt-3.oga)
+# """
 
 examples = [
     ["examples/Maple_leaf_rag_-_played_by_Scott_Joplin_1916_V2.ogg", network_choices[0], 1.0, 0.25, 0.5, 512, 600],
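The random-walk step described in the article text above can be sketched roughly as follows. This is a minimal NumPy illustration of the idea (step length scaled by the power gradient, jitter to break cycles, truncation to bound the latent norm), not the app's actual implementation; the function name, normalization, and default values here are assumptions for illustration only.

```python
import numpy as np

def random_walk_latents(power, tempo_sensitivity=0.25, jitter=0.5,
                        truncation=1.0, dim=512, seed=0):
    """Illustrative sketch (not the app's code): walk a latent vector,
    stepping further when the audio power is changing faster."""
    rng = np.random.default_rng(seed)
    grad = np.abs(np.gradient(power))  # how fast the sound is changing, per frame
    z = rng.standard_normal(dim)       # initial noise vector
    frames = []
    for g in grad:
        # Random direction, plus jitter so repeated audio doesn't
        # retrace exactly the same latent path.
        step = rng.standard_normal(dim) + jitter * rng.standard_normal(dim)
        step /= np.linalg.norm(step)
        # Step length scales with tempo sensitivity and the power gradient.
        z = z + tempo_sensitivity * g * step
        # Crude "truncation": keep the latent norm near truncation * sqrt(dim).
        frames.append(truncation * z / max(1.0, np.linalg.norm(z) / np.sqrt(dim)))
    return np.stack(frames)  # one latent per video frame
```

Each row of the result would then be fed to the generator to produce one video frame; louder, faster-changing audio yields larger latent jumps and thus more visual motion.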