Commit c58b005 by bigjoedata • 1 Parent(s): 5e150da

Initial push

README.md ADDED
@@ -0,0 +1,61 @@
+
+ # 🎸 🥁 Rockbot 🎤 🎧
+ A [GPT-2](https://openai.com/blog/better-language-models/) based lyrics generator fine-tuned on the writing styles of 16,000 songs by 270 artists across MANY genres (not just rock).
+
+ **Instructions:** Type in a fake song title, pick an artist, click "Generate".
+
+ Most language models are imprecise and Rockbot is no exception. You may see NSFW lyrics unexpectedly; I have made no attempt to censor them. Generated lyrics may be repetitive and/or incoherent at times, but hopefully you'll encounter something interesting or memorable.
+
+ Oh, and generation is resource-intensive and can be slow. I set governors on song length to keep generation time somewhat reasonable. You may adjust song length and other parameters on the left, or check out [GitHub](https://github.com/bigjoedata/rockbot) to spin up your own Rockbot.
+
+ Just have fun.
+
+ [Demo](https://share.streamlit.io/bigjoedata/rockbot/main/src/main.py) (adjust the settings to increase speed)
+
+ [GitHub](https://github.com/bigjoedata/rockbot)
+
+ [GPT-2 124M version Model page on Hugging Face](https://huggingface.co/bigjoedata/rockbot)
+
+ [DistilGPT2 version Model page on Hugging Face](https://huggingface.co/bigjoedata/rockbot-distilgpt2/) This version is leaner; the tradeoff is that its lyrics are more simplistic.
+
+ 🎹 🪘 🎷 🎺 🪗 🪕 🎻
+ ## Background
+ With the shutdown of [Google Play Music](https://en.wikipedia.org/wiki/Google_Play_Music) I used Google's Takeout feature to gather the metadata for artists I've listened to over the past several years. I wanted to take advantage of this bounty to build something fun. I scraped the top 50 lyrics for each artist I'd listened to at least once from [Genius](https://genius.com/) and, after considerable post-processing, fine-tuned [GPT-2's](https://openai.com/blog/better-language-models/) 124M-parameter model using the [AITextGen](https://github.com/minimaxir/aitextgen) framework. For more on generation, see [here](https://huggingface.co/blog/how-to-generate).
+
+ ### Full Tech Stack
+ [Google Play Music](https://en.wikipedia.org/wiki/Google_Play_Music) (R.I.P.).
+ [Python](https://www.python.org/).
+ [Streamlit](https://www.streamlit.io/).
+ [GPT-2](https://openai.com/blog/better-language-models/).
+ [AITextGen](https://github.com/minimaxir/aitextgen).
+ [Pandas](https://pandas.pydata.org/).
+ [LyricsGenius](https://lyricsgenius.readthedocs.io/en/master/).
+ [Google Colab](https://colab.research.google.com/) (GPU-based training).
+ [Knime](https://www.knime.com/) (data cleaning).
+
+
+ ## How to Use The Model
+ Please refer to [AITextGen](https://github.com/minimaxir/aitextgen) for much better documentation.
+
+ ### Training Parameters Used
+
+     ai.train("lyrics.txt",
+              line_by_line=False,
+              from_cache=False,
+              num_steps=10000,        # total training steps
+              generate_every=2000,    # print sample text every 2,000 steps
+              save_every=2000,        # save the model every 2,000 steps
+              save_gdrive=False,
+              learning_rate=1e-3,
+              batch_size=3,
+              eos_token="<|endoftext|>",
+              #fp16=True             # mixed precision left disabled
+              )
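As a quick sanity check on the settings above, the sketch below works out what they imply in terms of sequences processed and save points. It assumes one optimizer step consumes exactly `batch_size` sequences (no gradient accumulation) and that saving and sampling happen on every exact multiple of their interval; neither detail is stated in the README. The helper name `schedule_summary` is hypothetical and is not part of aitextgen.

```python
def schedule_summary(num_steps, batch_size, save_every, generate_every):
    """Summarise a step-based training schedule (illustrative only)."""
    # Assumption: each step processes exactly `batch_size` sequences.
    sequences_seen = num_steps * batch_size
    # Assumption: an event fires whenever step % interval == 0, for step >= 1.
    save_steps = [s for s in range(1, num_steps + 1) if s % save_every == 0]
    sample_steps = [s for s in range(1, num_steps + 1) if s % generate_every == 0]
    return {
        "sequences_seen": sequences_seen,
        "save_steps": save_steps,
        "sample_steps": sample_steps,
    }


summary = schedule_summary(
    num_steps=10000, batch_size=3, save_every=2000, generate_every=2000
)
print(summary["sequences_seen"])  # 30000
print(summary["save_steps"])      # [2000, 4000, 6000, 8000, 10000]
```

Under those assumptions the run sees 30,000 sequences and writes five saves, each paired with a printed sample.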
+ ### To Use
+
+
+ Generate With Prompt (Use Title Case):
+ Song Name
+ BY
+ Artist Name
+
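The three-line prompt format described in the README (song title, the literal word `BY`, then the artist) can be assembled with a small helper. This is a minimal sketch: the function names `title_case` and `build_prompt` are hypothetical, and the choice of a single newline between lines with no trailing newline is an assumption, since the README does not spell out the exact separators.

```python
def title_case(text):
    """Upper-case the first letter of each whitespace-separated word.

    Unlike str.title(), the rest of each word is left untouched, so
    strings such as "AC/DC" or "don't" are not mangled.
    """
    return " ".join(word[:1].upper() + word[1:] for word in text.split())


def build_prompt(song_title, artist):
    """Return the prompt: title, the literal BY, then the artist name."""
    # Assumption: lines are joined by "\n" with no trailing newline.
    return "\n".join([title_case(song_title), "BY", title_case(artist)])


prompt = build_prompt("my fake song", "artist name")
print(prompt)
```

The resulting string would then be passed to the generation call, for example as the `prompt` argument of aitextgen's `generate` method.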
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "_name_or_path": "aitextgen/pytorch_model_355M.bin",
+   "activation_function": "gelu_new",
+   "architectures": [
+     "GPT2LMHeadModel"
+   ],
+   "attn_pdrop": 0.1,
+   "bos_token_id": 50256,
+   "embd_pdrop": 0.1,
+   "eos_token_id": 50256,
+   "gradient_checkpointing": false,
+   "initializer_range": 0.02,
+   "layer_norm_epsilon": 1e-05,
+   "model_type": "gpt2",
+   "n_ctx": 1024,
+   "n_embd": 1024,
+   "n_head": 16,
+   "n_inner": null,
+   "n_layer": 24,
+   "n_positions": 1024,
+   "n_vocab": 50257,
+   "resid_pdrop": 0.1,
+   "summary_activation": null,
+   "summary_first_dropout": 0.1,
+   "summary_proj_to_labels": true,
+   "summary_type": "cls_index",
+   "summary_use_proj": true,
+   "use_cache": true,
+   "vocab_size": 50257
+ }
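The size-related fields in this config (`vocab_size`, `n_positions`, `n_embd`, `n_layer`, `n_inner`) are enough to estimate the parameter count. The sketch below assumes the standard GPT-2 block layout (fused QKV projection, a feed-forward width of `4 * n_embd` when `n_inner` is null, and an output head tied to the token embedding); the function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer, n_inner=None):
    """Count weights in a standard GPT-2 layout with a tied output head."""
    d = n_embd
    inner = n_inner if n_inner is not None else 4 * d  # null n_inner -> 4 * n_embd
    embeddings = vocab_size * d + n_positions * d       # token + position tables
    attention = (d * 3 * d + 3 * d) + (d * d + d)       # fused QKV + output projection
    mlp = (d * inner + inner) + (inner * d + d)         # two feed-forward layers
    layer_norms = 2 * (2 * d)                           # two LayerNorms per block
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d                                  # LayerNorm after the last block
    return embeddings + n_layer * per_block + final_norm


# Values taken from the config.json in this commit.
total = gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=1024, n_layer=24)
# For comparison: the standard GPT-2 small dimensions (768 wide, 12 layers).
small = gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12)

print(f"{total:,}")  # 354,823,168
print(f"{small:,}")  # 124,439,808
```

Under these assumptions the config describes a model of roughly 355M parameters, which agrees with the `pytorch_model_355M.bin` name in `_name_or_path` rather than with the 124M figure quoted in the README text.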
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa7a3384dfbabc90d3f74175185ad17efe4de7dfe30090852bc153724b493e41
+ size 1444581811
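What is committed here is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses such a pointer and shows one way a downloaded file could be checked against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the check only handles `sha256` object IDs.

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fa7a3384dfbabc90d3f74175185ad17efe4de7dfe30090852bc153724b493e41
size 1444581811
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its fields."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    hasher = hashlib.sha256()  # assumption: pointer["algo"] is "sha256"
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
print(f"{pointer['size'] / 2**30:.2f} GiB")
```

Calling `matches_pointer("pytorch_model.bin", pointer)` after a download would confirm that the local file is the one this commit refers to; the file path in that call is illustrative.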
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": "<|endoftext|>"}
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"errors": "replace", "unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "pad_token": "<|endoftext|>"}
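The two tokenizer files store special tokens in two shapes: most entries are objects with a `content` field, while `pad_token` is a bare string. The sketch below normalises both shapes and groups the roles by the token text they share. The JSON is copied from `special_tokens_map.json` in this commit; the helper name `token_text` is hypothetical.

```python
import json

SPECIAL_TOKENS_MAP = json.loads("""
{
  "bos_token": {"content": "<|startoftext|>", "single_word": false,
                "lstrip": false, "rstrip": false, "normalized": true},
  "eos_token": {"content": "<|endoftext|>", "single_word": false,
                "lstrip": false, "rstrip": false, "normalized": true},
  "unk_token": {"content": "<|endoftext|>", "single_word": false,
                "lstrip": false, "rstrip": false, "normalized": true},
  "pad_token": "<|endoftext|>"
}
""")


def token_text(entry):
    """Return the token string whether the entry is an object or a bare string."""
    return entry["content"] if isinstance(entry, dict) else entry


# Group role names (bos_token, eos_token, ...) by the token text they map to.
roles_by_token = {}
for role, entry in SPECIAL_TOKENS_MAP.items():
    roles_by_token.setdefault(token_text(entry), []).append(role)

for text, roles in roles_by_token.items():
    print(text, roles)
```

Three roles (EOS, unknown, and padding) share `<|endoftext|>`, while BOS is a distinct `<|startoftext|>` token. Note that `config.json` sets `bos_token_id` and `eos_token_id` to the same id (50256); which setting takes effect at generation time is not something this diff shows.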
vocab.json ADDED
The diff for this file is too large to render. See raw diff