bigjoedata committed on
Commit • c58b005
1 Parent(s): 5e150da
Initial push
Browse files
- README.md +61 -0
- config.json +30 -0
- merges.txt +0 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -0
- tokenizer_config.json +1 -0
- vocab.json +0 -0
README.md
ADDED
@@ -0,0 +1,61 @@
# 🎸 🥁 Rockbot 🎤 🎧

A [GPT-2](https://openai.com/blog/better-language-models/)-based lyrics generator fine-tuned on the writing styles of 16,000 songs by 270 artists across MANY genres (not just rock).

**Instructions:** Type in a fake song title, pick an artist, click "Generate".

Most language models are imprecise, and Rockbot is no exception. You may see NSFW lyrics unexpectedly; I have made no attempt to censor. Generated lyrics may be repetitive and/or incoherent at times, but hopefully you'll encounter something interesting or memorable.

Oh, and generation is resource-intensive and can be slow. I set governors on song length to keep generation time somewhat reasonable. You may adjust song length and other parameters on the left, or check out [GitHub](https://github.com/bigjoedata/rockbot) to spin up your own Rockbot.

Just have fun.

[Demo](https://share.streamlit.io/bigjoedata/rockbot/main/src/main.py) (adjust settings to increase speed)

[GitHub](https://github.com/bigjoedata/rockbot)

[GPT-2 124M version model page on Hugging Face](https://huggingface.co/bigjoedata/rockbot)

[DistilGPT2 version model page on Hugging Face](https://huggingface.co/bigjoedata/rockbot-distilgpt2/) This version is leaner, with the tradeoff that the lyrics are more simplistic.

🎹 🪘 🎷 🎺 🪗 🪕 🎻

## Background

With the shutdown of [Google Play Music](https://en.wikipedia.org/wiki/Google_Play_Music), I used Google's Takeout feature to gather the metadata for artists I've listened to over the past several years. I wanted to take advantage of this bounty to build something fun. I scraped the top 50 lyrics for artists I'd listened to at least once from [Genius](https://genius.com/), then fine-tuned [GPT-2's](https://openai.com/blog/better-language-models/) 124M parameter model using the [AITextGen](https://github.com/minimaxir/aitextgen) framework after considerable post-processing. For more on generation, see [here](https://huggingface.co/blog/how-to-generate).
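
A rough sketch of the scraping step with [LyricsGenius](https://lyricsgenius.readthedocs.io/en/master/); the API token and artist names are placeholders, and this is an assumption about the pipeline rather than the exact script used:

```python
import lyricsgenius

# Placeholder token; the real pipeline's credentials and artist list
# came from the Google Takeout metadata described above.
genius = lyricsgenius.Genius("GENIUS_API_TOKEN", remove_section_headers=True)

with open("lyrics.txt", "a", encoding="utf-8") as f:
    for name in ["Artist One", "Artist Two"]:  # hypothetical artist names
        artist = genius.search_artist(name, max_songs=50, sort="popularity")
        if artist is not None:
            for song in artist.songs:
                # Delimit songs with the eos token used later in training.
                f.write(song.lyrics + "\n<|endoftext|>\n")
```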

### Full Tech Stack

- [Google Play Music](https://en.wikipedia.org/wiki/Google_Play_Music) (R.I.P.)
- [Python](https://www.python.org/)
- [Streamlit](https://www.streamlit.io/)
- [GPT-2](https://openai.com/blog/better-language-models/)
- [AITextGen](https://github.com/minimaxir/aitextgen)
- [Pandas](https://pandas.pydata.org/)
- [LyricsGenius](https://lyricsgenius.readthedocs.io/en/master/)
- [Google Colab](https://colab.research.google.com/) (GPU-based training)
- [KNIME](https://www.knime.com/) (data cleaning)

## How to Use The Model

Please refer to [AITextGen](https://github.com/minimaxir/aitextgen) for much better documentation.

### Training Parameters Used

```python
from aitextgen import aitextgen

# Example instantiation added for completeness (an assumption):
# "124M" matches the GPT-2 size named in the Background section.
ai = aitextgen(tf_gpt2="124M")

ai.train("lyrics.txt",
         line_by_line=False,
         from_cache=False,
         num_steps=10000,
         generate_every=2000,
         save_every=2000,
         save_gdrive=False,
         learning_rate=1e-3,
         batch_size=3,
         eos_token="<|endoftext|>",
         # fp16=True
         )
```
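
The checkpoint pushed in this commit follows the standard Hugging Face layout (config.json, pytorch_model.bin, tokenizer files), so it should also load without aitextgen. A minimal sketch using `transformers`, assuming the `bigjoedata/rockbot` repo id from the links above:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Repo id assumed from the model-page links above.
tokenizer = GPT2Tokenizer.from_pretrained("bigjoedata/rockbot")
model = GPT2LMHeadModel.from_pretrained("bigjoedata/rockbot")
model.eval()  # inference mode; no further fine-tuning here
```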

### To Use

Generate with a prompt (use Title Case):

```
Song Name
BY
Artist Name
```
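
For example, reusing the `ai` object from the training snippet above (the sampling parameters here are illustrative assumptions, not the demo app's exact settings):

```python
# Prompt follows the "Song Name / BY / Artist Name" format shown above.
prompt = "Song Name\nBY\nArtist Name\n"

ai.generate(n=1,
            prompt=prompt,
            max_length=256,
            temperature=1.0)
```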
config.json
ADDED
@@ -0,0 +1,30 @@
{
  "_name_or_path": "aitextgen/pytorch_model_355M.bin",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_vocab": 50257,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "use_cache": true,
  "vocab_size": 50257
}
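
`transformers` rebuilds these hyperparameters when the config is loaded from the Hub; a quick sanity check, with the repo id assumed from the README links:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigjoedata/rockbot")
# 24 layers, 16 heads, 1024-dim embeddings, per the config.json above.
print(config.n_layer, config.n_head, config.n_embd)
```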
merges.txt
ADDED
The diff for this file is too large to render.
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa7a3384dfbabc90d3f74175185ad17efe4de7dfe30090852bc153724b493e41
size 1444581811
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
{"bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, "pad_token": "<|endoftext|>"}
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
{"errors": "replace", "unk_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "bos_token": {"content": "<|startoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "eos_token": {"content": "<|endoftext|>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "add_prefix_space": false, "pad_token": "<|endoftext|>"}
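
The special tokens configured above frame every song: `<|startoftext|>` as bos and `<|endoftext|>` as eos (also reused for unk and pad). A minimal sketch of inspecting them, with the repo id assumed from the README links:

```python
from transformers import GPT2Tokenizer

# Repo id assumed from the README links; the tokenizer files are in this commit.
tokenizer = GPT2Tokenizer.from_pretrained("bigjoedata/rockbot")

print(tokenizer.bos_token)     # <|startoftext|>, per special_tokens_map.json
print(tokenizer.eos_token)     # <|endoftext|>
print(tokenizer.eos_token_id)  # 50256, a natural stopping point for generation
```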
vocab.json
ADDED
The diff for this file is too large to render.