rockbot-scratch / README.md
bigjoedata's picture
Added artists trained longer
b2dcde1
# 🎸 πŸ₯ Rockbot 🎀 🎧
A [GPT-2](https://openai.com/blog/better-language-models/) based lyrics generator fine-tuned on the writing styles of 16000 songs by 270 artists across MANY genres (not just rock).
**Instructions:** Type in a fake song title, pick an artist, click "Generate".
Most language models are imprecise and Rockbot is no exception. You may see NSFW lyrics unexpectedly. I have made no attempts to censor. Generated lyrics may be repetitive and/or incoherent at times, but hopefully you'll encounter something interesting or memorable.
Oh, and generation is resource intense and can be slow. I set governors on song length to keep generation time somewhat reasonable. You may adjust song length and other parameters on the left or check out [Github](https://github.com/bigjoedata/rockbot) to spin up your own Rockbot.
Just have fun.
[Demo](https://share.streamlit.io/bigjoedata/rockbot/main/src/main.py) Adjust settings to increase speed
[Github](https://github.com/bigjoedata/rockbot)
[GPT-2 124M version Model page on Hugging Face](https://huggingface.co/bigjoedata/rockbot)
[DistilGPT2 version Model page on Hugging Face](https://huggingface.co/bigjoedata/rockbot-distilgpt2/) This is leaner with the tradeoff being that the lyrics are more simplistic.
🎹 πŸͺ˜ 🎷 🎺 πŸͺ— πŸͺ• 🎻
## Background
With the shutdown of [Google Play Music](https://en.wikipedia.org/wiki/Google_Play_Music) I used Google's takeout function to gather the metadata from artists I've listened to over the past several years. I wanted to take advantage of this bounty to build something fun. I scraped the top 50 lyrics for artists I'd listened to at least once from [Genius](https://genius.com/), then fine tuned [GPT-2's](https://openai.com/blog/better-language-models/) 124M token model using the [AITextGen](https://github.com/minimaxir/aitextgen) framework after considerable post-processing. For more on generation, see [here.](https://huggingface.co/blog/how-to-generate)
### Full Tech Stack
[Google Play Music](https://en.wikipedia.org/wiki/Google_Play_Music) (R.I.P.).
[Python](https://www.python.org/).
[Streamlit](https://www.streamlit.io/).
[GPT-2](https://openai.com/blog/better-language-models/).
[AITextGen](https://github.com/minimaxir/aitextgen).
[Pandas](https://pandas.pydata.org/).
[LyricsGenius](https://lyricsgenius.readthedocs.io/en/master/).
[Google Colab](https://colab.research.google.com/) (GPU based Training).
[Knime](https://www.knime.com/) (data cleaning).
## How to Use The Model
Please refer to [AITextGen](https://github.com/minimaxir/aitextgen) for much better documentation.
### Training Parameters Used
ai.train("lyrics.txt",
line_by_line=False,
from_cache=False,
num_steps=10000,
generate_every=2000,
save_every=2000,
save_gdrive=False,
learning_rate=1e-3,
batch_size=3,
eos_token="<|endoftext|>",
#fp16=True
)
### To Use
Generate With Prompt (Use Title Case):
Song Name
BY
Artist Name