bigjoedata committed
Commit • b2dcde1
1 Parent(s): 059af0e
Added artists, trained longer
- README.md +51 -34
- config.json +2 -2
- merges.txt +0 -0
- pytorch_model.bin +2 -2
- vocab.json +0 -0
README.md
CHANGED
@@ -1,44 +1,61 @@
-🎹 🪘 🎷 🎺 🪗 🪕 🎻
-## Rockbot Background
-Two of my passions are music and data! I realized I had a bounty of metadata from artists I've listened to over the past several years and I decided to take advantage to build something fun. I scraped the top 50 lyrics for artists I'd listened to at least once from [Genius](https://genius.com/), added some other selected top artists, did a ton of post-processing and trained a [GPT-2's](https://openai.com/blog/better-language-models/) based model from scratch using the [AITextGen](https://github.com/minimaxir/aitextgen) framework. The UI / back end is built in [Streamlit](https://www.streamlit.io/) The vocabulary was built from scratch, rather than fine-tuned off an existing model. I also fine-tuned a GPT-2 based model available [here](https://huggingface.co/bigjoedata/rockbot) but this model weighs in at a fraction of the size.
 
-
+# 🎸 🥁 Rockbot 🎤 🎧
+A [GPT-2](https://openai.com/blog/better-language-models/) based lyrics generator fine-tuned on the writing styles of 16000 songs by 270 artists across MANY genres (not just rock).
+
+**Instructions:** Type in a fake song title, pick an artist, click "Generate".
+
+Most language models are imprecise and Rockbot is no exception. You may see NSFW lyrics unexpectedly. I have made no attempts to censor. Generated lyrics may be repetitive and/or incoherent at times, but hopefully you'll encounter something interesting or memorable.
+
+Oh, and generation is resource intense and can be slow. I set governors on song length to keep generation time somewhat reasonable. You may adjust song length and other parameters on the left or check out [Github](https://github.com/bigjoedata/rockbot) to spin up your own Rockbot.
 
-
-- Removed duplicate lyrics from each song
-- Deduped similar songs based on overall similarity to remove cover versions
-- Removed as much noise / junk as possible. There is still some.
-- Added tokens to delineate song
-- Used language to remove non-English versions of songs
-- Many others!
+Just have fun.
 
-
+[Demo](https://share.streamlit.io/bigjoedata/rockbot/main/src/main.py) Adjust settings to increase speed
+
+[Github](https://github.com/bigjoedata/rockbot)
+
+[GPT-2 124M version Model page on Hugging Face](https://huggingface.co/bigjoedata/rockbot)
+
+[DistilGPT2 version Model page on Hugging Face](https://huggingface.co/bigjoedata/rockbot-distilgpt2/) This is leaner with the tradeoff being that the lyrics are more simplistic.
+
+🎹 🪘 🎷 🎺 🪗 🪕 🎻
+## Background
+With the shutdown of [Google Play Music](https://en.wikipedia.org/wiki/Google_Play_Music) I used Google's takeout function to gather the metadata from artists I've listened to over the past several years. I wanted to take advantage of this bounty to build something fun. I scraped the top 50 lyrics for artists I'd listened to at least once from [Genius](https://genius.com/), then fine tuned [GPT-2's](https://openai.com/blog/better-language-models/) 124M token model using the [AITextGen](https://github.com/minimaxir/aitextgen) framework after considerable post-processing. For more on generation, see [here.](https://huggingface.co/blog/how-to-generate)
+
+### Full Tech Stack
+[Google Play Music](https://en.wikipedia.org/wiki/Google_Play_Music) (R.I.P.).
+[Python](https://www.python.org/).
+[Streamlit](https://www.streamlit.io/).
+[GPT-2](https://openai.com/blog/better-language-models/).
+[AITextGen](https://github.com/minimaxir/aitextgen).
+[Pandas](https://pandas.pydata.org/).
+[LyricsGenius](https://lyricsgenius.readthedocs.io/en/master/).
+[Google Colab](https://colab.research.google.com/) (GPU based Training).
+[Knime](https://www.knime.com/) (data cleaning).
 
-- [Python](https://www.python.org/).
-- [Streamlit](https://www.streamlit.io/).
-- [GPT-2](https://openai.com/blog/better-language-models/).
-- [AITextGen](https://github.com/minimaxir/aitextgen).
-- [LyricsGenius](https://lyricsgenius.readthedocs.io/en/master/) (retrieving lyrics for training).
-- [Knime](https://www.knime.com/) (data cleaning and post processing)
-- [GPT-2 generation](https://huggingface.co/blog/how-to-generate)
 
 ## How to Use The Model
-Please refer to [AITextGen](https://github.com/minimaxir/aitextgen)
+Please refer to [AITextGen](https://github.com/minimaxir/aitextgen) for much better documentation.
+
+### Training Parameters Used
+
+ai.train("lyrics.txt",
+line_by_line=False,
+from_cache=False,
+num_steps=10000,
+generate_every=2000,
+save_every=2000,
+save_gdrive=False,
+learning_rate=1e-3,
+batch_size=3,
+eos_token="<|endoftext|>",
+#fp16=True
+)
+### To Use
+
 
-Generate With Prompt (Use
+Generate With Prompt (Use Title Case):
 Song Name
 BY
-Artist Name
-Beginning of song
+Artist Name
 
-## Spin up your own with Docker
-Running your own is very easy. Visit my [Streamlit-Plus repository](https://github.com/bigjoedata/streamlit-plus) for more details on the image build
-
-- Install [Docker Compose](https://docs.docker.com/compose/install/)
-- Follow the following steps
-```
-git clone https://github.com/bigjoedata/rockbot
-cd rockbot
-nano docker-compose.yml # Edit environmental variables for max song length and max songs to generate to match your computing power (higher is more resource intensive)
-docker-compose up -d # launch in daemon (background) mode
-```
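Reviewer note on the "To Use" prompt in the new README: the format is only described as three lines (song name, `BY`, artist name) in Title Case. A minimal sketch of building such a prompt follows. The helper names `title_case` and `build_prompt` are hypothetical, and joining the parts with newlines is an assumption taken from the line layout in the README; the model may instead expect a single line, so the separator is left configurable.

```python
def title_case(text: str) -> str:
    """Upper-case the first letter of each whitespace-separated word.

    The rest of each word is left untouched, so names such as "AC/DC"
    or words with apostrophes are not mangled.
    """
    return " ".join(word[:1].upper() + word[1:] for word in text.split())


def build_prompt(song_name: str, artist_name: str, sep: str = "\n") -> str:
    # Assumption: the three parts are separated by newlines, mirroring the
    # README layout. Pass sep=" " if a single-line prompt turns out to be right.
    return sep.join([title_case(song_name), "BY", title_case(artist_name)])


prompt = build_prompt("midnight on the highway", "the rolling stones")
print(prompt)
```

The resulting string could then be passed as the `prompt` argument when generating with AITextGen; which model folder or repository to load is not stated in this commit, so that part is left out.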
config.json
CHANGED
@@ -23,7 +23,7 @@
 "summary_proj_to_labels": true,
 "summary_type": "cls_index",
 "summary_use_proj": true,
-"transformers_version": "4.
+"transformers_version": "4.3.2",
 "use_cache": true,
-"vocab_size":
+"vocab_size": 75000
 }
merges.txt
CHANGED
The diff for this file is too large to render.
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:588e5186b28450aa011770f686a8d86652a7e429e15460e56b7f03cc14690ede
+size 104737544
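Reviewer note on the pointer above: `pytorch_model.bin` is stored through Git LFS, so the diff only shows the pointer's `oid` (a SHA-256 digest) and `size` in bytes. A downloaded copy of the weights can be checked against those two fields. The sketch below is one way to do that; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the demo uses a tiny stand-in payload rather than the real 104737544-byte file.

```python
import hashlib


def parse_lfs_pointer(pointer_text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict of strings."""
    fields = {}
    for line in pointer_text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer_text: str) -> bool:
    """Return True when both the digest and the byte count match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_digest = fields["oid"].partition(":")
    actual_digest = hashlib.new(algo, data).hexdigest()
    return actual_digest == expected_digest and len(data) == int(fields["size"])


# Demo with a small in-memory payload standing in for the model file.
payload = b"hello rockbot"
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
print(matches_pointer(payload, pointer))  # True
```

For the actual weights file it would be kinder on memory to read the file in chunks and feed them to the hash object with `update`, rather than loading all of it at once.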
vocab.json
CHANGED
The diff for this file is too large to render.