---
datasets:
- lluccardoner/melodyGPT-song-chords-text-1
language:
- en
license: mit
tags:
- music
---

# melodyGPT song chords tokenizer

GPT2 tokenizer trained on a custom corpus of song chords: [lluccardoner/melodyGPT-song-chords-text-1](https://huggingface.co/datasets/lluccardoner/melodyGPT-song-chords-text-1).

## Model Details

### Model Description

See the [melodyGPT](https://github.com/lluccardoner/melodyGPT/blob/main/README.md) repository.

- **Developed by:** Lluc Cardoner
- **Model type:** GPT2 Tokenizer
- **License:** MIT. You are free to use this tokenizer for your own projects.

The following example compares this tokenizer with the stock GPT2 tokenizer on a line of chords:

```python
from transformers import AutoTokenizer

# Load the stock GPT2 tokenizer and the chords tokenizer from the Hugging Face Hub
gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")
chords_gpt2_tokenizer = AutoTokenizer.from_pretrained("lluccardoner/melodyGPT-song-chords-tokenizer-1")

# The chords tokenizer uses a smaller vocabulary
print(gpt2_tokenizer.vocab_size)  # 50257
print(chords_gpt2_tokenizer.vocab_size)  # 19972

example_chords = "Intro: Adim G7/13 Em Bb (4x) G#dim Bm/C F#m Ab|---------------------------------| (Bridge) C G Em7 Asus4"

# Stock GPT2 splits several chord names and the tab line into pieces
tokens = gpt2_tokenizer.tokenize(example_chords)
# tokens: ['Int', 'ro', ':', 'ĠAd', 'im', 'ĠG', '7', '/', '13', 'ĠEm', 'ĠB', 'b', 'Ġ(', '4', 'x', ')', 'ĠG', '#', 'dim', 'ĠB', 'm', '/', 'C', 'ĠF', '#', 'm', 'ĠAb', '|', '--------------------------------', '-|', 'Ġ(', 'Bridge', ')', 'ĠC', 'ĠG', 'ĠEm', '7', 'ĠAsus', '4']

# The chords tokenizer keeps tokens such as 'Intro', 'ĠAdim', 'ĠBb' and 'ĠBm' whole
new_tokens = chords_gpt2_tokenizer.tokenize(example_chords)
# new_tokens: ['Intro', ':', 'ĠAdim', 'ĠG', '7', '/', '13', 'ĠEm', 'ĠBb', 'Ġ(', '4', 'x', ')', 'ĠG', '#', 'dim', 'ĠBm', '/', 'C', 'ĠF', '#', 'm', 'ĠAb', '|---------------------------------|', 'Ġ(', 'Bridge', ')', 'ĠC', 'ĠG', 'ĠEm', '7', 'ĠAsus', '4']
```
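
To put a number on the difference between the two tokenizations above, the sketch below works offline on the token lists copied from the example output, so it needs neither `transformers` nor a network connection. The helper names `compression_gain` and `only_in` are hypothetical and exist only for this illustration; they are not part of the melodyGPT repository or any library.

```python
# Token lists copied verbatim from the example output above.
gpt2_tokens = ['Int', 'ro', ':', 'ĠAd', 'im', 'ĠG', '7', '/', '13', 'ĠEm', 'ĠB', 'b', 'Ġ(', '4', 'x', ')', 'ĠG', '#', 'dim', 'ĠB', 'm', '/', 'C', 'ĠF', '#', 'm', 'ĠAb', '|', '--------------------------------', '-|', 'Ġ(', 'Bridge', ')', 'ĠC', 'ĠG', 'ĠEm', '7', 'ĠAsus', '4']
chords_tokens = ['Intro', ':', 'ĠAdim', 'ĠG', '7', '/', '13', 'ĠEm', 'ĠBb', 'Ġ(', '4', 'x', ')', 'ĠG', '#', 'dim', 'ĠBm', '/', 'C', 'ĠF', '#', 'm', 'ĠAb', '|---------------------------------|', 'Ġ(', 'Bridge', ')', 'ĠC', 'ĠG', 'ĠEm', '7', 'ĠAsus', '4']


def compression_gain(baseline, candidate):
    """Fraction of tokens saved by `candidate` relative to `baseline` (hypothetical helper)."""
    return (len(baseline) - len(candidate)) / len(baseline)


def only_in(candidate, baseline):
    """Tokens that appear in `candidate` but not in `baseline`, in order, without duplicates."""
    baseline_set = set(baseline)
    result = []
    for token in candidate:
        if token not in baseline_set and token not in result:
            result.append(token)
    return result


gain = compression_gain(gpt2_tokens, chords_tokens)
merged = only_in(chords_tokens, gpt2_tokens)

print(f"GPT2 tokens:   {len(gpt2_tokens)}")
print(f"Chords tokens: {len(chords_tokens)}")
print(f"Tokens saved:  {gain:.1%}")
print(f"Tokens kept whole by the chords tokenizer: {merged}")
```

This is a single-example comparison, so the figure it prints should not be read as a benchmark; running the same helpers over tokenizer output for a larger sample of the dataset would give a more representative estimate.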