melodyGPT song chords tokenizer

A GPT-2 tokenizer trained on a custom corpus of song chords, the lluccardoner/melodyGPT-song-chords-text-1 dataset.

Model Details

Model Description

See melodyGPT repository.

  • Developed by: Lluc Cardoner
  • Model type: GPT2 Tokenizer
  • License: You are free to use this tokenizer for your own projects

from transformers import AutoTokenizer

gpt2_tokenizer = AutoTokenizer.from_pretrained("gpt2")
chords_gpt2_tokenizer = AutoTokenizer.from_pretrained("lluccardoner/melodyGPT-song-chords-tokenizer-1")

print(gpt2_tokenizer.vocab_size) # 50257
print(chords_gpt2_tokenizer.vocab_size) # 19972 

example_chords = "Intro: Adim G7/13 Em Bb (4x) G#dim Bm/C F#m Ab|---------------------------------| (Bridge) C G Em7 Asus4"

tokens = gpt2_tokenizer.tokenize(example_chords)
# tokens: ['Int', 'ro', ':', 'ĠAd', 'im', 'ĠG', '7', '/', '13', 'ĠEm', 'ĠB', 'b', 'Ġ(', '4', 'x', ')', 'ĠG', '#', 'dim', 'ĠB', 'm', '/', 'C', 'ĠF', '#', 'm', 'ĠAb', '|', '--------------------------------', '-|', 'Ġ(', 'Bridge', ')', 'ĠC', 'ĠG', 'ĠEm', '7', 'ĠAsus', '4']

new_tokens = chords_gpt2_tokenizer.tokenize(example_chords)
# new_tokens: ['Intro', ':', 'ĠAdim', 'ĠG', '7', '/', '13', 'ĠEm', 'ĠBb', 'Ġ(', '4', 'x', ')', 'ĠG', '#', 'dim', 'ĠBm', '/', 'C', 'ĠF', '#', 'm', 'ĠAb', '|---------------------------------|', 'Ġ(', 'Bridge', ')', 'ĠC', 'ĠG', 'ĠEm', '7', 'ĠAsus', '4']
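
The effect of the chord-specific vocabulary can be quantified from the two token listings above. The following is a minimal sketch that needs no model download: the token lists are copied from the comments in the example, and `count_reduction` is a hypothetical helper introduced here for illustration, not part of `transformers` or the melodyGPT repository.

```python
# Token lists copied verbatim from the comments in the example above.
gpt2_tokens = ['Int', 'ro', ':', 'ĠAd', 'im', 'ĠG', '7', '/', '13', 'ĠEm', 'ĠB', 'b',
               'Ġ(', '4', 'x', ')', 'ĠG', '#', 'dim', 'ĠB', 'm', '/', 'C', 'ĠF', '#',
               'm', 'ĠAb', '|', '--------------------------------', '-|', 'Ġ(',
               'Bridge', ')', 'ĠC', 'ĠG', 'ĠEm', '7', 'ĠAsus', '4']
chords_tokens = ['Intro', ':', 'ĠAdim', 'ĠG', '7', '/', '13', 'ĠEm', 'ĠBb', 'Ġ(', '4',
                 'x', ')', 'ĠG', '#', 'dim', 'ĠBm', '/', 'C', 'ĠF', '#', 'm', 'ĠAb',
                 '|---------------------------------|', 'Ġ(', 'Bridge', ')', 'ĠC',
                 'ĠG', 'ĠEm', '7', 'ĠAsus', '4']


def count_reduction(baseline, candidate):
    """Hypothetical helper: tokens saved by `candidate`, absolute and as a fraction."""
    saved = len(baseline) - len(candidate)
    return saved, saved / len(baseline)


saved, fraction = count_reduction(gpt2_tokens, chords_tokens)

# Tokens that exist only in the chord tokenizer's output, i.e. pieces that
# the stock GPT-2 tokenizer had to split into smaller units.
merged = [t for t in chords_tokens if t not in gpt2_tokens]

print(len(gpt2_tokens), len(chords_tokens))  # 39 33
print(saved, f"{fraction:.1%}")              # 6 15.4%
print(merged)
```

On this single example the chord tokenizer produces 6 fewer tokens, mostly by keeping chord names such as Adim, Bb and Bm and the tab separator as single tokens. One line of chords is not a benchmark, so the figure is only indicative.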