21 Shakespeare: Combining two generations' most influential artists

My linguistic experiment: combining two generations' most influential artists to build 21 Shakespeare - the first bard with drip.

What?

nanoGPT serves as the base framework for training.

Data Processing

  1. Hugging Face 21 Savage Dataset (https://huggingface.co/datasets/huggingartists/21-savage)
  2. Hugging Face Tiny Shakespeare Dataset (https://huggingface.co/datasets/karpathy/tiny_shakespeare)
  3. Preprocessing -- cleaning up messy characters, stripping Shakespeare's structural markers (speaker names), removing empty lines, merging both artists into one collective dataset, and adding a metadata tag for each artist
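The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the sample strings, the speaker-name regex, and the `<|shakespeare|>` / `<|savage|>` metadata tags are all assumptions (the real data comes from the two Hugging Face datasets linked above).

```python
import re

# Hypothetical sample text standing in for the real datasets.
shakespeare_raw = """First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak."""

savage_raw = "I got options \u2014 straight up"

def clean(text):
    # Normalize curly quotes and long dashes ("messy characters") to ASCII.
    replacements = {"\u2018": "'", "\u2019": "'", "\u201c": '"',
                    "\u201d": '"', "\u2014": "-", "\u2013": "-"}
    for bad, good in replacements.items():
        text = text.replace(bad, good)
    return text

def strip_speakers(text):
    # Drop Shakespeare's speaker-name lines (e.g. "First Citizen:").
    return "\n".join(line for line in text.splitlines()
                     if not re.match(r"^[A-Z][A-Za-z ]*:$", line.strip()))

def build_dataset(shakespeare, savage):
    # Tag each artist's lines with a metadata marker, skip empty lines,
    # and merge everything into one collective dataset.
    merged = []
    for tag, text in (("<|shakespeare|>", strip_speakers(shakespeare)),
                      ("<|savage|>", savage)):
        for line in clean(text).splitlines():
            if line.strip():
                merged.append(f"{tag} {line.strip()}")
    return merged

dataset = build_dataset(shakespeare_raw, savage_raw)
```

The tag-per-line scheme is one simple way to let the model condition on which artist a line belongs to.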

Next Steps

  • Fine-tune GPT-2 instead of training nanoGPT from scratch for better results. Compute is expensive, though.
  • Re-generate the dataset to get heavier use of Shakespearean language; the current output leans toward the 21st century.
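One way to push the mix toward Shakespeare without new data would be to oversample the Shakespeare-tagged lines when rebuilding the corpus. A hedged sketch, assuming lines carry a `<|shakespeare|>` prefix tag and using a made-up duplication `factor`:

```python
import random

def rebalance(lines, tag="<|shakespeare|>", factor=3, seed=0):
    # Duplicate Shakespeare-tagged lines `factor` times so the merged
    # corpus leans more Elizabethan, then shuffle deterministically.
    out = []
    for line in lines:
        copies = factor if line.startswith(tag) else 1
        out.extend([line] * copies)
    random.Random(seed).shuffle(out)
    return out
```

Simple duplication is crude (regenerating or filtering the data would be cleaner), but it is a cheap first experiment before paying for more compute.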
