GPT model pre-training step on Shakespeare dataset
Train and perform inference on MNIST dataset
Text Tokenization using Byte-Pair Encoding (BPE)