TODO: gpt-code uses the weights and tokenizer of https://huggingface.co/Sentdex/GPyT as the starting point for pretraining.
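
As a rough illustration of that starting point, the sketch below loads the tokenizer and weights from the checkpoint named above. It assumes the Hugging Face `transformers` library (with a PyTorch backend) is installed and that the checkpoint loads through the generic `Auto*` classes. `BASE_CHECKPOINT` and `load_starting_point` are hypothetical names chosen for this example, not part of gpt-code.

```python
# Minimal sketch, not gpt-code's actual setup code.
# Assumption: the Sentdex/GPyT checkpoint is loadable via the generic Auto* classes.

BASE_CHECKPOINT = "Sentdex/GPyT"  # repo id taken from the URL above


def load_starting_point(checkpoint: str = BASE_CHECKPOINT):
    """Return (tokenizer, model) initialised from the given checkpoint."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    return tokenizer, model
```

Calling `tokenizer, model = load_starting_point()` downloads the checkpoint on first use. The returned pair could then be handed to whatever training loop the project uses, so that pretraining continues from these weights rather than from a random initialisation.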