--- license: mit datasets: - Salesforce/wikitext language: - en --- This is a custom implementation of gpt2, where we replace attention with our implementation. Currently, we don't replace softmax, but in future submits we would like to replace the softmax function in attention with other softmax variations.