Not trained yet; will be trained on a span corruption task.
114,656,337 non-embedding parameters
40,567,296 embedding parameters
155,223,633 total parameters
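
As a quick sanity check on the counts above, the total should equal the sum of the non-embedding and embedding parameters. The sketch below only redoes that arithmetic; the variable and function names are illustrative assumptions, not identifiers from the model's code.

```python
# Parameter counts copied from the figures listed above.
non_embedding_params = 114_656_337
embedding_params = 40_567_296
reported_total = 155_223_633


def total_params(non_embedding: int, embedding: int) -> int:
    """Return the combined parameter count (hypothetical helper)."""
    return non_embedding + embedding


total = total_params(non_embedding_params, embedding_params)

# Fraction of all parameters that sit in the embedding table.
embedding_share = embedding_params / total

print(f"{total:,} total parameters")
print(f"matches reported total: {total == reported_total}")
print(f"embedding share: {embedding_share:.1%}")
```

The embedding table accounts for roughly a quarter of the total, which is common for models of this size with a large vocabulary.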