Not trained yet; will be trained on a span corruption task.
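
For readers unfamiliar with the objective, here is a minimal sketch of span corruption in the T5 style. It is an illustration only, not this model's actual preprocessing: the function name `corrupt_spans`, the `<extra_id_N>` sentinel naming, and the hand-picked spans are all assumptions. Real pipelines usually sample the spans at random.

```python
def corrupt_spans(tokens, spans):
    """Replace each (start, length) span with a sentinel token.

    `spans` must be sorted by start and non-overlapping.
    Returns (inputs, targets) as lists of tokens.
    Sentinel naming (<extra_id_N>) is an assumption borrowed from T5.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        # Keep the uncorrupted tokens before this span, then mark the gap.
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        # The target lists each sentinel followed by the tokens it hides.
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])
        cursor = start + length
    # Keep whatever follows the last span.
    inputs.extend(tokens[cursor:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return inputs, targets


tokens = "Thank you for inviting me to your party last week".split()
inputs, targets = corrupt_spans(tokens, [(2, 2), (8, 1)])
print(" ".join(inputs))
print(" ".join(targets))
```

The model sees `inputs` and is trained to produce `targets`, so it learns to reconstruct the hidden spans from the surrounding context.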
196,697,425 non-embedding parameters
48,703,872 embedding parameters
245,401,297 total parameters
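
As a quick sanity check on the figures above, the sketch below re-adds the two counts and computes the embedding share. The variable names are illustrative; only the numbers come from the note.

```python
# Counts copied from the note above.
non_embedding = 196_697_425
embedding = 48_703_872

# The reported total should equal the sum of the two parts.
total = non_embedding + embedding
embedding_share = embedding / total

print(f"total parameters: {total:,}")
print(f"embedding share:  {embedding_share:.1%}")
```

The sum matches the reported total of 245,401,297, with embeddings making up a little under a fifth of the parameters.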