---
license: mit
---

This is an encoder-only Transformer model with 43 million parameters. It was trained on around 4 million tokens.
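
Below is a minimal usage sketch, assuming the model is published on the Hugging Face Hub and loadable with the `transformers` library; the repo id is a placeholder, not the actual model path.

```python
from transformers import AutoTokenizer, AutoModel

# Placeholder repo id -- replace with the actual path of this model on the Hub.
repo_id = "your-username/your-encoder-model"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

# Sanity-check the stated parameter count (~43M).
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")

# Encode a sample sentence and inspect the encoder output shape.
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```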