This project pretrains a [`roberta-base`](https://huggingface.co/roberta-base) on the *Alemannic* (`als`) data subset of the [OSCAR](https://oscar-corpus.com/) corpus in JAX/Flax. We will be using the masked-language modeling loss for pretraining.