--- language: dv --- # byt5-dv Pretrained from scratch on Dhivei (language of the Maldives) with ByT5, Google's new byte-level tokenizer strategy. Corpus: dv.wikipedia.org as of March 2020 (TFDS) Notebook: https://colab.research.google.com/drive/19Afq7CI6cOi1DaTpnQhBbEbnBzLSFHbH ## Demo ## Todos The Wikipedia corpus is too small for this language. In the future I would add OSCAR and Sofwath's Maldivian corpus, if I can rewrite the script to accept those as one TFDS dataset.