monsoon-nlp committed on
Commit
a9fee4d
1 Parent(s): 0803e25
Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -7,6 +7,8 @@ language: dv
  Pretrained from scratch on Dhivehi (language of the Maldives)
  with ByT5, Google's new byte-level tokenizer strategy.
 
+ **Use byt5-dv for now; this is less accurate**
+
  Corpus: Sofwath's Dhivehi corpus https://github.com/Sofwath/DhivehiDatasets
 
  Pretraining Notebook:
@@ -17,3 +19,7 @@ https://colab.research.google.com/drive/1ERIZ1PyHn-yN_jo7dTQeODn22vrt-d1d?usp=sh
  On Dhivehi news classification task
 
  https://colab.research.google.com/drive/11u5SafR4bKICmArgDl6KQ9vqfYtDpyWp?usp=sharing
+
+ ## Issues
+
+ There was an issue with the vocabulary size, final layer, and/or accuracy on fine-tuning.
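
The README describes ByT5 as a byte-level tokenizer strategy and mentions a vocabulary-size issue. As background only, here is a minimal sketch of what byte-level encoding looks like. It is not the model's actual tokenizer class and not the author's code. It assumes the ByT5 convention that ids 0, 1, and 2 are reserved for padding, end-of-sequence, and unknown, so each UTF-8 byte `b` maps to id `b + 3`. The names `encode`, `decode`, and `NUM_SPECIAL` are hypothetical, chosen for illustration.

```python
# Illustrative sketch of ByT5-style byte-level encoding.
# Assumption: ids 0 (pad), 1 (eos), 2 (unk) are reserved, so byte b -> id b + 3.
NUM_SPECIAL = 3
EOS_ID = 1


def encode(text: str) -> list[int]:
    """Return one id per UTF-8 byte of the text, followed by the EOS id."""
    return [b + NUM_SPECIAL for b in text.encode("utf-8")] + [EOS_ID]


def decode(ids: list[int]) -> str:
    """Invert encode(), skipping special ids and anything outside the byte range."""
    data = bytes(i - NUM_SPECIAL for i in ids if NUM_SPECIAL <= i < 256 + NUM_SPECIAL)
    return data.decode("utf-8", errors="ignore")


# "Dhivehi" written in Thaana script: six code points, two UTF-8 bytes each.
sample = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"
ids = encode(sample)
print(len(sample), "code points ->", len(ids), "ids (including EOS)")
print(decode(ids) == sample)
```

Under this scheme the base vocabulary is fixed at 256 byte values plus the reserved ids, regardless of the training corpus, and Thaana text costs two ids per character. Whether this relates to the vocabulary-size issue noted above is not stated in the README.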