lgessler committed on
Commit d86fda3
1 Parent(s): 05b6aae

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -1,7 +1,14 @@
 ---
 language: cop
-widget:
-- text: "ⲁⲩⲱ ⲉⲓⲥ ⲡⲉⲧⲙⲙⲁⲩ ⲁϥⲉⲓ ⲉϥⲣⲓⲙⲉ."
+widget:
+- text: ⲁⲗⲗⲁ ⲁⲛⲟⲕ ⲁⲓⲥⲉⲧⲡⲧⲏⲩⲧⲛ ·
 ---
 
-A small `BertModel` for Coptic.
+This is a [MicroBERT](https://github.com/lgessler/microbert) model for Coptic.
+
+* Its suffix is **-mx**, which means that it was pretrained using supervision from masked language modeling and XPOS tagging.
+* The unlabeled Coptic data was taken from version 4.2.0 of the [Coptic SCRIPTORIUM corpus](https://github.com/copticscriptorium/corpora), totaling 970,642 tokens.
+* The UD treebank [UD_Coptic_Scriptorium](https://github.com/UniversalDependencies/UD_Coptic-Scriptorium), v2.9, totaling 48,632 tokens, was used for labeled data.
+
+Please see [the repository](https://github.com/lgessler/microbert) and
+[the paper](https://github.com/lgessler/microbert/raw/master/MicroBERT__MRL_2022_.pdf) for more details.
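The new README introduces a naming convention where the letters of a model's suffix name its pretraining objectives. A small lookup makes that convention concrete. This is a hypothetical helper, not part of the MicroBERT repository. It maps only the two letters this README defines, `m` and `x`, and it rejects any other letter rather than guessing its meaning.

```python
# Hypothetical helper (not from the MicroBERT repo): expand a MicroBERT-style
# suffix into the pretraining objectives it names. Only the letters defined in
# this README are mapped; any other letter raises an error instead of guessing.
SUFFIX_OBJECTIVES = {
    "m": "masked language modeling",
    "x": "XPOS tagging",
}


def decode_suffix(suffix: str) -> list[str]:
    """Return the objectives encoded by a suffix such as '-mx'."""
    letters = suffix.lstrip("-")  # tolerate a leading hyphen, as in "-mx"
    objectives = []
    for letter in letters:
        if letter not in SUFFIX_OBJECTIVES:
            raise ValueError(f"unknown suffix letter: {letter!r}")
        objectives.append(SUFFIX_OBJECTIVES[letter])
    return objectives


print(decode_suffix("-mx"))
# ['masked language modeling', 'XPOS tagging']
```

The helper fails loudly on unknown letters. This is deliberate: the README only documents `-mx`, so silently accepting other suffixes would imply meanings the source never states.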