gngpostalsrvc
/

BERiT

Generated from Trainer

Model card Files Files and versions Metrics Training metrics Community

gngpostalsrvc commited on Jul 26, 2023

Commit

17dc236

·

1 Parent(s): b2b6010

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -16,7 +16,7 @@ It achieves the following results on the evaluation set:
 ## Model description
-BERiT is a masked-language model for Biblical Hebrew, a low-resource ancient language preserved primarily in the text of the Hebrew Bible. Building on the work of [Sennrich and Zhang (2019)](https://arxiv.org/abs/1905.11901) and [Wodiak (2021)](https://arxiv.org/abs/2110.01938) on low-resource machine translation, it employs a modified version of the encoder block from Wodiak’s Seq2Seq model. Accordingly, BERiT is much smaller than models designed for modern languages like English. It features a single attention block with four attention heads, smaller embedding and feedforward dimensions (256 and 1024), a smaller max input length (128), and an aggressive dropout rate (.5) at both the attention and feedforward layers.
 The BERiT tokenizer performs character level byte-pair encoding using a 2000 word base vocabulary, which has been enriched with common grammatical morphemes.

 ## Model description
+BERiT is a masked-language model for Biblical Hebrew, a low-resource ancient language preserved primarily in the text of the Hebrew Bible. Building on the work of [Sennrich and Zhang (2019)](https://arxiv.org/abs/1905.11901) and [Wdowiak (2021)](https://arxiv.org/abs/2110.01938) on low-resource machine translation, it employs a modified version of the encoder block from Wdowiak’s Seq2Seq model. Accordingly, BERiT is much smaller than models designed for modern languages like English. It features a single attention block with four attention heads, smaller embedding and feedforward dimensions (256 and 1024), a smaller max input length (128), and an aggressive dropout rate (.5) at both the attention and feedforward layers.
 The BERiT tokenizer performs character level byte-pair encoding using a 2000 word base vocabulary, which has been enriched with common grammatical morphemes.