julien-c HF staff committed on
Commit
d0b8f7d
1 Parent(s): 5980521

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/keshan/SinhalaBERTo/README.md

Files changed (1)
  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ language: si
+ tags:
+ - SinhalaBERTo
+ - Sinhala
+ - roberta
+ datasets:
+ - oscar
+ ---
+ ## Overview
+
+ This is a slightly smaller model trained on the deduplicated Sinhala portion of the [OSCAR](https://oscar-corpus.com/) dataset. Sinhala is a low-resource language, and only a handful of models have been trained for it, so this model is a good starting point for fine-tuning on downstream tasks.
+
+ ## Model Specification
+
+ The model chosen for training is [RoBERTa](https://arxiv.org/abs/1907.11692) with the following specifications:
+ 1. vocab_size=52000
+ 2. max_position_embeddings=514
+ 3. num_attention_heads=12
+ 4. num_hidden_layers=6
+ 5. type_vocab_size=1
+
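The listed hyperparameters use the same names as the RoBERTa configuration fields in `transformers`. The following is a minimal, standard-library-only sketch that collects them and derives the usable sequence length. It assumes, as in the reference RoBERTa implementation, that two position slots are reserved because position ids start after the padding index. The card does not state this, and fields it does not list (such as the hidden size) are left out rather than guessed. The names `spec` and `ROBERTA_POSITION_OFFSET` are illustrative only.

```python
import json

# Hyperparameters exactly as listed in the model card.
spec = {
    "vocab_size": 52000,
    "max_position_embeddings": 514,
    "num_attention_heads": 12,
    "num_hidden_layers": 6,
    "type_vocab_size": 1,
}

# Assumption (not stated in the card): RoBERTa position ids start at
# padding_idx + 1, with padding_idx = 1, so two embedding slots are reserved.
ROBERTA_POSITION_OFFSET = 2
max_sequence_length = spec["max_position_embeddings"] - ROBERTA_POSITION_OFFSET

# Print the fields in the JSON layout used by a config.json file.
print(json.dumps(spec, indent=2))
print("usable sequence length:", max_sequence_length)
```

Under that assumption, inputs would need to be truncated to 512 tokens, including the special tokens added by the tokenizer.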
+ ## How to Use
+ You can use this model directly with a pipeline for masked language modeling:
+
+ ```py
+ from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
+
+ # Load the checkpoint and its tokenizer from the Hugging Face Hub.
+ # The Auto* classes select the architecture from the repository's config.
+ model = AutoModelForMaskedLM.from_pretrained("keshan/SinhalaBERTo")
+ tokenizer = AutoTokenizer.from_pretrained("keshan/SinhalaBERTo")
+
+ fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
+
+ # Returns candidate replacements for the <mask> token, each with a score.
+ fill_mask("මම ගෙදර <mask>.")
+ ```
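For a single input with one mask, the `fill-mask` pipeline returns a list of dictionaries, each with a `score` and a `token_str` among other keys. The sketch below shows one way to reduce that output to the top candidates. The helper name `top_tokens` is hypothetical, and the `sample` list contains placeholder values that only illustrate the shape; they are not real predictions from this model.

```python
from typing import Dict, List, Tuple


def top_tokens(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # token_str may carry a leading space from the byte-level BPE tokenizer.
    return [(p["token_str"].strip(), p["score"]) for p in ranked[:k]]


# Hypothetical placeholder predictions, not real outputs of keshan/SinhalaBERTo.
sample = [
    {"score": 0.10, "token_str": " candidate_b", "sequence": "..."},
    {"score": 0.60, "token_str": " candidate_a", "sequence": "..."},
    {"score": 0.05, "token_str": " candidate_c", "sequence": "..."},
]

print(top_tokens(sample, k=2))
```

With the pipeline from the snippet above, `top_tokens(fill_mask("මම ගෙදර <mask>."))` would list the model's highest-scoring candidates for the masked position.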