Update README.md
README.md
CHANGED
@@ -1,8 +1,16 @@
-
SinBerto is a small language model trained on a small news corpus in Sinhala, which is a low-resource language.
-Model Specifications.
vocab_size=52_000,
max_position_embeddings=514,
@@ -11,10 +19,16 @@ num_hidden_layers=6,
type_vocab_size=1
-How to from the Transformers Library
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("Kalindu/SinBerto")
-model = AutoModelForMaskedLM.from_pretrained("Kalindu/SinBerto")
+---
+language: si
+tags:
+- SinhalaBERTo
+- Sinhala
+- roberta
+---
+
+### Overview
SinBerto is a small language model trained on a small news corpus in Sinhala, which is a low-resource language.
+### Model Specifications.
vocab_size=52_000,
max_position_embeddings=514,
@@ -11,10 +19,16 @@ num_hidden_layers=6,
type_vocab_size=1
+### How to use from the Transformers Library
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("Kalindu/SinBerto")
+model = AutoModelForMaskedLM.from_pretrained("Kalindu/SinBerto")
+
+### OR Clone the model repo
+
+git lfs install
+git clone https://huggingface.co/Kalindu/SinBerto
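
For reviewers who want to sanity-check the values in this README, here is a minimal sketch of how the listed specifications might map onto a `RobertaConfig`, together with a small helper for mask filling. It assumes the model is RoBERTa-style, as the `roberta` tag suggests. Fields not visible in this diff stay at library defaults and may not match the published checkpoint. The helper name `fill_mask` is hypothetical and not part of the repository.

```python
from transformers import RobertaConfig, pipeline

# Values copied from the README's specification list in this diff.
# Any field not shown in the diff keeps the RobertaConfig default,
# which is an assumption and may differ from the real checkpoint.
config = RobertaConfig(
    vocab_size=52_000,
    max_position_embeddings=514,
    num_hidden_layers=6,
    type_vocab_size=1,
)


def fill_mask(text, top_k=5):
    """Return the top_k candidate fills for the single mask token in `text`.

    Hypothetical helper: it downloads Kalindu/SinBerto from the
    Hugging Face Hub on each call, so it needs network access.
    """
    unmasker = pipeline("fill-mask", model="Kalindu/SinBerto")
    # `text` must contain the tokenizer's mask token
    # (available as unmasker.tokenizer.mask_token) exactly once.
    return unmasker(text, top_k=top_k)
```

Calling `fill_mask` with a Sinhala sentence that contains the mask token once should return a list of dictionaries, each with `token_str`, `score`, and `sequence` keys.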