Update README.md
README.md
CHANGED
@@ -7,7 +7,6 @@ tags:
 - Bangla Base Bert
 - Bangla Bert language model
 - Bangla Bert
-license: MIT
 datasets:
 - BanglaLM dataset
 ---
@@ -16,10 +15,11 @@ Here we published a pretrained Bangla bert language model as **bert-base-bangla*
 Here we described [bert-base-bangla](https://github.com/Kowsher/bert-base-bangla) which is a pretrained Bangla language model based on mask language modeling described in [BERT](https://arxiv.org/abs/1810.04805) and the GitHub [repository](https://github.com/google-research/bert)
 ## Corpus Details
 We trained the Bangla bert language model using BanglaLM dataset from kaggle [BanglaLM](https://www.kaggle.com/gakowsher/bangla-language-model-dataset). There is 3 version of dataset which is almost 40GB.
-After downloading the dataset, we went on the way
-
+After downloading the dataset, we went on the way to mask LM.
+
 
 **Bangla Base BERT Tokenizer**
+
 ```py
 from transformers import AutoTokenizer, AutoModel
 bnbert_tokenizer = AutoTokenizer.from_pretrained("Kowsher/bert-base-test")