Tahmid committed on
Commit 9ce791f
1 Parent(s): 7bed1e3

Update README.md

Files changed (1)
  1. README.md +12 -10
README.md CHANGED
@@ -73,21 +73,23 @@ The benchmarking datasets are as follows:
  If you use this model, please cite the following paper:
  ```
  @inproceedings{bhattacharjee-etal-2022-banglabert,
- title = {BanglaBERT: Lagnuage Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla},
+ title = "{B}angla{BERT}: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in {B}angla",
  author = "Bhattacharjee, Abhik and
  Hasan, Tahmid and
- Mubasshir, Kazi and
- Islam, Md. Saiful and
- Uddin, Wasi Ahmad and
+ Ahmad, Wasi and
+ Mubasshir, Kazi Samin and
+ Islam, Md Saiful and
  Iqbal, Anindya and
  Rahman, M. Sohel and
  Shahriyar, Rifat",
- booktitle = "Findings of the North American Chapter of the Association for Computational Linguistics: NAACL 2022",
- month = july,
- year = {2022},
- url = {https://arxiv.org/abs/2101.00204},
- eprinttype = {arXiv},
- eprint = {2101.00204}
+ booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
+ month = jul,
+ year = "2022",
+ address = "Seattle, United States",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2022.findings-naacl.98",
+ pages = "1318--1327",
+ abstract = "In this work, we introduce BanglaBERT, a BERT-based Natural Language Understanding (NLU) model pretrained in Bangla, a widely spoken yet low-resource language in the NLP literature. To pretrain BanglaBERT, we collect 27.5 GB of Bangla pretraining data (dubbed {`}Bangla2B+{'}) by crawling 110 popular Bangla sites. We introduce two downstream task datasets on natural language inference and question answering and benchmark on four diverse NLU tasks covering text classification, sequence labeling, and span prediction. In the process, we bring them under the first-ever Bangla Language Understanding Benchmark (BLUB). BanglaBERT achieves state-of-the-art results outperforming multilingual and monolingual models. We are making the models, datasets, and a leaderboard publicly available at \url{https://github.com/csebuetnlp/banglabert} to advance Bangla NLP.",
  }
  ```