Tahmid committed on
Commit: 31a5652
Parent: a91f0f3

Added note on normalizer.

Files changed (1)
README.md +3 -1
README.md CHANGED
@@ -9,7 +9,9 @@ licenses:
 
 This repository contains the pretrained discriminator checkpoint of the model **BanglaBERT**. This is an [ELECTRA](https://openreview.net/pdf?id=r1xMH1BtvB) discriminator model pretrained with the Replaced Token Detection (RTD) objective. Finetuned models using this checkpoint achieve state-of-the-art results on many of the NLP tasks in Bengali.
 
- For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, and `Natural Language Inference`, refer to the scripts in the official [repository](https://github.com/csebuetnlp/banglabert).
+ For finetuning on different downstream tasks such as `Sentiment classification`, `Named Entity Recognition`, and `Natural Language Inference`, refer to the scripts in the official GitHub [repository](https://github.com/csebuetnlp/banglabert).
+
+ **Note**: This model was pretrained using a specific normalization pipeline available [here](https://github.com/csebuetnlp/normalizer). All finetuning scripts in the official GitHub repository use this normalization by default. If you need to adapt the pretrained model for a different task, make sure the text units are normalized using this pipeline before tokenizing to get the best results. A basic example is given below:
 
 ## Using this model as a discriminator in `transformers` (tested on 4.11.0.dev0)
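
For reference, a minimal sketch of the normalization step described in the added note, assuming the `normalizer` package from the linked repository is installed and that the checkpoint name `csebuetnlp/banglabert` (an assumption here) refers to this model:

```python
# Minimal sketch (not part of the commit): normalize Bengali text with the
# pipeline from https://github.com/csebuetnlp/normalizer before tokenizing.
# Assumptions: `pip install git+https://github.com/csebuetnlp/normalizer` and
# that the checkpoint name "csebuetnlp/banglabert" refers to this repository.
from normalizer import normalize          # normalization pipeline used during pretraining
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")

raw_text = "এই মডেলটি বাংলা ভাষার জন্য।"   # example Bengali sentence
normalized_text = normalize(raw_text)      # apply the same normalization used at pretraining time
inputs = tokenizer(normalized_text, return_tensors="pt")
print(inputs["input_ids"])
```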