monsoon-nlp committed on
Commit
750d816
1 Parent(s): 6b1473a

relevant model links

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -7,6 +7,10 @@ language: bn
  This is a second attempt at a Bangla/Bengali language model trained with
  Google Research's [ELECTRA](https://github.com/google-research/electra).
 
+ **As of 2022 I recommend Google's MuRIL model trained on English, Bangla, and other major Indian languages, both in their script and latinized script**: https://huggingface.co/google/muril-base-cased and https://huggingface.co/google/muril-large-cased
+
+ **For causal language models, I would suggest https://huggingface.co/sberbank-ai/mGPT, though this is a large model**
+
  Tokenization and pre-training CoLab: https://colab.research.google.com/drive/1gpwHvXAnNQaqcu-YNx1kafEVxz07g2jL
 
  V1 - 120,000 steps; V2 - 190,000 steps