# text_generation_bangla_model

A Bangla text generation model trained on the BanglaCLM dataset, which comprises:

- OSCAR: 12.84 GB
- Wikipedia dump: 6.24 GB
- ProthomAlo: 3.92 GB
- Kalerkantho: 3.24 GB

## Model description

- Context size: 128

## Training and evaluation data

The BanglaCLM dataset is divided into a training set (90%) and a validation set (10%).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough sketch of this setup is given under "Example code" below):

- Batch size: 32
- Initial learning rate: 5e-5
- Number of warmup steps: 10000
- Weight decay rate: 0.01
- Tokenization algorithm: BPE
- Vocabulary size of tokenizer: 50256
- Total trainable params: 124,439,808
- Epochs: 40
- Number of training steps: 40772228
- Training precision: float32

### Training results

- Perplexity: 2.86

### Framework versions

- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.10.0
- Tokenizers 0.13.2

### Citation

If you find this model helpful, please cite:

```
@INPROCEEDINGS{10303383,
  author={Salim, Md. Shahidul and Murad, Hasan and Das, Dola and Ahmed, Faisal},
  booktitle={2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD)},
  title={BanglaGPT: A Generative Pretrained Transformer-Based Model for Bangla Language},
  year={2023},
  volume={},
  number={},
  pages={56-59},
  doi={10.1109/ICICT4SD59951.2023.10303383}}
```
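## Example code

The snippets below are minimal sketches based on the details in this card, not the released training scripts; file paths, directory names, and any parameter not listed above are assumptions.

### Tokenizer training (sketch)

A byte-level BPE tokenizer with the vocabulary size listed above could be trained with the Tokenizers library roughly as follows. The corpus file names and `min_frequency` are placeholders, not values from this card.

```python
# Minimal sketch: train a byte-level BPE tokenizer with the vocabulary size
# listed in the hyperparameters. File names are placeholders, not the actual
# BanglaCLM corpus files.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["banglaclm_part1.txt", "banglaclm_part2.txt"],  # placeholder corpus shards
    vocab_size=50256,                                      # vocabulary size from this card
    min_frequency=2,                                       # assumption, not stated in the card
    special_tokens=["<|endoftext|>"],                      # GPT-2-style end-of-text token (assumption)
)
tokenizer.save_model("bangla_tokenizer")  # writes vocab.json and merges.txt
```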
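### Training setup (sketch)

The following is a rough, non-authoritative reconstruction of the training setup implied by the hyperparameters above (90/10 split, AdamW with linear warmup and weight decay, context size 128, TensorFlow). The corpus file, the `bangla_tokenizer` directory, and the GPT-2-small configuration are assumptions.

```python
# Rough sketch of the training setup implied by the hyperparameters above.
# Paths and the GPT-2-small configuration are assumptions, not details taken
# from the released training script.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2Config,
    GPT2TokenizerFast,
    TFAutoModelForCausalLM,
    create_optimizer,
)

tokenizer = GPT2TokenizerFast.from_pretrained("bangla_tokenizer")  # placeholder path (see tokenizer sketch)
tokenizer.pad_token = tokenizer.eos_token

config = GPT2Config(vocab_size=50256, n_positions=128)  # GPT-2-small shape, ~124M parameters (assumption)
model = TFAutoModelForCausalLM.from_config(config)

# 90% training / 10% validation split, as described above.
corpus = load_dataset("text", data_files={"train": "banglaclm.txt"})["train"]  # placeholder corpus file
splits = corpus.train_test_split(test_size=0.1, seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)  # context size 128

tokenized = splits.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")

train_set = model.prepare_tf_dataset(tokenized["train"], batch_size=32, shuffle=True, collate_fn=collator)
val_set = model.prepare_tf_dataset(tokenized["test"], batch_size=32, shuffle=False, collate_fn=collator)

# AdamW with linear warmup and weight decay, matching the listed hyperparameters.
optimizer, _ = create_optimizer(
    init_lr=5e-5,
    num_train_steps=40_772_228,
    num_warmup_steps=10_000,
    weight_decay_rate=0.01,
)
model.compile(optimizer=optimizer)  # TF Transformers models fall back to their internal LM loss
model.fit(train_set, validation_data=val_set, epochs=40)
```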
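### Text generation (sketch)

A minimal usage sketch, assuming the trained checkpoint is available under the placeholder name `text_generation_bangla_model` (locally or on the Hugging Face Hub); the sampling parameters are illustrative, not tuned values from the paper.

```python
# Minimal sketch: generate Bangla text with the trained model. The checkpoint
# name "text_generation_bangla_model" is a placeholder.
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("text_generation_bangla_model")
model = TFAutoModelForCausalLM.from_pretrained("text_generation_bangla_model")

inputs = tokenizer("বাংলাদেশ", return_tensors="tf")  # example Bangla prompt ("Bangladesh")
outputs = model.generate(
    **inputs,
    max_length=128,   # matches the context size listed above
    do_sample=True,   # sampling settings below are illustrative
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same checkpoint could also be wrapped with `pipeline("text-generation", model=..., framework="tf")` for a shorter call, if that fits your workflow better.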