Description
Biswabangla-335M-io is a 335-million-parameter, open-source, instruction-tuned generative pretrained language model for Bangla/Bengali.
Biswabangla is a monolingual Bangla/Bengali generative language model. Its tokenizer also works for the Assamese language.
The model was pretrained from scratch with a context size of 4096 tokens, and then instruction-tuned on 1 million Bangla/Bengali instructions in the form of (input, output) pairs covering various Bengali NLP tasks.
Use of this model for commercial purposes is strictly prohibited.
If you use our model, please cite our paper (Niyogi and Bhattacharya, 2024).
Note that the architecture of Biswabangla differs from that of the language models described in Niyogi and Bhattacharya, 2024.
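A minimal inference sketch with Hugging Face Transformers is shown below. Both the instruction template in `build_prompt` and the repository id in the example call are assumptions for illustration only; neither is documented in this card, so substitute the model's actual template and repository id.

```python
# Hedged usage sketch for a causal, instruction-tuned Bangla LM.
# ASSUMPTIONS: the prompt template and the repository id are hypothetical,
# not taken from this model card.

def build_prompt(instruction: str) -> str:
    """Wrap a Bangla instruction in a hypothetical (input, output) template."""
    # The exact template used during instruction tuning is not published here,
    # so this format is only illustrative.
    return f"নির্দেশ: {instruction}\nউত্তর: "

def generate(repo_id: str, instruction: str, max_new_tokens: int = 128) -> str:
    """Generate a response from the model at repo_id (a standard HF causal-LM loop)."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example call (downloads the weights; replace the id with the real repository):
# print(generate("gyan-ai/Biswabangla-335M-io", "বাংলায় একটি ছোট গল্প লেখো।"))
```

Because the model is decoder-only and autoregressive, generation is plain left-to-right sampling from the instruction prompt; no task-specific head is needed.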
Model Architecture
Decoder-only autoregressive Transformer.
Limitations
The model was trained on data crawled from the internet that contains toxic language, unsafe content, and societal biases. The model may therefore amplify those biases and return toxic responses, especially when given toxic prompts. It may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and it may produce socially unacceptable or undesirable output even when the prompt contains nothing explicitly offensive.
Gyan AI Research owns the output generated from the model.
Citations
@misc{niyogi2024paramanufamilynovelefficient,
  title={Paramanu: A Family of Novel Efficient Generative Foundation Language Models for Indian Languages},
  author={Mitodru Niyogi and Arnab Bhattacharya},
  year={2024},
  eprint={2401.18034},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2401.18034},
}