Bangla Tokenizer Extending the Llama2 Tokenizer
Details
- Base: the Llama2 tokenizer, with a vocabulary size of 32000
- New Bangla tokens added: 48667
- Vocabulary size of this tokenizer: 80665
- The Bangla tokens come from https://huggingface.co/hishab/bn_sentencepiece_vs_50k_58GB
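
The card does not say how the vocabularies were merged. As a rough illustration only, the sketch below shows one common way to extend a vocabulary: append each new token that the base vocabulary does not already contain, leaving the original token IDs unchanged. The function name `extend_vocab` and the toy token lists are hypothetical and are not taken from the actual tokenizer files.

```python
def extend_vocab(base_tokens, new_tokens):
    """Append tokens from new_tokens that are missing from base_tokens.

    Base tokens keep their positions (IDs); new tokens are added at the end.
    Returns the merged list and the number of tokens actually added.
    """
    seen = set(base_tokens)
    merged = list(base_tokens)
    added = 0
    for tok in new_tokens:
        if tok not in seen:  # skip tokens the base vocabulary already has
            seen.add(tok)
            merged.append(tok)
            added += 1
    return merged, added


# Toy stand-ins for the real vocabularies (hypothetical data).
base = ["<unk>", "<s>", "</s>", "▁the", "▁a"]
bangla = ["▁আমি", "▁বাংলা", "▁a", "▁ভাষা"]  # "▁a" overlaps with the base

merged, added = extend_vocab(base, bangla)
print(f"base={len(base)} added={added} merged={len(merged)}")
```

With a merge of this kind, overlapping tokens are skipped, so the number of tokens added can be smaller than the size of the source vocabulary.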