---
license: mit
---

# Bangla FastText Model

This is a pretrained FastText model for the Bengali language, built for the [bnlp](https://github.com/sagorbrur/bnlp) package.

## Datasets

- [Bengali Wikipedia dump](https://dumps.wikimedia.org/bnwiki/latest/)

## Training Details

FastText was trained with the following settings:

- Total words: 20M
- Vocabulary size: 1,171,011
- Epochs: 50
- Embedding dimension: 300

## Evaluation Details

- Training loss: 0.318668

## Usage

- `pip install -U bnlp_toolkit`
- `pip install fasttext==0.9.2`
- Generate a word vector using the pretrained model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
word = "গ্রাম"  # Bengali word to look up
model_path = "bengali_fasttext_wiki.bin"  # path to the downloaded pretrained model
word_vector = bft.generate_word_vector(model_path, word)
print(word_vector.shape)  # dimension of the embedding (300 for this model)
print(word_vector)
```

- Train a Bengali FastText model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
data = "raw_text.txt"  # plain-text training corpus
model_name = "saved_model.bin"  # output path for the trained binary model
epoch = 50  # number of training epochs
bft.train(data, model_name, epoch)
```

- Generate a vector file from a FastText binary model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
model_path = "mymodel.bin"  # trained FastText binary model
out_vector_name = "myvector.txt"  # output text file of word vectors
bft.bin2vec(model_path, out_vector_name)
```
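The text file written by `bin2vec` can be read back without loading the binary model. The sketch below assumes the file follows the common fastText/word2vec text layout: a header line `<vocab_size> <dimension>`, then one line per word containing the word followed by its space-separated vector components. The helper `load_vector_file` is hypothetical and not part of bnlp.

```python
from typing import Dict, List


def load_vector_file(path: str) -> Dict[str, List[float]]:
    """Read a word-vector text file into a dict mapping word -> vector.

    Assumed layout (common fastText/word2vec text format):
      line 1: "<vocab_size> <dimension>"
      rest:   "<word> <v1> <v2> ... <v_dimension>"
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        header = f.readline().split()
        dim = int(header[1])  # embedding dimension from the header
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < dim + 1:
                continue  # skip blank or malformed lines
            # The last `dim` fields are the vector; anything before is the word.
            word = " ".join(parts[:-dim])
            vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors
```

For example, `load_vector_file("myvector.txt")` would return a dictionary whose values are 300-element lists for a model trained with this card's settings. For a vocabulary of over a million words this holds everything in memory, so a streaming reader may be preferable for large files.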
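Word vectors such as those returned by `generate_word_vector` are commonly compared with cosine similarity. The sketch below is a minimal pure-Python version; `cosine_similarity` is a hypothetical helper, not part of bnlp. It accepts any equal-length sequences of numbers, including NumPy arrays.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity(bft.generate_word_vector(model_path, "গ্রাম"), bft.generate_word_vector(model_path, "শহর"))` would score how close two words are in the embedding space, with values nearer 1 indicating more similar usage.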