---
license: mit
---

# Bangla FastText Model (16 million tokens)

I am Uzzal Mondal (LinkedIn), and I have built a FastText model specifically for Bangla NLP tasks. This repository contains a FastText model trained on Bangla Wikipedia data. The model can be used for a range of NLP tasks, including word similarity, word embeddings, and semantic analysis of Bangla text. See the accompanying Python script for practical examples.

## Model Details
- **Tokens Processed**: The model has read and processed 16 million tokens from the training corpus.
- **Vocabulary Size**: 120,332 unique words
- **Training Loss**: Average loss during training was 0.552678
- **Embedding Dimension**: 100
- **Training Configuration**: Epochs = 10
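
If you want to verify these figures yourself, the fastText Python API exposes the dimension and vocabulary directly. Below is a minimal sketch (not part of the original script), assuming the `.bin` file has already been downloaded locally; the download step itself is shown in the next section.

```python
import fasttext

# Path to the locally saved model file; adjust as needed
# (the download step is covered in the next section).
model_path = "fasttext_bn_wiki_100.bin"
model = fasttext.load_model(model_path)

# Embedding dimension reported by the model (expected: 100)
print("Dimension:", model.get_dimension())

# Vocabulary size (expected: 120,332 unique words)
print("Vocabulary size:", len(model.get_words()))
```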
## How to Use the Model

### Load the Model in Your Code

Use the following code to download and load the model:

```python
from huggingface_hub import hf_hub_download
import fasttext

# Download the model from the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="uzzalmondal/fasttext_wiki_bn_100d_16m",
    filename="fasttext_bn_wiki_100.bin",
)

# Load the model
model = fasttext.load_model(model_path)

# Find the 10 nearest neighbors of 'রাজা' ("king");
# get_nearest_neighbors returns (similarity, word) tuples
neighbors = model.get_nearest_neighbors('রাজা', k=10)
print("Nearest neighbors of 'রাজা' (king):")
for similarity, word in neighbors:
    print(f"{similarity}: {word}")
```

## Output

```
Nearest neighbors of 'রাজা' (king):
0.8683507442474365: রাজার
0.8025670051574707: রাজায়
0.7848780751228333: রাজপুত্র
0.7837258577346802: রাজারাও
0.7768903374671936: সামন্তরাজা
0.7766559720039368: রাজসিংহাসনে
0.7681295275688171: রাজত্বের
0.7603954672813416: রাজপুত্রদের
0.7589437365531921: রাজাদের
0.7575206756591797: রাজপুত্রের
```
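The model card above also mentions word similarity and raw embeddings. Below is a minimal sketch (not part of the original script) showing how to pull 100-dimensional vectors out of the loaded model and compare two words with cosine similarity; the word pair and the cosine computation are illustrative assumptions, not an official example.

```python
import numpy as np
import fasttext
from huggingface_hub import hf_hub_download

# Download and load the model as in the snippet above
model_path = hf_hub_download(
    repo_id="uzzalmondal/fasttext_wiki_bn_100d_16m",
    filename="fasttext_bn_wiki_100.bin",
)
model = fasttext.load_model(model_path)

# Raw 100-dimensional embeddings (illustrative word pair: "king" / "queen")
vec_king = model.get_word_vector("রাজা")
vec_queen = model.get_word_vector("রানী")

# Cosine similarity between the two embeddings
similarity = np.dot(vec_king, vec_queen) / (
    np.linalg.norm(vec_king) * np.linalg.norm(vec_queen)
)
print(f"cosine(রাজা, রানী) = {similarity:.4f}")
```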
## 🔧 How Can You Contribute?

- Suggest improvements or new features
- Report any issues or bugs
- Contribute to the codebase or documentation
- Share your use cases or experiments
💬 Your feedback helps us:
- Make Bangla NLP tools more accessible
- Improve model performance
- Extend the model’s capabilities to more applications