urdu_topic_modeling

This is a BERTopic model trained on 1,008 Urdu-language documents. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("shaistaDev7/urdu_topic_modeling")

topic_model.get_topic_info()
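
Beyond listing topics, a loaded model can assign topics to unseen documents. The following is a minimal sketch, assuming the saved model is able to embed new text; if the repository does not store a pointer to its embedding model, an `embedding_model` argument would likely need to be passed to `BERTopic.load`. The helper names `load_model` and `assign_topics` are illustrative and not part of BERTopic.

```python
from typing import List, Sequence, Tuple


def load_model(repo_id: str = "shaistaDev7/urdu_topic_modeling"):
    """Load the model from the Hugging Face Hub (needs bertopic and network access)."""
    # Imported lazily so assign_topics can be reused with any compatible model object.
    from bertopic import BERTopic

    return BERTopic.load(repo_id)


def assign_topics(topic_model, docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by the model."""
    # transform returns the predicted topic IDs and, optionally, probabilities.
    topics, _probabilities = topic_model.transform(list(docs))
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]
```

Calling `assign_topics(load_model(), docs)` with a list of Urdu strings returns (document, topic ID) pairs. The IDs correspond to the topic IDs listed under "Topic overview"; BERTopic uses -1 for documents it treats as outliers.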

Topic overview

  • Number of topics: 5
  • Number of training documents: 1008
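| Topic ID | Topic Keywords | English gloss | Topic Frequency | Label |
|----------|----------------|---------------|-----------------|-------|
| 0 | کینسر - استعمال - جسم - علاج - افراد | cancer, use, body, treatment, people | 315 | 0_کینسر_استعمال_جسم_علاج |
| 1 | ٹیم - کرکٹ - محمد - میڈل - انگلینڈ | team, cricket, Muhammad, medal, England | 240 | 1_ٹیم_کرکٹ_محمد_میڈل |
| 2 | روپے - ارب - فیصد - ٹیکس - حکومت | rupees, billion, percent, tax, government | 238 | 2_روپے_ارب_فیصد_ٹیکس |
| 3 | فلم - خان - ووڈ - بالی - اداکارہ | film, Khan, "wood", "Bolly" (the two halves of "Bollywood"), actress | 205 | 3_فلم_خان_ووڈ_بالی |
| 4 | ظفر - میشا - شفیع - علی - جنسی | Zafar, Meesha, Shafi, Ali, sexual | 10 | 4_ظفر_میشا_شفیع_علی |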
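
The topic frequencies above sum to 1008, the number of training documents, so every document is accounted for by one of the five listed topics. As a small worked example, the sketch below converts the counts into shares of the corpus; the names `topic_counts` and `topic_shares` are illustrative only.

```python
# Topic frequencies copied from the table above (topic ID -> document count).
topic_counts = {0: 315, 1: 240, 2: 238, 3: 205, 4: 10}


def topic_shares(counts):
    """Return each topic's fraction of the total document count."""
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


shares = topic_shares(topic_counts)
for topic, share in shares.items():
    print(f"Topic {topic}: {share:.2%}")
```

Topic 4 covers roughly 1% of the documents, and its size of 10 is exactly the `min_topic_size` the model was trained with, so it sits at the smallest cluster size the model allows.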

Training hyperparameters

  • calculate_probabilities: True
  • language: urdu
  • low_memory: True
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
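
The hyperparameters above map onto keyword arguments of the `BERTopic` constructor. The following is a sketch of how a model with the same settings might be configured, assuming a BERTopic release that supports the zero-shot arguments. This card does not list the embedding model or the UMAP and HDBSCAN settings, so the sketch is not guaranteed to reproduce the same topics. `HYPERPARAMETERS` and `build_model` are illustrative names.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "urdu",
    "low_memory": True,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model(**overrides):
    """Create an unfitted BERTopic model with the listed settings."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**{**HYPERPARAMETERS, **overrides})
```

A model built this way would still need to be fitted, for example with `topics, probs = build_model().fit_transform(docs)`, where `docs` is a list of Urdu documents. The original training corpus is not included in this card.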

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.5
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.35.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
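
Loading a saved model can be sensitive to library versions. As a hedged sketch, the snippet below compares the locally installed packages with the versions listed above using the standard library's `importlib.metadata`. The PyPI distribution names (for example `umap-learn` for UMAP) are the usual ones for these libraries but are not stated in this card, and `version_mismatches` is an illustrative helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the "Framework versions" list, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.35.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def version_mismatches(expected=CARD_VERSIONS):
    """Return {package: (expected, installed)} for every package that differs.

    The installed value is None when the package is not installed.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

A mismatch does not necessarily mean the model will fail to load; the result is only a starting point when debugging environment differences.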