
topic_docs5000

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic

# Download the trained model from the Hugging Face Hub.
topic_model = BERTopic.load("Kamaljp/topic_docs5000")

# Return a table with one row per topic.
topic_model.get_topic_info()
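
Once the model is loaded, a single topic can be inspected by its ID. The helper below is a minimal sketch, assuming `get_topic` returns a list of `(word, score)` pairs and a falsy value for an unknown topic ID, as it does in recent BERTopic releases. The name `top_keywords` is hypothetical and not part of BERTopic.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return the first n keywords of a topic, or [] if the ID is unknown.

    Assumes topic_model.get_topic(topic_id) yields (word, score) pairs,
    ordered from most to least representative.
    """
    words = topic_model.get_topic(topic_id)
    if not words:
        # Unknown topic IDs come back as a falsy value.
        return []
    return [word for word, _score in words[:n]]
```

For example, `top_keywords(topic_model, 19, n=3)` would return the three leading keywords of topic 19.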

Topic overview

  • Number of topics: 30
  • Number of training documents: 5000
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | the - to - of - and - is | 12 | -1_the_to_of_and
0 | the - in - to - he - game | 1606 | 0_the_in_to_he
1 | the - drive - to - with - for | 450 | 1_the_drive_to_with
2 | the - to - that - of - and | 344 | 2_the_to_that_of
3 | the - of - and - in - to | 246 | 3_the_of_and_in
4 | of - to - the - is - and | 220 | 4_of_to_the_is
5 | the - car - and - it - for | 203 | 5_the_car_and_it
6 | the - of - that - to - is | 186 | 6_the_of_that_to
7 | call - three - bittrolff - uhhhh - test | 172 | 7_call_three_bittrolff_uhhhh
8 | the - to - be - of - key | 172 | 8_the_to_be_of
9 | the - space - of - and - to | 169 | 9_the_space_of_and
10 | the - openwindows - to - window - and | 169 | 10_the_openwindows_to_window
11 | for - and - 100 - to - the | 146 | 11_for_and_100_to
12 | windows - dos - the - and - to | 132 | 12_windows_dos_the_and
13 | the - bike - to - my - was | 105 | 13_the_bike_to_my
14 | you - that - to - of - your | 100 | 14_you_that_to_of
15 | for - and - to - mail - send | 100 | 15_for_and_to_mail
16 | to - that - homosexual - of - is | 94 | 16_to_that_homosexual_of
17 | is - that - objective - of - science | 66 | 17_is_that_objective_of
18 | printer - fonts - deskjet - hp - the | 56 | 18_printer_fonts_deskjet_hp
19 | jpeg - image - gif - file - format | 45 | 19_jpeg_image_gif_file
20 | points - graeme - polygon - the - lines | 44 | 20_points_graeme_polygon_the
21 | radar - detector - detectors - is - the | 28 | 21_radar_detector_detectors_is
22 | hotel - dj - for - ticket - price | 27 | 22_hotel_dj_for_ticket
23 | insurance - health - private - the - and | 26 | 23_insurance_health_private_the
24 | water - battery - temperature - the - discharge | 21 | 24_water_battery_temperature_the
25 | oil - paint - it - wax - and | 17 | 25_oil_paint_it_wax
26 | drugs - cocaine - lsd - drug - license | 16 | 26_drugs_cocaine_lsd_drug
27 | motif - toolkit - cosecomplient - api - mean | 15 | 27_motif_toolkit_cosecomplient_api
28 | maxaxaxaxaxaxaxaxaxaxaxaxaxaxax - entry - entries - rules - we | 13 | 28_maxaxaxaxaxaxaxaxaxaxaxaxaxaxax_entry_entries_rules
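
The frequencies in the table can be checked against the headline figures: the 30 rows (topic -1, which BERTopic uses for outliers, plus topics 0 to 28) should account for all 5000 training documents. The sketch below copies the counts from the table and computes each topic's share; `TOPIC_FREQUENCY` and `topic_share` are illustrative names, not part of BERTopic.

```python
# Document counts per topic ID, copied from the overview table.
TOPIC_FREQUENCY = {
    -1: 12, 0: 1606, 1: 450, 2: 344, 3: 246, 4: 220, 5: 203, 6: 186,
    7: 172, 8: 172, 9: 169, 10: 169, 11: 146, 12: 132, 13: 105,
    14: 100, 15: 100, 16: 94, 17: 66, 18: 56, 19: 45, 20: 44,
    21: 28, 22: 27, 23: 26, 24: 21, 25: 17, 26: 16, 27: 15, 28: 13,
}

def topic_share(topic_id):
    """Fraction of all training documents assigned to the given topic."""
    total = sum(TOPIC_FREQUENCY.values())
    return TOPIC_FREQUENCY[topic_id] / total

print(len(TOPIC_FREQUENCY))           # 30 topics, including outlier topic -1
print(sum(TOPIC_FREQUENCY.values()))  # 5000 training documents
print(f"{topic_share(0):.1%}")        # 32.1% of documents fall in topic 0
print(f"{topic_share(-1):.1%}")       # 0.2% of documents are outliers
```

Topic 0 alone holds roughly a third of the corpus, and most of the larger topics are dominated by stop words such as "the" and "to".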

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 30
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
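
These settings map directly onto keyword arguments of the `BERTopic` constructor. The following is a minimal sketch of how a comparable model could be configured, not a reproduction of the original training run: the card does not state the embedding model, the UMAP and HDBSCAN settings, or the training corpus, so those are left at BERTopic's defaults here. `HYPERPARAMS` and `build_topic_model` are hypothetical names.

```python
# Hyperparameters as listed in this model card.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}

def build_topic_model():
    """Create an untrained BERTopic model with the listed hyperparameters.

    Assumption: every sub-model (embeddings, UMAP, HDBSCAN, vectorizer)
    uses BERTopic's defaults, since the card does not specify them.
    """
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_topic_model().fit_transform(docs)` on a list of strings `docs` would then train the model and return topic assignments together with topic probabilities.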

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.30.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.12
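
Loading a saved model is most reliable when the installed libraries match the versions used for training. The sketch below compares the local environment against the versions listed above using the standard library's `importlib.metadata`. The PyPI distribution names (for example `umap-learn` for UMAP) are filled in as assumptions, and `PINNED` and `version_mismatches` are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from this model card, keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.30.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}

def version_mismatches(pinned=PINNED):
    """Map each differing package to (expected, installed).

    The installed value is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

An empty result from `version_mismatches()` means the environment matches the listed library versions; the Python version itself can be checked separately with `sys.version_info`.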