bertopic_kmean-20topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

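from bertopic import BERTopic

# Download the trained model from the Hugging Face Hub.
topic_model = BERTopic.load("hts98/bertopic_kmean-20topics")

# Return a DataFrame with one row per topic.
topic_model.get_topic_info()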
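
Once loaded, the model can also assign its existing topics to new documents. The following is a minimal sketch rather than an official recipe: `transform` and `get_topic` are standard BERTopic methods, while the helper names `assign_topics`, `count_by_topic`, and `main`, as well as the sample reviews, are made up for illustration. It assumes the saved model can resolve its embedding model on load; if it cannot, one would need to be passed to `BERTopic.load` through its `embedding_model` argument.

```python
from collections import Counter


def assign_topics(topic_model, docs):
    """Return one topic ID per document using the model's transform()."""
    # transform() returns a (topics, probabilities) pair; the second
    # value is ignored in this sketch.
    topics, _ = topic_model.transform(docs)
    return [int(t) for t in topics]


def count_by_topic(topic_ids):
    """Count how many documents landed in each topic, most common first."""
    return Counter(topic_ids).most_common()


def main():
    # Imported here so the helpers above stay usable without BERTopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("hts98/bertopic_kmean-20topics")
    docs = [  # hypothetical sample reviews, for illustration only
        "Great pool and a short walk to the beach.",
        "The hostel dorm beds were clean and the staff were friendly.",
    ]
    topic_ids = assign_topics(topic_model, docs)
    for doc, topic_id in zip(docs, topic_ids):
        # get_topic() returns a list of (word, score) pairs for the topic.
        top_words = [word for word, _ in topic_model.get_topic(topic_id)[:5]]
        print(topic_id, top_words, doc)
    print(count_by_topic(topic_ids))
```

Calling `main()` runs the example against the Hub model and requires network access.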

Topic overview

  • Number of topics: 20
  • Number of training documents: 529579

| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| 0 | hanoi - quarter - old - bay - lake | 70447 | 0_hanoi_quarter_old_bay |
| 1 | vietnam - vietnamese - best - stayed - mekong | 61694 | 1_vietnam_vietnamese_best_stayed |
| 2 | location - hotel - good - old - breakfast | 50809 | 2_location_hotel_good_old |
| 3 | good - clean - location - helpful - friendly | 44027 | 3_good_clean_location_helpful |
| 4 | pool - beach - view - massage - spa | 43959 | 4_pool_beach_view_massage |
| 5 | room - told - said - asked - shower | 40332 | 5_room_told_said_asked |
| 6 | thank - service - staff - ms - helpful | 36010 | 6_thank_service_staff_ms |
| 7 | hoi - homestay - town - bikes - free | 28816 | 7_hoi_homestay_town_bikes |
| 8 | saigon - minh - chi - ho - city | 28655 | 8_saigon_minh_chi_ho |
| 9 | resort - villa - beach - villas - island | 20536 | 9_resort_villa_beach_villas |
| 10 | bikes - beach - town - bike - free | 19495 | 10_bikes_beach_town_bike |
| 11 | hostel - dorm - dalat - dorms - beds | 17662 | 11_hostel_dorm_dalat_dorms |
| 12 | bay - halong - ha - cruise - kiem | 12629 | 12_bay_halong_ha_cruise |
| 13 | nang - da - danang - naman - dragon | 12005 | 13_nang_da_danang_naman |
| 14 | phu - quoc - resort - mui - ne | 9228 | 14_phu_quoc_resort_mui |
| 15 | hcmc - hcm - tau - vung - silverland | 8368 | 15_hcmc_hcm_tau_vung |
| 16 | phong - ninh - binh - nha - coc | 8121 | 16_phong_ninh_binh_nha |
| 17 | hue - citadel - imperial - jade - serene | 8072 | 17_hue_citadel_imperial_jade |
| 18 | nha - trang - sheraton - beach - russian | 6163 | 18_nha_trang_sheraton_beach |
| 19 | la - siesta - residencia - trendy - selva | 2551 | 19_la_siesta_residencia_trendy |
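
The topic frequencies add up exactly to the 529579 training documents, so every document belongs to one of the 20 topics and there is no outlier topic (`-1`). That is what one would expect from the k-means clustering the model name suggests. The short sketch below checks that sum and converts the counts into shares of the corpus; the constant and function names are illustrative, and the numbers are copied from the topic overview.

```python
# Topic frequencies copied from the topic overview, indexed by topic ID.
TOPIC_FREQUENCIES = [
    70447, 61694, 50809, 44027, 43959, 40332, 36010, 28816, 28655, 20536,
    19495, 17662, 12629, 12005, 9228, 8368, 8121, 8072, 6163, 2551,
]
NUM_TRAINING_DOCS = 529579


def topic_shares(frequencies):
    """Return each topic's fraction of all assigned documents."""
    total = sum(frequencies)
    return [count / total for count in frequencies]


# Every training document is assigned to exactly one topic.
assert sum(TOPIC_FREQUENCIES) == NUM_TRAINING_DOCS

shares = topic_shares(TOPIC_FREQUENCIES)
for topic_id, share in enumerate(shares):
    print(f"{topic_id:>2}: {share:.1%}")
```

The largest topic (0, Hanoi) covers roughly 13% of the corpus, while the smallest (19) covers under 1%.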

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 15
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
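
The values above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how a comparable model might be configured. It rests on assumptions the card does not confirm: that a scikit-learn `KMeans` with 20 clusters was passed in place of HDBSCAN (inferred only from the model name and topic count), and that the embedding model and UMAP settings, which are not recorded here, are left at BERTopic's defaults. The names `HYPERPARAMS` and `build_topic_model` are illustrative.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 15,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(n_clusters=20, random_state=None):
    """Build an untrained BERTopic model with the listed hyperparameters."""
    # Imported lazily so HYPERPARAMS can be inspected without these packages.
    from bertopic import BERTopic
    from sklearn.cluster import KMeans

    # ASSUMPTION: k-means replaces HDBSCAN; the card does not state this.
    cluster_model = KMeans(n_clusters=n_clusters, random_state=random_state)
    # BERTopic accepts any clustering model through the hdbscan_model argument.
    return BERTopic(hdbscan_model=cluster_model, **HYPERPARAMS)
```

Training would then be `build_topic_model().fit_transform(docs)` on a list of document strings.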

Framework versions

  • Numpy: 1.24.3
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.5
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.35.2
  • Numba: 0.57.1
  • Plotly: 5.16.1
  • Python: 3.10.12
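
To compare a local environment against these versions, a small standard-library check can help. This is a sketch under assumptions: the PyPI distribution names used as keys (for example `umap-learn` for UMAP) are not given in the card, and the helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework versions" list, keyed by PyPI distribution
# name (the distribution names are an assumption, not stated in the card).
EXPECTED_VERSIONS = {
    "numpy": "1.24.3",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.35.2",
    "numba": "0.57.1",
    "plotly": "5.16.1",
}


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each differing package to an (expected, found) tuple."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches(EXPECTED_VERSIONS).items():
    print(f"{name}: expected {wanted}, found {found}")
```

A mismatch does not necessarily prevent the model from loading; the check only reports where the environment differs from the one used for training.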