bertopic_kmean-20topics

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework for generating easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("hts98/bertopic_kmean-20topics")

topic_model.get_topic_info()

Topic overview

  • Number of topics: 20
  • Number of training documents: 529579
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| 0 | hanoi - quarter - old - bay - lake | 70447 | 0_hanoi_quarter_old_bay |
| 1 | vietnam - vietnamese - best - stayed - mekong | 61694 | 1_vietnam_vietnamese_best_stayed |
| 2 | location - hotel - good - old - breakfast | 50809 | 2_location_hotel_good_old |
| 3 | good - clean - location - helpful - friendly | 44027 | 3_good_clean_location_helpful |
| 4 | pool - beach - view - massage - spa | 43959 | 4_pool_beach_view_massage |
| 5 | room - told - said - asked - shower | 40332 | 5_room_told_said_asked |
| 6 | thank - service - staff - ms - helpful | 36010 | 6_thank_service_staff_ms |
| 7 | hoi - homestay - town - bikes - free | 28816 | 7_hoi_homestay_town_bikes |
| 8 | saigon - minh - chi - ho - city | 28655 | 8_saigon_minh_chi_ho |
| 9 | resort - villa - beach - villas - island | 20536 | 9_resort_villa_beach_villas |
| 10 | bikes - beach - town - bike - free | 19495 | 10_bikes_beach_town_bike |
| 11 | hostel - dorm - dalat - dorms - beds | 17662 | 11_hostel_dorm_dalat_dorms |
| 12 | bay - halong - ha - cruise - kiem | 12629 | 12_bay_halong_ha_cruise |
| 13 | nang - da - danang - naman - dragon | 12005 | 13_nang_da_danang_naman |
| 14 | phu - quoc - resort - mui - ne | 9228 | 14_phu_quoc_resort_mui |
| 15 | hcmc - hcm - tau - vung - silverland | 8368 | 15_hcmc_hcm_tau_vung |
| 16 | phong - ninh - binh - nha - coc | 8121 | 16_phong_ninh_binh_nha |
| 17 | hue - citadel - imperial - jade - serene | 8072 | 17_hue_citadel_imperial_jade |
| 18 | nha - trang - sheraton - beach - russian | 6163 | 18_nha_trang_sheraton_beach |
| 19 | la - siesta - residencia - trendy - selva | 2551 | 19_la_siesta_residencia_trendy |
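The table is internally consistent: each Label is the topic ID followed by the first four keywords, joined by underscores, and the frequencies add up to the 529,579 training documents, so no document is left in an outlier (-1) topic. The standard-library sketch below checks both properties using numbers copied from the table. `topic_label` is a hypothetical helper written for this illustration, not part of the BERTopic API.

```python
# Topic frequencies copied from the overview table, indexed by topic ID 0..19
FREQUENCIES = [70447, 61694, 50809, 44027, 43959, 40332, 36010, 28816, 28655, 20536,
               19495, 17662, 12629, 12005, 9228, 8368, 8121, 8072, 6163, 2551]
N_TRAINING_DOCS = 529579


def topic_label(topic_id: int, keywords: list, n_words: int = 4) -> str:
    """Hypothetical helper: join the topic ID and its first n_words keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# The per-topic counts add up to the number of training documents
assert sum(FREQUENCIES) == N_TRAINING_DOCS

# Share of the training corpus covered by each topic, in percent
shares = {topic_id: 100 * count / N_TRAINING_DOCS for topic_id, count in enumerate(FREQUENCIES)}

print(topic_label(0, ["hanoi", "quarter", "old", "bay", "lake"]))  # 0_hanoi_quarter_old_bay
print(f"Topic 0: {shares[0]:.1f}%, topic 19: {shares[19]:.1f}%")  # Topic 0: 13.3%, topic 19: 0.5%
```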

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 15
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None

Framework versions

  • Numpy: 1.24.3
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.5
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.35.2
  • Numba: 0.57.1
  • Plotly: 5.16.1
  • Python: 3.10.12
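To see how a local environment differs from the one listed above, the sketch below compares installed package versions with the card's list using only the standard library. The mapping from the names on the card to PyPI distribution names (for example, UMAP is distributed as `umap-learn`) is an assumption of this example, and the card does not list the BERTopic version itself.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions from the card, keyed by PyPI distribution name (name mapping is an assumption)
CARD_VERSIONS = {
    "numpy": "1.24.3",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.35.2",
    "numba": "0.57.1",
    "plotly": "5.16.1",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {distribution: (card_version, installed_version)} for every mismatch.

    installed_version is None when the distribution is not installed.
    """
    mismatches = {}
    for dist, wanted in expected.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[dist] = (wanted, installed)
    return mismatches


for dist, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
    print(f"{dist}: card {wanted}, installed {installed or 'not installed'}")
```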