metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

transformers_amazon_reviews_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

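from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("inesbattah/transformers_amazon_reviews_topics")

# Return a table with one row per topic (ID, document count, label)
topic_model.get_topic_info()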
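Beyond inspecting the topic table, the loaded model can assign topics to new reviews. The following is a minimal sketch, not the author's documented workflow: `assign_topics` and `top_keywords` are hypothetical helper names, and it assumes the saved model can resolve its embedding model on load (if it cannot, `BERTopic.load` accepts an `embedding_model` argument).

```python
from typing import List, Sequence, Tuple

MODEL_ID = "inesbattah/transformers_amazon_reviews_topics"


def top_keywords(word_scores: Sequence[Tuple[str, float]], n: int = 5) -> List[str]:
    """Return the first n words from a list of (word, score) pairs."""
    return [word for word, _ in word_scores[:n]]


def assign_topics(docs: List[str], model_id: str = MODEL_ID):
    """Predict a topic ID for each document and attach its top keywords."""
    # Imported lazily so top_keywords() works without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    # transform() returns (topic IDs, probabilities); -1 marks outliers.
    topics, _ = topic_model.transform(docs)

    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() returns (word, score) pairs, or False for an unknown ID.
        word_scores = topic_model.get_topic(int(topic_id)) or []
        results.append((doc, int(topic_id), top_keywords(word_scores)))
    return results
```

Calling `assign_topics(["The charger stopped working after a week."])` would return one `(document, topic ID, keywords)` tuple per input; the example review is made up for illustration.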

Topic overview

  • Number of topics: 30 (29 topics plus the outlier topic -1)
  • Number of training documents: 9000
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | amazon - quality - product - cheap - seller | 10 | -1_amazon_quality_product_cheap |
| 0 | refund - ordered - order - delivered - return | 3105 | 0_refund_ordered_order_delivered |
| 1 | charging - charger - charge - iphone - headphones | 1556 | 1_charging_charger_charge_iphone |
| 2 | wear - shoe - shoes - zipper - fit | 655 | 2_wear_shoe_shoes_zipper |
| 3 | shampoo - conditioner - scent - flavor - hair | 635 | 3_shampoo_conditioner_scent_flavor |
| 4 | protector - protectors - screen - case - cases | 452 | 4_protector_protectors_screen_case |
| 5 | color - colors - colored - blue - black | 293 | 5_color_colors_colored_blue |
| 6 | bottle - leak - leaking - bottles - leaks | 234 | 6_bottle_leak_leaking_bottles |
| 7 | lights - light - bulbs - flashlight - led | 209 | 7_lights_light_bulbs_flashlight |
| 8 | dog - toy - dogs - puppy - chewed | 205 | 8_dog_toy_dogs_puppy |
| 9 | chairs - chair - assemble - screws - assembling | 192 | 9_chairs_chair_assemble_screws |
| 10 | cheap - cheaply - material - quality - cost | 181 | 10_cheap_cheaply_material_quality |
| 11 | book - books - chapters - chapter - author | 180 | 11_book_books_chapters_chapter |
| 12 | hose - faucet - pump - valve - leak | 167 | 12_hose_faucet_pump_valve |
| 13 | pan - pans - pancakes - griddle - cook | 127 | 13_pan_pans_pancakes_griddle |
| 14 | dvd - dvds - disc - discs - cd | 114 | 14_dvd_dvds_disc_discs |
| 15 | fit - fitting - didnt - galaxy - samsung | 109 | 15_fit_fitting_didnt_galaxy |
| 16 | razor - shave - razors - reviews - blades | 97 | 16_razor_shave_razors_reviews |
| 17 | cartridges - cartridge - ink - printer - printing | 97 | 17_cartridges_cartridge_ink_printer |
| 18 | watches - watch - clocks - clock - battery | 88 | 18_watches_watch_clocks_clock |
| 19 | remote - remotes - buttons - button - programmed | 78 | 19_remote_remotes_buttons_button |
| 20 | seeds - seed - planted - planting - germinated | 43 | 20_seeds_seed_planted_planting |
| 21 | thermometer - temperature - temperatureoff - temps - temp | 36 | 21_thermometer_temperature_temperatureoff_temps |
| 22 | instructions - directions - how - installation - cheap | 34 | 22_instructions_directions_how_installation |
| 23 | pistol - holster - gun - glock19 - glock | 29 | 23_pistol_holster_gun_glock19 |
| 24 | tire - tires - tube - bike - wheel | 20 | 24_tire_tires_tube_bike |
| 25 | snoring - snorkeling - snore - snorkel - snores | 17 | 25_snoring_snorkeling_snore_snorkel |
| 26 | rugs - carpets - carpet - rug - floors | 13 | 26_rugs_carpets_carpet_rug |
| 27 | waterproof - wet - swimming - bathing - raining | 12 | 27_waterproof_wet_swimming_bathing |
| 28 | fan - squealing - noise - fans - quiet | 12 | 28_fan_squealing_noise_fans |
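The frequencies above are heavily skewed toward the first few topics. As a small worked example, the sketch below converts a handful of counts copied from the table into shares of the 9000 training documents. The function names are hypothetical, and only the outlier topic and the five largest topics are included.

```python
TOTAL_DOCUMENTS = 9000

# Counts copied from the topic overview (outlier topic -1 and topics 0 to 4).
TOPIC_COUNTS = {-1: 10, 0: 3105, 1: 1556, 2: 655, 3: 635, 4: 452}


def topic_shares(counts, total):
    """Map each topic ID to its fraction of the training documents."""
    return {topic_id: count / total for topic_id, count in counts.items()}


def cumulative_share(counts, total, top_n):
    """Fraction of documents covered by the top_n largest non-outlier topics."""
    regular = sorted(
        (count for topic_id, count in counts.items() if topic_id != -1),
        reverse=True,
    )
    return sum(regular[:top_n]) / total


shares = topic_shares(TOPIC_COUNTS, TOTAL_DOCUMENTS)
top_two = cumulative_share(TOPIC_COUNTS, TOTAL_DOCUMENTS, 2)
```

With these numbers, topic 0 alone accounts for 34.5% of the documents, and topics 0 and 1 together cover roughly 52%.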

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 30
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
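For readers who want to train a comparable model, the listed hyperparameters map directly onto keyword arguments of the `BERTopic` constructor. The sketch below is an approximation only: it assumes a recent BERTopic release that supports the zero-shot arguments, and it leaves the embedding, UMAP, and HDBSCAN models at their defaults because this card does not state them. `build_untrained_model` is a hypothetical helper name.

```python
# Hyperparameters as listed in this model card.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model():
    """Create an unfitted BERTopic instance with the settings above."""
    # Imported lazily so the dictionary can be inspected without bertopic.
    from bertopic import BERTopic

    # Sub-models (embeddings, UMAP, HDBSCAN) are left at library defaults,
    # which is an assumption; the card does not record them.
    return BERTopic(**HYPERPARAMETERS)
```

Fitting would then be `build_untrained_model().fit_transform(docs)` on a list of review strings. Results will not match this model exactly unless the same documents, sub-models, and random state are used.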

Framework versions

  • Numpy: 1.26.4
  • HDBSCAN: 0.8.39
  • UMAP: 0.5.7
  • Pandas: 2.2.2
  • Scikit-Learn: 1.5.2
  • Sentence-transformers: 3.2.1
  • Transformers: 4.44.2
  • Numba: 0.60.0
  • Plotly: 5.24.1
  • Python: 3.10.12
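Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. The following is a minimal sketch using the standard library's `importlib.metadata`; the PyPI distribution names (for example `umap-learn` for UMAP) and the helper `version_mismatches` are assumptions added for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from this card, keyed by assumed PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.39",
    "umap-learn": "0.5.7",
    "pandas": "2.2.2",
    "scikit-learn": "1.5.2",
    "sentence-transformers": "3.2.1",
    "transformers": "4.44.2",
    "numba": "0.60.0",
    "plotly": "5.24.1",
}


def version_mismatches(expected, lookup=version):
    """Return {name: (expected, installed)} for packages that differ.

    installed is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Running `version_mismatches(EXPECTED_VERSIONS)` returns an empty dictionary when every listed package matches. A mismatch does not necessarily mean the model will fail to load.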