bertopic-20-newsgroups
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("ctam8736/bertopic-20-newsgroups")
topic_model.get_topic_info()
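
Once loaded, the model can also assign topics to unseen text. The helper below is a minimal sketch and not part of the original card: `describe_documents` is a hypothetical name, and it relies only on BERTopic's `transform` and `get_topic` methods. It assumes the embedding model needed by `transform` is available after loading, which this card does not confirm.

```python
def describe_documents(topic_model, docs, n_words=5):
    """Return one (topic_id, keywords) pair per document.

    `topic_model` is expected to be a fitted BERTopic instance; any object
    exposing `transform` and `get_topic` with the same shapes also works.
    """
    # transform() returns (topic ids, probabilities); only the ids are used here.
    topics, _ = topic_model.transform(docs)
    results = []
    for topic_id in topics:
        # get_topic() returns [(word, score), ...]; fall back to an empty list
        # when the id is unknown.
        words = topic_model.get_topic(int(topic_id)) or []
        results.append((int(topic_id), [word for word, _ in words[:n_words]]))
    return results
```

With the model loaded as above, `describe_documents(topic_model, ["My SCSI drive is not detected by the BIOS."])` returns a list with one pair. A topic ID of -1 marks a document that BERTopic treats as an outlier.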
Topic overview
- Number of topics: 135
- Number of training documents: 11314
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | article - information - subject - re - what | 10 | -1_article_information_subject_re |
0 | scsi - scsi2 - scsi1 - drives - bios | 3737 | 0_scsi_scsi2_scsi1_drives |
1 | nhl - puck - leafs - flyers - pitching | 976 | 1_nhl_puck_leafs_flyers |
2 | firearm - firearms - handgun - guns - gun | 918 | 2_firearm_firearms_handgun_guns |
3 | ford - honda - nissan - bmw - dealer | 409 | 3_ford_honda_nissan_bmw |
4 | encryption - encrypted - crypto - nsa - chip | 387 | 4_encryption_encrypted_crypto_nsa |
5 | atheism - atheist - atheists - christianity - belief | 377 | 5_atheism_atheist_atheists_christianity |
6 | hezbollah - gaza - lebanon - palestinians - lebanese | 342 | 6_hezbollah_gaza_lebanon_palestinians |
7 | window - x11r5 - openwindows - x11 - x11r4 | 249 | 7_window_x11r5_openwindows_x11 |
8 | modems - modem - mouse - ports - port | 243 | 8_modems_modem_mouse_ports |
9 | anonymity - anonymous - mailing - usenet - newsgroups | 151 | 9_anonymity_anonymous_mailing_usenet |
10 | armenians - armenia - armenian - azerbaijani - azerbaijan | 147 | 10_armenians_armenia_armenian_azerbaijani |
11 | clinton - stephanopoulos - secretary - president - congress | 135 | 11_clinton_stephanopoulos_secretary_president |
12 | os - windows - win32 - microsoft - win31 | 133 | 12_os_windows_win32_microsoft |
13 | diseases - disease - candida - infection - infections | 113 | 13_diseases_disease_candida_infection |
14 | superstition - msg - sensitivity - glutamate - causes | 100 | 14_superstition_msg_sensitivity_glutamate |
15 | laserjet - inkjet - printers - bubblejet - bubblejets | 86 | 15_laserjet_inkjet_printers_bubblejet |
16 | billboard - billboards - nasa - space - advertising | 75 | 16_billboard_billboards_nasa_space |
17 | radar - detectors - detector - detecting - radarjust | 68 | 17_radar_detectors_detector_detecting |
18 | speeding - speeds - mph - speed - driving | 64 | 18_speeding_speeds_mph_speed |
19 | ssto - moonbase - moon - lunar - billion | 63 | 19_ssto_moonbase_moon_lunar |
20 | station - nasa - redesign - space - shuttle | 61 | 20_station_nasa_redesign_space |
21 | eternity - afterlife - heaven - hell - judgement | 49 | 21_eternity_afterlife_heaven_hell |
22 | testament - manuscripts - scripture - bible - hebrew | 47 | 22_testament_manuscripts_scripture_bible |
23 | homosexuality - heterosexual - homosexual - homosexuals - gays | 45 | 23_homosexuality_heterosexual_homosexual_homosexuals |
24 | libertarians - libertarian - libertarianism - regulation - governments | 44 | 24_libertarians_libertarian_libertarianism_regulation |
25 | islamic - muslim - islam - muslims - koran | 44 | 25_islamic_muslim_islam_muslims |
26 | tax - taxes - vat - deficits - income | 44 | 26_tax_taxes_vat_deficits |
27 | oil - drain - engine - fuel - dumping | 44 | 27_oil_drain_engine_fuel |
28 | helmet - helmets - head - protection - gloves | 43 | 28_helmet_helmets_head_protection |
29 | fonts - font - ttfonts - truetype - printing | 42 | 29_fonts_font_ttfonts_truetype |
30 | morality - moral - morals - instinctive - immoral | 39 | 30_morality_moral_morals_instinctive |
31 | colormaps - colourmap - colormap - xalloccolor - cwcolormap | 39 | 31_colormaps_colourmap_colormap_xalloccolor |
32 | homosexuals - molesters - homosexual - homosexuality - pedophilia | 38 | 32_homosexuals_molesters_homosexual_homosexuality |
33 | migraine - migraines - headache - headaches - analgesics | 37 | 33_migraine_migraines_headache_headaches |
34 | resurrection - gospels - tomb - testament - jesuss | 37 | 34_resurrection_gospels_tomb_testament |
35 | graphics - copyright - images - siggraph - image | 37 | 35_graphics_copyright_images_siggraph |
36 | mormon - mormons - lds - brigham - utah | 35 | 36_mormon_mormons_lds_brigham |
37 | scientific - scipsychology - scientist - science - methodology | 34 | 37_scientific_scipsychology_scientist_science |
38 | tapes - tape - backup - copy - floppy | 34 | 38_tapes_tape_backup_copy |
39 | drugs - marijuana - drug - legalizing - legalization | 34 | 39_drugs_marijuana_drug_legalizing |
40 | punishment - punish - murder - penalty - murderer | 34 | 40_punishment_punish_murder_penalty |
41 | sphere - globe - radius - pointstruct - circle | 34 | 41_sphere_globe_radius_pointstruct |
42 | surgery - patients - hernia - massager - pain | 33 | 42_surgery_patients_hernia_massager |
43 | genocide - bosnia - atheism - serbs - christians | 32 | 43_genocide_bosnia_atheism_serbs |
44 | insurance - liability - insureyear - deductible - accident | 32 | 44_insurance_liability_insureyear_deductible |
45 | polygon - polygons - triangulation - hexagons - polyn | 30 | 45_polygon_polygons_triangulation_hexagons |
46 | spacecraft - galileo - galileos - mission - magellan | 29 | 46_spacecraft_galileo_galileos_mission |
47 | countersteering - countersteeringfaq - countersteer - riding - bikes | 29 | 47_countersteering_countersteeringfaq_countersteer_riding |
48 | antenna - antennas - transmitters - transmitting - radios | 28 | 48_antenna_antennas_transmitters_transmitting |
49 | canine - dogs - dog - spaniel - springer | 28 | 49_canine_dogs_dog_spaniel |
50 | batteries - battery - electrolyte - galvanized - zinc | 28 | 50_batteries_battery_electrolyte_galvanized |
51 | oscilloscope - scopes - scope - oscilliscopes - digital | 27 | 51_oscilloscope_scopes_scope_oscilliscopes |
52 | xgrabkey - definekeys - accelerators - accelerator - shiftkeyq | 27 | 52_xgrabkey_definekeys_accelerators_accelerator |
53 | protoncentaur - centaur - proton - accelerator - nuclear | 27 | 53_protoncentaur_centaur_proton_accelerator |
54 | telephone - dial - phone - call - lines | 26 | 54_telephone_dial_phone_call |
55 | marriages - wedding - vows - weddings - marriage | 25 | 55_marriages_wedding_vows_weddings |
56 | ibm - levels - level - nasa - software | 25 | 56_ibm_levels_level_nasa |
57 | nasa - aerospace - astronomy - spacecraft - astronomical | 24 | 57_nasa_aerospace_astronomy_spacecraft |
58 | motif - neosoft - unix - platforms - software | 24 | 58_motif_neosoft_unix_platforms |
59 | nuclear - cooling - reactor - tower - towers | 23 | 59_nuclear_cooling_reactor_tower |
60 | injuries - struck - snot - rocks - warningplease | 23 | 60_injuries_struck_snot_rocks |
61 | transmissions - shifter - automatics - autos - auto | 22 | 61_transmissions_shifter_automatics_autos |
62 | lzr1260 - printing - mwt9caxaxaxaxaxaxaxaxaxaxaxaxax - m9l0qaxaxaxaxaxaxaxaxaxaxaxaxaxax - mi68qaxaxaxaxaxaxaxaxaxaxaxaxaxax | 22 | 62_lzr1260_printing_mwt9caxaxaxaxaxaxaxaxaxaxaxaxax_m9l0qaxaxaxaxaxaxaxaxaxaxaxaxaxax |
63 | cview - files - directory - file - tmp | 21 | 63_cview_files_directory_file |
64 | immaculate - mary - marys - conception - catholics | 21 | 64_immaculate_mary_marys_conception |
65 | cryptology - cryptanalyst - crypt - cryptanalysis - ciphers | 20 | 65_cryptology_cryptanalyst_crypt_cryptanalysis |
66 | hotelco - hotels - resorts - hotel - tickets | 20 | 66_hotelco_hotels_resorts_hotel |
67 | 3dos - 3do - 3ds - 3d - 3dstudio | 20 | 67_3dos_3do_3ds_3d |
68 | comet - comets - jupiter - asteroids - jovian | 20 | 68_comet_comets_jupiter_asteroids |
69 | polishing - scratches - paint - rubbing - glaze | 20 | 69_polishing_scratches_paint_rubbing |
70 | newsgroup - groups - groupsplit - group - split | 20 | 70_newsgroup_groups_groupsplit_group |
71 | koresh - koreshs - david - sermon - biblical | 20 | 71_koresh_koreshs_david_sermon |
72 | parking - parked - liability - unsafe - stickers | 20 | 72_parking_parked_liability_unsafe |
73 | trumpet - tcp - windows - winqvtnet - winsock | 19 | 73_trumpet_tcp_windows_winqvtnet |
74 | freon - heater - coolant - r12 - vents | 19 | 74_freon_heater_coolant_r12 |
75 | sabbath - commandments - sunday - worship - church | 19 | 75_sabbath_commandments_sunday_worship |
76 | geekdom - computer - fourdcom - csws18icsunysbedu - psychnet | 19 | 76_geekdom_computer_fourdcom_csws18icsunysbedu |
77 | bosnia - serbs - sanctions - somalia - war | 18 | 77_bosnia_serbs_sanctions_somalia |
78 | soundblaster - midi - midimapper - soundexe - wavfiles | 18 | 78_soundblaster_midi_midimapper_soundexe |
79 | condo - remodeled - townhome - bedroom - rent | 18 | 79_condo_remodeled_townhome_bedroom |
80 | odometers - odometer - sensor - mileage - sensors | 18 | 80_odometers_odometer_sensor_mileage |
81 | joystick - joysticks - joyport - joyread - hardware | 17 | 81_joystick_joysticks_joyport_joyread |
82 | abortion - abortions - roe - proabortion - fetus | 17 | 82_abortion_abortions_roe_proabortion |
83 | seizures - seizure - allergies - corn - cereal | 17 | 83_seizures_seizure_allergies_corn |
84 | sobriety - sober - drinking - drink - drinks | 17 | 84_sobriety_sober_drinking_drink |
85 | nubus - lciiipowerpc - pds - powerpcs - powerpc | 17 | 85_nubus_lciiipowerpc_pds_powerpcs |
86 | mining - miners - minerals - miner - mineral | 17 | 86_mining_miners_minerals_miner |
87 | outlets - outlet - electrical - wiring - grounded | 16 | 87_outlets_outlet_electrical_wiring |
88 | rosicrucianum - rosicrucian - orders - order - organization | 16 | 88_rosicrucianum_rosicrucian_orders_order |
89 | tempest - shielding - surveillance - encryption - electromagnetic | 16 | 89_tempest_shielding_surveillance_encryption |
90 | monitor - monitors - screen - scrolling - display | 16 | 90_monitor_monitors_screen_scrolling |
91 | krillean - photographs - photography - kirlian - pictures | 16 | 91_krillean_photographs_photography_kirlian |
92 | scanner - scanners - scanning - scans - scanman | 16 | 92_scanner_scanners_scanning_scans |
93 | sexism - sexist - extramarital - islamic - marriage | 16 | 93_sexism_sexist_extramarital_islamic |
94 | noisy - noise - noises - rattled - quiets | 16 | 94_noisy_noise_noises_rattled |
95 | orion - astronomy - museum - prototype - space | 15 | 95_orion_astronomy_museum_prototype |
96 | easter - pagan - celebrating - feast - celebration | 15 | 96_easter_pagan_celebrating_feast |
97 | batf - assault - waco - blasting - blast | 15 | 97_batf_assault_waco_blasting |
98 | batchfile - ini - updating - file - winfileini | 15 | 98_batchfile_ini_updating_file |
99 | copyprotect - copying - protected - copy - protection | 15 | 99_copyprotect_copying_protected_copy |
100 | 42 - tiff - tiff6 - significance - universe | 14 | 100_42_tiff_tiff6_significance |
101 | stove - stoves - splitfires - splitfire - burns | 14 | 101_stove_stoves_splitfires_splitfire |
102 | automotive - backing - lights - corvette - reverse | 14 | 102_automotive_backing_lights_corvette |
103 | dock - docks - minidocks - portable - minidock | 14 | 103_dock_docks_minidocks_portable |
104 | cdaudio - stereo - audio - soundbase - speakers | 14 | 104_cdaudio_stereo_audio_soundbase |
105 | uv - flashlight - houselights - fluorescent - lamps | 14 | 105_uv_flashlight_houselights_fluorescent |
106 | papal - papacy - pope - popes - schism | 14 | 106_papal_papacy_pope_popes |
107 | scsi - quadra - quadras - quadraspecific - firmware | 14 | 107_scsi_quadra_quadras_quadraspecific |
108 | crohns - colitis - dietary - gastroenterology - diet | 13 | 108_crohns_colitis_dietary_gastroenterology |
109 | crashes - powerbook - plugged - corrupted - duos | 13 | 109_crashes_powerbook_plugged_corrupted |
110 | eyedness - handedness - righteye - righthandedness - eyes | 13 | 110_eyedness_handedness_righteye_righthandedness |
111 | wrench - pliers - tool - tools - srb | 13 | 111_wrench_pliers_tool_tools |
112 | scripture - scriptures - prophecy - revelation - revelations | 13 | 112_scripture_scriptures_prophecy_revelation |
113 | nikon - lens - lenses - olympus - 35mm | 13 | 113_nikon_lens_lenses_olympus |
114 | prosecution - suspects - encrypted - defendant - incriminate | 13 | 114_prosecution_suspects_encrypted_defendant |
115 | wheel - shaftdrives - wheelies - wheelie - shaftdrive | 12 | 115_wheel_shaftdrives_wheelies_wheelie |
116 | obesity - rebound - dieting - diet - metabolism | 12 | 116_obesity_rebound_dieting_diet |
117 | adl - adls - spying - fbi - investigation | 12 | 117_adl_adls_spying_fbi |
118 | lunar - moon - exploration - attend - conference | 12 | 118_lunar_moon_exploration_attend |
119 | draftees - draft - selective - military - abolished | 12 | 119_draftees_draft_selective_military |
120 | sunrise - sunset - daylight - algorithm - astronomical | 12 | 120_sunrise_sunset_daylight_algorithm |
121 | octopus - octopuses - octopi - squid - octapus | 12 | 121_octopus_octopuses_octopi_squid |
122 | gassing - explosion - gas - explode - explosive | 11 | 122_gassing_explosion_gas_explode |
123 | tutorial - handbook - chemistry - paperback - books | 11 | 123_tutorial_handbook_chemistry_paperback |
124 | amp - decibels - current - ampere - db | 11 | 124_amp_decibels_current_ampere |
125 | uniforms - jerseys - uniform - mets - reds | 11 | 125_uniforms_jerseys_uniform_mets |
126 | eugenics - eugenic - geneticallyengineered - genetic - genetically | 11 | 126_eugenics_eugenic_geneticallyengineered_genetic |
127 | fractals - fractal - fractally - compression - pascalfractals | 11 | 127_fractals_fractal_fractally_compression |
128 | sunview - xputimage - pixmap - pixmaps - ximage | 11 | 128_sunview_xputimage_pixmap_pixmaps |
129 | waving - wave - waves - bikers - bikes | 11 | 129_waving_wave_waves_bikers |
130 | vocoder - compressionalgorithms - compression - modems - cryptophones | 11 | 130_vocoder_compressionalgorithms_compression_modems |
131 | mouse - jumpiness - mousecom - mouseits - jumps | 11 | 131_mouse_jumpiness_mousecom_mouseits |
132 | netware - lan - workgroup - workgroups - w4wg | 10 | 132_netware_lan_workgroup_workgroups |
133 | timers - timer - ultralong - clock - oscillator | 10 | 133_timers_timer_ultralong_clock |
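
Each entry in the Label column appears to be the topic ID followed by the first four keywords, joined with underscores. The sketch below splits a label back into those parts and computes a topic's share of the 11314 training documents. `parse_label` and `share_of_corpus` are hypothetical helper names, and the parsing assumes that keywords contain no underscores, which holds for the rows listed here.

```python
TRAINING_DOCS = 11314  # from the "Topic overview" list


def parse_label(label):
    """Split a label such as '0_scsi_scsi2_scsi1_drives' into (id, keywords)."""
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def share_of_corpus(frequency, total=TRAINING_DOCS):
    """Fraction of training documents assigned to a topic."""
    return frequency / total


print(parse_label("-1_article_information_subject_re"))
print(f"Topic 0 share: {share_of_corpus(3737):.1%}")
```

By this measure topic 0 alone accounts for roughly a third of the training documents, while most other topics hold well under 1% each.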
Training hyperparameters
- calculate_probabilities: False
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: auto
- seed_topic_list: None
- top_n_words: 10
- verbose: True
- zeroshot_min_similarity: 0.7
- zeroshot_topic_list: None
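
The names listed above match keyword arguments of the `BERTopic` constructor, so the configuration can be written as a dictionary and passed through. This is a sketch only: the card does not state which embedding, UMAP, or HDBSCAN models were used, so the result would not necessarily reproduce this model. `HYPERPARAMS` and `build_model` are names introduced here for illustration, and the code assumes a BERTopic release that accepts the zero-shot arguments.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model(overrides=None):
    """Create an unfitted BERTopic instance with the settings above."""
    # Imported lazily so the dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    params = {**HYPERPARAMS, **(overrides or {})}
    return BERTopic(**params)
```

Calling `build_model().fit_transform(docs)` on a list of strings would then fit a new model with these settings.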
Framework versions
- Numpy: 1.23.5
- HDBSCAN: 0.8.33
- UMAP: 0.5.5
- Pandas: 2.2.1
- Scikit-Learn: 1.3.1
- Sentence-transformers: 2.5.1
- Transformers: 4.37.0.dev0
- Numba: 0.59.1
- Plotly: 5.20.0
- Python: 3.10.4
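
To compare a local environment against the versions listed above, something like the following could be used. It is a sketch built on the standard library's `importlib.metadata`. `EXPECTED` and `version_report` are hypothetical names, and the keys are the PyPI distribution names for the listed libraries (for example `umap-learn` for UMAP). The Python version itself is not checked here.

```python
from importlib import metadata

# Versions from the "Framework versions" list, keyed by PyPI distribution name.
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "2.2.1",
    "scikit-learn": "1.3.1",
    "sentence-transformers": "2.5.1",
    "transformers": "4.37.0.dev0",
    "numba": "0.59.1",
    "plotly": "5.20.0",
}


def version_report(expected=EXPECTED):
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in version_report().items():
    flag = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({flag})")
```

A differing version does not necessarily prevent the model from loading; the report only shows where the environment departs from the one recorded here.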