cnn_dailymail_6789_50000_25000_test
This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_50000_25000_test")
topic_model.get_topic_info()
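Beyond listing topics, a loaded model can also assign topics to unseen text. The sketch below is illustrative and not part of the original card: `label_documents` and `demo` are hypothetical helper names, and the example sentence is made up. It assumes the published checkpoint can embed new documents; if it was saved without its embedding model, `BERTopic.load` may need an explicit `embedding_model` argument.

```python
from typing import Any, List, Tuple


def label_documents(
    topic_model: Any, docs: List[str], n_words: int = 4
) -> List[Tuple[str, int, List[str]]]:
    """Assign each document a topic ID and collect that topic's top keywords."""
    topics, _probs = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        topic_id = int(topic_id)
        # get_topic returns (word, score) pairs, or False for an unknown topic.
        words = topic_model.get_topic(topic_id) or []
        results.append((doc, topic_id, [word for word, _score in words[:n_words]]))
    return results


def demo() -> None:
    """Run against the published model (needs bertopic and network access)."""
    from bertopic import BERTopic  # lazy import keeps the helper above dependency-free

    topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_50000_25000_test")
    docs = ["The striker scored twice as his side won the league title."]
    for doc, topic_id, keywords in label_documents(topic_model, docs):
        print(topic_id, keywords, doc)
```

Calling `demo()` downloads the model and prints the predicted topic ID and keywords for the sample sentence. A topic ID of -1 denotes the outlier topic.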
Topic overview
- Number of topics: 108
- Number of training documents: 11490
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label |
---|---|---|---|
-1 | said - one - year - also - police | 5 | -1_said_one_year_also |
0 | league - season - player - game - goal | 4981 | 0_league_season_player_game |
1 | isis - syria - islamic - group - militant | 2424 | 1_isis_syria_islamic_group |
2 | property - hotel - room - house - home | 194 | 2_property_hotel_room_house |
3 | fight - mayweather - pacquiao - floyd - manny | 188 | 3_fight_mayweather_pacquiao_floyd |
4 | labour - miliband - snp - mr - leader | 156 | 4_labour_miliband_snp_mr |
5 | driver - car - road - vehicle - driving | 133 | 5_driver_car_road_vehicle |
6 | baby - hospital - cancer - birth - mother | 133 | 6_baby_hospital_cancer_birth |
7 | school - student - teacher - pupil - class | 125 | 7_school_student_teacher_pupil |
8 | flight - plane - passenger - airport - pilot | 114 | 8_flight_plane_passenger_airport |
9 | masters - woods - augusta - spieth - mcilroy | 112 | 9_masters_woods_augusta_spieth |
10 | fashion - dress - model - style - designer | 107 | 10_fashion_dress_model_style |
11 | chocolate - food - egg - sugar - restaurant | 98 | 11_chocolate_food_egg_sugar |
12 | clinton - hillary - clintons - president - campaign | 89 | 12_clinton_hillary_clintons_president |
13 | police - murder - mr - miss - body | 89 | 13_police_murder_mr_miss |
14 | lion - animal - elephant - zoo - wildlife | 83 | 14_lion_animal_elephant_zoo |
15 | weight - food - eating - diet - size | 82 | 15_weight_food_eating_diet |
16 | djokovic - murray - open - miami - berdych | 75 | 16_djokovic_murray_open_miami |
17 | dog - cat - animal - owner - pet | 75 | 17_dog_cat_animal_owner |
18 | police - vault - gang - thief - raid | 74 | 18_police_vault_gang_thief |
19 | planet - solar - earth - surface - moon | 65 | 19_planet_solar_earth_surface |
20 | gray - police - baltimore - officer - grays | 64 | 20_gray_police_baltimore_officer |
21 | nepal - earthquake - kathmandu - everest - avalanche | 58 | 21_nepal_earthquake_kathmandu_everest |
22 | fire - blaze - bradford - firefighter - flame | 55 | 22_fire_blaze_bradford_firefighter |
23 | hamilton - rosberg - race - mercedes - prix | 52 | 23_hamilton_rosberg_race_mercedes |
24 | prince - royal - queen - duchess - princess | 52 | 24_prince_royal_queen_duchess |
25 | tax - labour - economy - mr - cameron | 51 | 25_tax_labour_economy_mr |
26 | shot - police - shooting - brady - gun | 48 | 26_shot_police_shooting_brady |
27 | anzac - gallipoli - war - australian - waterloo | 47 | 27_anzac_gallipoli_war_australian |
28 | chan - sukumaran - execution - bali - indonesian | 45 | 28_chan_sukumaran_execution_bali |
29 | migrant - boat - libya - mediterranean - italian | 44 | 29_migrant_boat_libya_mediterranean |
30 | china - chinese - chinas - kun - organ | 43 | 30_china_chinese_chinas_kun |
31 | iran - nuclear - deal - agreement - irans | 43 | 31_iran_nuclear_deal_agreement |
32 | neanderthals - cave - human - specie - bone | 43 | 32_neanderthals_cave_human_specie |
33 | shark - fish - whale - seal - water | 41 | 33_shark_fish_whale_seal |
34 | mccoy - jockey - race - ride - sandown | 41 | 34_mccoy_jockey_race_ride |
35 | yemen - saudi - houthi - houthis - rebel | 39 | 35_yemen_saudi_houthi_houthis |
36 | ship - vessel - crew - boat - titanic | 39 | 36_ship_vessel_crew_boat |
37 | nfl - manziel - game - quarterback - patriots | 39 | 37_nfl_manziel_game_quarterback |
38 | bruce - jenner - bobbi - bobby - kris | 38 | 38_bruce_jenner_bobbi_bobby |
39 | money - fraud - bank - account - court | 38 | 39_money_fraud_bank_account |
40 | wars - star - film - movie - trailer | 37 | 40_wars_star_film_movie |
41 | hernandez - lloyd - hernandezs - odin - murder | 32 | 41_hernandez_lloyd_hernandezs_odin |
42 | law - religious - marriage - indiana - samesex | 32 | 42_law_religious_marriage_indiana |
43 | child - langlais - death - murder - dellinger | 31 | 43_child_langlais_death_murder |
44 | tsarnaev - boston - dzhokhar - tamerlan - death | 31 | 44_tsarnaev_boston_dzhokhar_tamerlan |
45 | marathon - running - race - runner - run | 31 | 45_marathon_running_race_runner |
46 | clarkson - gear - bbc - top - hammond | 29 | 46_clarkson_gear_bbc_top |
47 | water - weather - temperature - drought - climate | 29 | 47_water_weather_temperature_drought |
48 | point - nba - scored - playoff - rebound | 29 | 48_point_nba_scored_playoff |
49 | marijuana - cannabis - drug - hemp - smoking | 28 | 49_marijuana_cannabis_drug_hemp |
50 | slager - scott - officer - charleston - walter | 28 | 50_slager_scott_officer_charleston |
51 | died - family - mother - inquest - child | 28 | 51_died_family_mother_inquest |
52 | groening - camp - auschwitz - nazi - jews | 27 | 52_groening_camp_auschwitz_nazi |
53 | alshabaab - garissa - kenya - kenyan - attack | 26 | 53_alshabaab_garissa_kenya_kenyan |
54 | artist - paint - painted - colouring - art | 26 | 54_artist_paint_painted_colouring |
55 | crucible - osullivan - frame - doherty - world | 24 | 55_crucible_osullivan_frame_doherty |
56 | janner - lord - saunders - public - abuse | 24 | 56_janner_lord_saunders_public |
57 | apple - watch - iphone - samsung - battery | 23 | 57_apple_watch_iphone_samsung |
58 | korea - korean - kim - north - seoul | 23 | 58_korea_korean_kim_north |
59 | tornado - storm - cloud - lightning - wind | 21 | 59_tornado_storm_cloud_lightning |
60 | housing - tenant - property - buy - association | 20 | 60_housing_tenant_property_buy |
61 | hughes - capitol - gyrocopter - secret - lawn | 20 | 61_hughes_capitol_gyrocopter_secret |
62 | vaccine - vaccination - cough - whooping - autism | 19 | 62_vaccine_vaccination_cough_whooping |
63 | putin - russian - russia - ukraine - moscow | 19 | 63_putin_russian_russia_ukraine |
64 | boko - haram - nigeria - nigerian - buhari | 19 | 64_boko_haram_nigeria_nigerian |
65 | south - johannesburg - africa - african - violence | 19 | 65_south_johannesburg_africa_african |
66 | bates - harris - tulsa - deputy - taser | 19 | 66_bates_harris_tulsa_deputy |
67 | aldi - tesco - cent - per - price | 19 | 67_aldi_tesco_cent_per |
68 | bolt - phelps - ennishill - olympic - kipsiro | 19 | 68_bolt_phelps_ennishill_olympic |
69 | cuba - castro - obama - cuban - president | 18 | 69_cuba_castro_obama_cuban |
70 | murray - dunblane - sears - wedding - andy | 18 | 70_murray_dunblane_sears_wedding |
71 | mchenry - weinstein - battilana - britt - towing | 18 | 71_mchenry_weinstein_battilana_britt |
72 | nhs - gp - gps - ae - patient | 18 | 72_nhs_gp_gps_ae |
73 | cancer - breast - prostate - gene - cell | 18 | 73_cancer_breast_prostate_gene |
74 | emoji - app - user - facebook - use | 17 | 74_emoji_app_user_facebook |
75 | melbourne - police - anzac - australian - australia | 17 | 75_melbourne_police_anzac_australian |
76 | song - songs - no - album - chart | 17 | 76_song_songs_no_album |
77 | sydney - storm - weather - flooding - hail | 16 | 77_sydney_storm_weather_flooding |
78 | car - audi - motor - bentley - vehicle | 15 | 78_car_audi_motor_bentley |
79 | rocket - space - spacex - launch - booster | 15 | 79_rocket_space_spacex_launch |
80 | underground - land - cave - garnet - built | 14 | 80_underground_land_cave_garnet |
81 | genocide - armenians - armenian - pope - ottoman | 14 | 81_genocide_armenians_armenian_pope |
82 | hair - jamelia - labium - rita - cheryl | 14 | 82_hair_jamelia_labium_rita |
83 | stephanie - scott - scotts - stanford - leeton | 13 | 83_stephanie_scott_scotts_stanford |
84 | funeral - nelms - work - job - grandparent | 13 | 84_funeral_nelms_work_job |
85 | alcohol - wine - drinking - oak - drink | 13 | 85_alcohol_wine_drinking_oak |
86 | nuclear - reactor - radiation - plant - fukushima | 12 | 86_nuclear_reactor_radiation_plant |
87 | luke - search - bushland - missing - eildon | 12 | 87_luke_search_bushland_missing |
88 | snowden - nsa - agency - oliver - information | 12 | 88_snowden_nsa_agency_oliver |
89 | brandt - dr - kimmy - franff - fredric | 10 | 89_brandt_dr_kimmy_franff |
90 | tidal - music - radio - streaming - service | 10 | 90_tidal_music_radio_streaming |
91 | population - immigrant - cent - per - immigration | 10 | 91_population_immigrant_cent_per |
92 | brain - acetaminophen - meditation - cortisol - study | 9 | 92_brain_acetaminophen_meditation_cortisol |
93 | god - church - dollar - catholic - schuller | 8 | 93_god_church_dollar_catholic |
94 | phone - user - google - device - app | 8 | 94_phone_user_google_device |
95 | cocaine - cutter - custom - seized - tsa | 8 | 95_cocaine_cutter_custom_seized |
96 | pusok - deputy - officer - pusoks - mcmahon | 7 | 96_pusok_deputy_officer_pusoks |
97 | stover - kost - rape - convicted - offender | 7 | 97_stover_kost_rape_convicted |
98 | nauru - sexual - sex - genetic - convicted | 7 | 98_nauru_sexual_sex_genetic |
99 | tsa - security - roberts - airport - employee | 7 | 99_tsa_security_roberts_airport |
100 | eaves - beach - martistee - mckeithen - spring | 7 | 100_eaves_beach_martistee_mckeithen |
101 | oclee - michelle - philippa - barrientos - mcwhirter | 6 | 101_oclee_michelle_philippa_barrientos |
102 | redman - wisconsin - basketball - badgers - wildcats | 6 | 102_redman_wisconsin_basketball_badgers |
103 | gransbury - biderman - funking - website - joke | 6 | 103_gransbury_biderman_funking_website |
104 | richards - ariana - beverly - kim - hills | 6 | 104_richards_ariana_beverly_kim |
105 | affleck - gates - renner - avengers - afflecks | 5 | 105_affleck_gates_renner_avengers |
106 | skin - sun - protoporphyrin - cream - sunlight | 5 | 106_skin_sun_protoporphyrin_cream |
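To put the frequencies in proportion, the rows above can be parsed and expressed as a share of the training documents. This is a minimal sketch using only the Python standard library; `TopicRow`, `parse_row`, and `share` are hypothetical names introduced here, and only three rows are copied from the table for brevity.

```python
from typing import List, NamedTuple

TRAINING_DOCS = 11490  # "Number of training documents" from the card


class TopicRow(NamedTuple):
    topic_id: int
    keywords: List[str]
    frequency: int
    label: str


def parse_row(line: str) -> TopicRow:
    """Parse one table row: 'ID | kw - kw - ... | frequency | label |'."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    topic_id, keywords, frequency, label = cells
    return TopicRow(int(topic_id), keywords.split(" - "), int(frequency), label)


def share(row: TopicRow, total: int = TRAINING_DOCS) -> float:
    """Percentage of training documents assigned to this topic."""
    return 100.0 * row.frequency / total


rows = [
    parse_row("-1 | said - one - year - also - police | 5 | -1_said_one_year_also |"),
    parse_row("0 | league - season - player - game - goal | 4981 | 0_league_season_player_game |"),
    parse_row("1 | isis - syria - islamic - group - militant | 2424 | 1_isis_syria_islamic_group |"),
]
for row in rows:
    print(f"{row.topic_id:>3}  {share(row):5.2f}%  {', '.join(row.keywords)}")
```

On these figures, topic 0 covers roughly 43% of the training documents and topic 1 roughly 21%, so the two largest topics together account for about 64% of the corpus.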
Training hyperparameters
- calculate_probabilities: True
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: False
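For reference, the settings above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below is an assumption-laden illustration, not the author's training script: `HYPERPARAMETERS` and `build_model` are hypothetical names, and the card does not state the embedding model or the UMAP and HDBSCAN settings, so library defaults would apply and the topics listed above would not necessarily be reproduced.

```python
from typing import Any, Dict

# Values copied from the "Training hyperparameters" list on this card.
HYPERPARAMETERS: Dict[str, Any] = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an untrained BERTopic instance with the settings above."""
    from bertopic import BERTopic  # lazy import: only needed when building the model

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_model().fit_transform(docs)` on a list of documents would then train a new model with these settings.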
Framework versions
- Numpy: 1.23.5
- HDBSCAN: 0.8.33
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.31.0
- Numba: 0.57.1
- Plotly: 5.15.0
- Python: 3.10.12
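Because saved topic models can be sensitive to library versions, it may help to compare a local environment against the list above before loading. The following sketch uses only the standard library; `CARD_VERSIONS` and `find_mismatches` are hypothetical names, and the dictionary keys are assumed PyPI distribution names (for example `umap-learn` for UMAP), which the card itself does not spell out.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions listed on this card; keys are assumed PyPI distribution names.
CARD_VERSIONS: Dict[str, str] = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}


def find_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing package to (version on card, installed version or None)."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_mismatches(CARD_VERSIONS).items():
    print(f"{name}: card lists {wanted}, installed {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only highlights where the environment differs from the one recorded here.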