
cnn_dailymail_108_50000_25000_validation

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_50000_25000_validation")

topic_model.get_topic_info()

Topic overview

  • Number of topics: 92
  • Number of training documents: 13368
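The Topic Frequency values in the topic table in this card add up to exactly 13368, the number of training documents. Each frequency can therefore be read as a share of the training corpus. The sketch below copies that column, checks the total, and converts a few counts into percentages. Topic 0 alone covers about 36.8% of the documents, topics 0 and 1 together cover about 57%, and the outlier topic -1 holds only 5 documents. The helper name `topic_share` is illustrative and is not part of BERTopic.

```python
# Topic Frequency column of the topic table, in order of Topic ID (-1, 0, 1, ..., 90).
FREQUENCIES = [
    5, 4918, 2700, 415, 386, 340, 248, 227, 221, 215,
    213, 191, 184, 171, 135, 127, 127, 102, 98, 96,
    83, 82, 81, 78, 76, 74, 72, 72, 68, 65,
    64, 58, 52, 52, 50, 47, 46, 45, 43, 42,
    40, 38, 37, 34, 33, 32, 31, 30, 30, 30,
    29, 29, 29, 23, 23, 23, 22, 21, 20, 20,
    20, 18, 17, 17, 17, 17, 17, 17, 16, 16,
    16, 15, 15, 15, 14, 13, 13, 13, 13, 12,
    12, 12, 12, 12, 11, 10, 8, 8, 8, 8,
    7, 6,
]
N_TOPICS = 92          # "Number of topics" in this card
N_DOCUMENTS = 13368    # "Number of training documents" in this card

# Map each topic ID (starting at the outlier topic -1) to its document count.
counts = dict(zip(range(-1, len(FREQUENCIES) - 1), FREQUENCIES))

# Sanity checks: one count per topic, and the counts add up to the corpus size.
assert len(counts) == N_TOPICS
assert sum(counts.values()) == N_DOCUMENTS


def topic_share(topic_id):
    """Fraction of the training documents assigned to `topic_id`."""
    return counts[topic_id] / N_DOCUMENTS


for topic_id in (-1, 0, 1):
    print(f"topic {topic_id}: {counts[topic_id]} docs, {topic_share(topic_id):.1%}")
```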
Overview of all topics:
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | said - police - one - year - also | 5 | -1_said_police_one_year |
| 0 | league - game - player - goal - season | 4918 | 0_league_game_player_goal |
| 1 | isis - syria - islamic - group - iraq | 2700 | 1_isis_syria_islamic_group |
| 2 | dog - animal - elephant - bear - cat | 415 | 2_dog_animal_elephant_bear |
| 3 | labour - mr - party - election - cameron | 386 | 3_labour_mr_party_election |
| 4 | flight - plane - aircraft - pilot - crash | 340 | 4_flight_plane_aircraft_pilot |
| 5 | hair - fashion - dress - look - model | 248 | 5_hair_fashion_dress_look |
| 6 | car - driver - driving - road - police | 227 | 6_car_driver_driving_road |
| 7 | food - cent - sugar - health - per | 221 | 7_food_cent_sugar_health |
| 8 | police - officer - shooting - shot - said | 215 | 8_police_officer_shooting_shot |
| 9 | clinton - email - obama - president - state | 213 | 9_clinton_email_obama_president |
| 10 | cricket - england - cup - world - zealand | 191 | 10_cricket_england_cup_world |
| 11 | property - house - home - room - price | 184 | 11_property_house_home_room |
| 12 | fight - pacquiao - mayweather - manny - floyd | 171 | 12_fight_pacquiao_mayweather_manny |
| 13 | hamilton - mercedes - race - prix - rosberg | 135 | 13_hamilton_mercedes_race_prix |
| 14 | baby - hospital - birth - mother - child | 127 | 14_baby_hospital_birth_mother |
| 15 | murray - wells - tennis - andy - match | 127 | 15_murray_wells_tennis_andy |
| 16 | eclipse - earth - solar - sun - planet | 102 | 16_eclipse_earth_solar_sun |
| 17 | police - abuse - sex - sexual - child | 98 | 17_police_abuse_sex_sexual |
| 18 | apple - watch - device - user - google | 96 | 18_apple_watch_device_user |
| 19 | netanyahu - iran - nuclear - israel - israeli | 83 | 19_netanyahu_iran_nuclear_israel |
| 20 | putin - russian - nemtsov - moscow - russia | 82 | 20_putin_russian_nemtsov_moscow |
| 21 | weight - fat - diet - size - stone | 81 | 21_weight_fat_diet_size |
| 22 | race - armstrong - doping - world - tour | 78 | 22_race_armstrong_doping_world |
| 23 | court - fraud - money - bank - mr | 76 | 23_court_fraud_money_bank |
| 24 | cheltenham - hurdle - horse - race - jockey | 74 | 24_cheltenham_hurdle_horse_race |
| 25 | mcilroy - round - masters - woods - golf | 72 | 25_mcilroy_round_masters_woods |
| 26 | prince - charles - royal - duchess - camilla | 72 | 26_prince_charles_royal_duchess |
| 27 | fraternity - university - sae - chapter - oklahoma | 68 | 27_fraternity_university_sae_chapter |
| 28 | chan - sukumaran - bali - indonesian - mack | 65 | 28_chan_sukumaran_bali_indonesian |
| 29 | ebola - sierra - virus - leone - disease | 64 | 29_ebola_sierra_virus_leone |
| 30 | school - teacher - student - girl - sexual | 58 | 30_school_teacher_student_girl |
| 31 | fire - building - explosion - blaze - firefighter | 52 | 31_fire_building_explosion_blaze |
| 32 | nfl - borland - football - 49ers - season | 52 | 32_nfl_borland_football_49ers |
| 33 | clarkson - bbc - gear - top - jeremy | 50 | 33_clarkson_bbc_gear_top |
| 34 | ski - skier - mountain - avalanche - rock | 47 | 34_ski_skier_mountain_avalanche |
| 35 | patient - nhs - ae - cancer - hospital | 46 | 35_patient_nhs_ae_cancer |
| 36 | india - rape - documentary - indian - singh | 45 | 36_india_rape_documentary_indian |
| 37 | mr - death - court - emery - miss | 43 | 37_mr_death_court_emery |
| 38 | show - corden - host - stewart - williams | 42 | 38_show_corden_host_stewart |
| 39 | car - vehicle - electric - cars - tesla | 40 | 39_car_vehicle_electric_cars |
| 40 | school - child - education - porn - sex | 38 | 40_school_child_education_porn |
| 41 | boko - haram - nigeria - nigerian - nigerias | 37 | 41_boko_haram_nigeria_nigerian |
| 42 | marijuana - drug - cannabis - colorado - lsd | 34 | 42_marijuana_drug_cannabis_colorado |
| 43 | law - indiana - gay - marriage - religious | 33 | 43_law_indiana_gay_marriage |
| 44 | ferguson - department - police - justice - report | 32 | 44_ferguson_department_police_justice |
| 45 | image - photographer - photography - photograph - photo | 31 | 45_image_photographer_photography_photograph |
| 46 | snow - inch - winter - ice - storm | 30 | 46_snow_inch_winter_ice |
| 47 | basketball - ncaa - coach - tournament - game | 30 | 47_basketball_ncaa_coach_tournament |
| 48 | tsarnaev - boston - dzhokhar - tamerlan - tsarnaevs | 30 | 48_tsarnaev_boston_dzhokhar_tamerlan |
| 49 | durst - dursts - berman - orleans - robert | 29 | 49_durst_dursts_berman_orleans |
| 50 | jesus - ancient - stone - cave - circle | 29 | 50_jesus_ancient_stone_cave |
| 51 | zayn - band - direction - singer - dance | 29 | 51_zayn_band_direction_singer |
| 52 | film - movie - vivian - hollywood - script | 23 | 52_film_movie_vivian_hollywood |
| 53 | korean - korea - kim - north - lippert | 23 | 53_korean_korea_kim_north |
| 54 | weather - rain - temperature - snow - today | 23 | 54_weather_rain_temperature_snow |
| 55 | robbery - woodger - store - cash - police | 22 | 55_robbery_woodger_store_cash |
| 56 | parade - patricks - st - irish - green | 21 | 56_parade_patricks_st_irish |
| 57 | secret - clancy - service - agent - white | 20 | 57_secret_clancy_service_agent |
| 58 | hernandez - lloyd - jenkins - hernandezs - lloyds | 20 | 58_hernandez_lloyd_jenkins_hernandezs |
| 59 | nazi - anne - nazis - war - camp | 20 | 59_nazi_anne_nazis_war |
| 60 | snowden - intelligence - gchq - security - agency | 18 | 60_snowden_intelligence_gchq_security |
| 61 | huang - chinese - china - mingxi - chen | 17 | 61_huang_chinese_china_mingxi |
| 62 | wedding - married - marlee - platt - woodyard | 17 | 62_wedding_married_marlee_platt |
| 63 | drug - cocaine - jailed - cannabis - tobacco | 17 | 63_drug_cocaine_jailed_cannabis |
| 64 | cnn - transcript - student - news - roll | 17 | 64_cnn_transcript_student_news |
| 65 | pope - francis - vatican - naples - pontiff | 17 | 65_pope_francis_vatican_naples |
| 66 | richard - iii - leicester - king - iiis | 17 | 66_richard_iii_leicester_king |
| 67 | chinese - tourist - temple - thailand - buddhist | 16 | 67_chinese_tourist_temple_thailand |
| 68 | china - chinese - internet - chai - stopera | 16 | 68_china_chinese_internet_chai |
| 69 | execution - lethal - gissendaner - injection - drug | 16 | 69_execution_lethal_gissendaner_injection |
| 70 | woman - marriage - men - attractive - chalmers | 15 | 70_woman_marriage_men_attractive |
| 71 | vanuatu - cyclone - vila - port - pam | 15 | 71_vanuatu_cyclone_vila_port |
| 72 | poldark - turner - demelza - aidan - drama | 15 | 72_poldark_turner_demelza_aidan |
| 73 | point - rebound - scored - points - harden | 14 | 73_point_rebound_scored_points |
| 74 | rail - calais - parking - migrant - dickens | 13 | 74_rail_calais_parking_migrant |
| 75 | johnson - student - virginia - charlottesville - uva | 13 | 75_johnson_student_virginia_charlottesville |
| 76 | cuba - havana - cuban - rousseff - us | 13 | 76_cuba_havana_cuban_rousseff |
| 77 | paris - attack - synagogue - hebdo - charlie | 13 | 77_paris_attack_synagogue_hebdo |
| 78 | duckenfield - mr - gate - hillsborough - disaster | 12 | 78_duckenfield_mr_gate_hillsborough |
| 79 | gordon - bobbi - kristina - phil - dr | 12 | 79_gordon_bobbi_kristina_phil |
| 80 | knox - sollecito - kercher - raffaele - amanda | 12 | 80_knox_sollecito_kercher_raffaele |
| 81 | coin - medal - war - auction - cross | 12 | 81_coin_medal_war_auction |
| 82 | starbucks - schultz - race - racial - campaign | 12 | 82_starbucks_schultz_race_racial |
| 83 | cosby - cosbys - thompson - bill - welles | 11 | 83_cosby_cosbys_thompson_bill |
| 84 | jeffs - flds - rivette - compound - speer | 10 | 84_jeffs_flds_rivette_compound |
| 85 | selma - alabama - march - bridge - civil | 8 | 85_selma_alabama_march_bridge |
| 86 | jobs - naomi - fortune - redballoon - bn | 8 | 86_jobs_naomi_fortune_redballoon |
| 87 | brain - object - retina - neuron - word | 8 | 87_brain_object_retina_neuron |
| 88 | netflix - tv - content - streaming - screen | 8 | 88_netflix_tv_content_streaming |
| 89 | social - user - tweet - twitter - tool | 7 | 89_social_user_tweet_twitter |
| 90 | cunard - bird - darshan - ship - liner | 6 | 90_cunard_bird_darshan_ship |
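Every entry in the Label column appears to follow the same pattern: the topic ID, then the first four keywords, all joined by underscores. The sketch below reproduces that pattern for a few rows copied from the table. It is inferred from the table itself, not taken from BERTopic's source. The helpers `parse_keywords` and `topic_label` are hypothetical names used only for illustration.

```python
def parse_keywords(cell):
    """Split a Topic Keywords cell such as "dog - animal - elephant" into words."""
    return [word.strip() for word in cell.split(" - ")]


def topic_label(topic_id, keywords, n_words=4):
    """Join the topic ID and its first `n_words` keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# Rows copied from the topic table in this card: (Topic ID, Topic Keywords, Label).
rows = [
    (-1, "said - police - one - year - also", "-1_said_police_one_year"),
    (0, "league - game - player - goal - season", "0_league_game_player_goal"),
    (45, "image - photographer - photography - photograph - photo",
     "45_image_photographer_photography_photograph"),
]

for topic_id, cell, label in rows:
    built = topic_label(topic_id, parse_keywords(cell))
    print(built, built == label)  # the rebuilt label should match the Label column
```

Only the first four keywords reach the label, so a fifth keyword such as "season" for topic 0 appears in Topic Keywords but not in Label.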

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
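The setting `n_gram_range: (1, 1)` restricts candidate keywords to single words, which is why every entry in the topic table is one token. It is also why a phrase such as "boko haram" shows up as two separate keywords in topic 41. BERTopic's default vectorizer is scikit-learn's `CountVectorizer`, which receives this range as its `ngram_range`. The sketch below imitates that behaviour in plain Python for illustration. The `ngrams` helper is hypothetical and is not the library's implementation.

```python
def ngrams(tokens, n_gram_range=(1, 1)):
    """Return all contiguous n-grams with min_n <= n <= max_n, joined by spaces."""
    min_n, max_n = n_gram_range
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of length n across the token list.
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


tokens = ["boko", "haram", "nigeria"]
print(ngrams(tokens, (1, 1)))  # unigrams only, as configured for this model
print(ngrams(tokens, (1, 2)))  # unigrams plus bigrams such as "boko haram"
```

With `(1, 2)`, bigrams like "boko haram" would become candidate keywords, at the cost of a larger vocabulary.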

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
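When loading this model in a different environment, it can help to compare your installed packages against the versions listed above. The sketch below uses only the standard library's `importlib.metadata`. The mapping from the names in the list to PyPI distribution names is an assumption, for example that "UMAP" refers to `umap-learn`. The helper `version_mismatches` is hypothetical.

```python
import sys
from importlib import metadata

# Versions listed under "Framework versions", keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def version_mismatches(expected):
    """Return {package: (expected, installed_or_None)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])), "(card lists 3.10.12)")
    for package, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{package}: card lists {wanted}, installed {installed or 'not installed'}")
```

A mismatch does not necessarily prevent loading the model. It only flags where your environment differs from the one used for training.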