cnn_dailymail_6789_200000_100000_v1_50topics_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_200000_100000_v1_50topics_train")

topic_model.get_topic_info()
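
Beyond inspecting the topic table, the loaded model can assign topics to unseen articles. The following is a minimal sketch, not part of the original card: `assign_topics` and `count_assignments` are hypothetical helper names, and the sketch assumes that the repository provides (or points to) an embedding model so that `transform` can embed raw text. If it does not, an `embedding_model` would need to be passed to `BERTopic.load`.

```python
from collections import Counter

REPO_ID = "KingKazma/cnn_dailymail_6789_200000_100000_v1_50topics_train"


def assign_topics(docs, repo_id=REPO_ID):
    """Return one topic ID per document; -1 marks documents treated as outliers."""
    # Imported lazily so count_assignments stays usable without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    # transform returns (predicted topic IDs, probabilities); only the IDs are kept here.
    topics, _probs = topic_model.transform(docs)
    return [int(t) for t in topics]


def count_assignments(topics):
    """Count how many documents landed in each topic, most frequent first."""
    return Counter(topics).most_common()
```

Calling `count_assignments(assign_topics(list_of_articles))` would then give a quick picture of how a new batch of articles spreads across the topics.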

Topic overview

  • Number of topics: 50 (topic IDs -1 to 48; -1 is BERTopic's outlier topic)
  • Number of training documents: 200000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - year - people - would | 5 | -1_said_one_year_people |
| 0 | league - player - game - team - cup | 104194 | 0_league_player_game_team |
| 1 | said - police - told - court - family | 27178 | 1_said_police_told_court |
| 2 | said - government - us - military - president | 16379 | 2_said_government_us_military |
| 3 | car - said - flight - fire - plane | 12476 | 3_car_said_flight_fire |
| 4 | per - cent - said - year - school | 6776 | 4_per_cent_said_year |
| 5 | obama - president - said - state - republican | 4128 | 5_obama_president_said_state |
| 6 | film - show - movie - cosby - the | 3747 | 6_film_show_movie_cosby |
| 7 | said - mexico - mexican - government - border | 2967 | 7_said_mexico_mexican_government |
| 8 | dog - animal - cat - zoo - pet | 2258 | 8_dog_animal_cat_zoo |
| 9 | fashion - weight - art - painting - dress | 2176 | 9_fashion_weight_art_painting |
| 10 | apple - user - iphone - google - facebook | 2139 | 10_apple_user_iphone_google |
| 11 | food - energy - climate - per - gas | 1861 | 11_food_energy_climate_per |
| 12 | ebola - virus - health - disease - outbreak | 1846 | 12_ebola_virus_health_disease |
| 13 | war - soldier - british - mr - said | 1693 | 13_war_soldier_british_mr |
| 14 | shark - whale - ship - oil - water | 1686 | 14_shark_whale_ship_oil |
| 15 | cancer - drug - marijuana - smoking - study | 1576 | 15_cancer_drug_marijuana_smoking |
| 16 | space - earth - planet - mars - nasa | 1361 | 16_space_earth_planet_mars |
| 17 | prince - royal - queen - duchess - princess | 1230 | 17_prince_royal_queen_duchess |
| 18 | ancient - found - site - archaeologist - discovered | 769 | 18_ancient_found_site_archaeologist |
| 19 | pope - vatican - church - francis - cardinal | 605 | 19_pope_vatican_church_francis |
| 20 | lottery - ticket - jackpot - million - winning | 604 | 20_lottery_ticket_jackpot_million |
| 21 | game - robot - console - xbox - 3d | 494 | 21_game_robot_console_xbox |
| 22 | park - hotel - island - beach - resort | 428 | 22_park_hotel_island_beach |
| 23 | hollande - sarkozy - trierweiler - french - francois | 354 | 23_hollande_sarkozy_trierweiler_french |
| 24 | teeth - eye - hand - ear - surgery | 180 | 24_teeth_eye_hand_ear |
| 25 | kyle - routh - sniper - littlefield - gun | 137 | 25_kyle_routh_sniper_littlefield |
| 26 | country - population - corruption - per - city | 121 | 26_country_population_corruption_per |
| 27 | dubai - hajj - pilgrim - mecca - mme | 88 | 27_dubai_hajj_pilgrim_mecca |
| 28 | ballet - filin - bolshoi - dancer - dmitrichenko | 66 | 28_ballet_filin_bolshoi_dancer |
| 29 | oldest - age - guinness - worlds - dangi | 50 | 29_oldest_age_guinness_worlds |
| 30 | fragrance - scent - perfume - smell - bottle | 45 | 30_fragrance_scent_perfume_smell |
| 31 | dna - cell - graphene - genome - synthetic | 44 | 31_dna_cell_graphene_genome |
| 32 | accent - favourite - fan - language - top | 35 | 32_accent_favourite_fan_language |
| 33 | nobel - prize - peace - award - committee | 33 | 33_nobel_prize_peace_award |
| 34 | violin - orchestra - stradivarius - instrument - symphony | 31 | 34_violin_orchestra_stradivarius_instrument |
| 35 | turing - bletchley - enigma - code - machine | 30 | 35_turing_bletchley_enigma_code |
| 36 | gandolfini - sopranos - gandolfinis - soprano - actor | 26 | 36_gandolfini_sopranos_gandolfinis_soprano |
| 37 | nelson - napoleon - battle - trafalgar - hms | 26 | 37_nelson_napoleon_battle_trafalgar |
| 38 | redskins - name - native - snyder - washington | 25 | 38_redskins_name_native_snyder |
| 39 | eurovision - contest - song - conchita - country | 25 | 39_eurovision_contest_song_conchita |
| 40 | evolution - creationism - scientific - intelligent - believe | 21 | 40_evolution_creationism_scientific_intelligent |
| 41 | prabowo - indonesia - jakarta - widodo - jokowi | 17 | 41_prabowo_indonesia_jakarta_widodo |
| 42 | dmlaterbundle - twittervia - lanza - zann - ilfracombe | 15 | 42_dmlaterbundle_twittervia_lanza_zann |
| 43 | clock - time - hour - daylight - westworth | 13 | 43_clock_time_hour_daylight |
| 44 | ikea - furniture - ikeas - kamprad - refugee | 12 | 44_ikea_furniture_ikeas_kamprad |
| 45 | vick - vicks - nfl - dog - virginia | 10 | 45_vick_vicks_nfl_dog |
| 46 | bulb - light - leds - paddle - bulbs | 8 | 46_bulb_light_leds_paddle |
| 47 | port - cairo - ministry - egypt - fan | 7 | 47_port_cairo_ministry_egypt |
| 48 | sanford - sanfords - jenny - carolina - mark | 5 | 48_sanford_sanfords_jenny_carolina |
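
To make the frequencies in the overview easier to read, the short sketch below converts a few rows into shares of the 200000 training documents and splits each label back into its topic ID and keywords. It is only an illustration: `parse_label` and `share` are hypothetical helpers, the rows are copied from the overview above, and the label layout `<id>_<word>_<word>...` is assumed from the labels shown.

```python
TOTAL_DOCS = 200000  # "Number of training documents" from the overview

# (topic ID, frequency, label) for the first four rows of the overview.
ROWS = [
    (-1, 5, "-1_said_one_year_people"),
    (0, 104194, "0_league_player_game_team"),
    (1, 27178, "1_said_police_told_court"),
    (2, 16379, "2_said_government_us_military"),
]


def parse_label(label):
    """Split a label such as '0_league_player_game_team' into (0, [keywords])."""
    topic_id, _, keywords = label.partition("_")
    return int(topic_id), keywords.split("_")


def share(frequency, total=TOTAL_DOCS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


for topic_id, frequency, label in ROWS:
    _, keywords = parse_label(label)
    print(f"{topic_id:>3}  {share(frequency):6.1%}  {', '.join(keywords)}")
```

By this calculation topic 0 (sports) alone accounts for about 52% of the training documents, while the outlier topic -1 holds only 5 documents.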

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 50
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
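
The hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows how a model with the same settings could be instantiated. `HYPERPARAMS` and `build_model` are hypothetical names, and the card does not list the embedding model, the UMAP or HDBSCAN settings, or a random seed, so this should not be expected to reproduce the exact topics shown above.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 50,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an unfitted BERTopic model with the settings listed in this card."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Fitting would then be a call such as `build_model().fit_transform(docs)` on a list of article strings.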

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.15.0
  • Python: 3.10.12
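
When loading the model in a different environment, it can help to compare the installed package versions with the ones listed above. A minimal sketch using only the standard library follows. `EXPECTED`, `installed_version`, and `version_report` are hypothetical names, and the keys are the PyPI distribution names that correspond to the listed frameworks (for example `umap-learn` for UMAP).

```python
from importlib import metadata

# Versions copied from the "Framework versions" list, keyed by PyPI distribution name.
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(expected=EXPECTED):
    """Map each distribution name to (expected version, installed version or None)."""
    return {name: (want, installed_version(name)) for name, want in expected.items()}


for name, (want, have) in version_report().items():
    status = "ok" if have == want else "differs"
    print(f"{name:<22} expected {want:<8} installed {have}  [{status}]")
```

A version mismatch does not necessarily prevent loading; the report is only a starting point when results differ between environments.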