cnn_dailymail_123_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

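from bertopic import BERTopic

# Download the trained model from the Hugging Face Hub and load it.
topic_model = BERTopic.load("KingKazma/cnn_dailymail_123_3000_1500_train")

# Return a DataFrame with one row per topic.
topic_model.get_topic_info()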
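
Beyond inspecting the topic table, a loaded model can assign topics to new documents. The sketch below is one possible way to do that with BERTopic's `transform` method; the helper names `load_model` and `assign_topics` are illustrative and not part of this model card, and the sample sentence in the comment is made up.

```python
from typing import Any, Sequence

MODEL_ID = "KingKazma/cnn_dailymail_123_3000_1500_train"


def load_model(model_id: str = MODEL_ID) -> Any:
    """Load the topic model from the Hub (requires bertopic and network access)."""
    from bertopic import BERTopic  # imported lazily so the helpers below stay importable

    return BERTopic.load(model_id)


def assign_topics(topic_model: Any, docs: Sequence[str]) -> list[tuple[str, int]]:
    """Pair each document with the topic ID predicted by `topic_model.transform`."""
    topics, _probabilities = topic_model.transform(list(docs))
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


# Example usage (hypothetical input text):
#   model = load_model()
#   for doc, topic in assign_topics(model, ["The striker scored twice in the cup final."]):
#       print(topic, doc)
```

A predicted topic ID of -1 means the document was treated as an outlier rather than assigned to one of the numbered topics.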

Topic overview

  • Number of topics: 57
  • Number of training documents: 3000
Overview of all topics:
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - police - people - year | 10 | -1_said_one_police_people |
| 0 | league - player - cup - goal - game | 1070 | 0_league_player_cup_goal |
| 1 | police - said - home - murder - found | 320 | 1_police_said_home_murder |
| 2 | court - mr - said - year - sex | 142 | 2_court_mr_said_year |
| 3 | obama - president - republicans - house - republican | 113 | 3_obama_president_republicans_house |
| 4 | plane - flight - passenger - airport - aircraft | 89 | 4_plane_flight_passenger_airport |
| 5 | hospital - care - family - baby - mr | 59 | 5_hospital_care_family_baby |
| 6 | fashion - dress - style - look - collection | 57 | 6_fashion_dress_style_look |
| 7 | mr - minister - cameron - party - labour | 50 | 7_mr_minister_cameron_party |
| 8 | weight - diet - food - fat - school | 49 | 8_weight_diet_food_fat |
| 9 | mars - space - climate - nasa - mission | 43 | 9_mars_space_climate_nasa |
| 10 | apple - ipad - iphone - app - apples | 41 | 10_apple_ipad_iphone_app |
| 11 | shark - dolphin - fish - coast - water | 39 | 11_shark_dolphin_fish_coast |
| 12 | teacher - school - student - said - state | 37 | 12_teacher_school_student_said |
| 13 | murray - wimbledon - win - champion - match | 36 | 13_murray_wimbledon_win_champion |
| 14 | race - prix - hamilton - gold - world | 33 | 14_race_prix_hamilton_gold |
| 15 | dog - animal - owner - dogs - tiger | 32 | 15_dog_animal_owner_dogs |
| 16 | syrian - syria - isis - islamic - force | 32 | 16_syrian_syria_isis_islamic |
| 17 | storm - weather - lava - snow - said | 32 | 17_storm_weather_lava_snow |
| 18 | chocolate - sale - cent - online - caramel | 32 | 18_chocolate_sale_cent_online |
| 19 | afghanistan - afghan - pakistan - herat - taliban | 32 | 19_afghanistan_afghan_pakistan_herat |
| 20 | music - band - halen - song - album | 30 | 20_music_band_halen_song |
| 21 | beach - island - resort - park - hotel | 29 | 21_beach_island_resort_park |
| 22 | mcilroy - golf - round - shot - hole | 27 | 22_mcilroy_golf_round_shot |
| 23 | text - data - nsa - credit - email | 26 | 23_text_data_nsa_credit |
| 24 | show - film - movie - actor - griffiths | 26 | 24_show_film_movie_actor |
| 25 | putin - russian - russia - ukraine - moscow | 26 | 25_putin_russian_russia_ukraine |
| 26 | art - artist - work - painting - pinata | 25 | 26_art_artist_work_painting |
| 27 | economy - eurozone - european - euro - debt | 24 | 27_economy_eurozone_european_euro |
| 28 | north - kim - korea - korean - jong | 24 | 28_north_kim_korea_korean |
| 29 | ebola - virus - liberia - africa - outbreak | 22 | 29_ebola_virus_liberia_africa |
| 30 | bike - speed - road - driver - cyclist | 22 | 30_bike_speed_road_driver |
| 31 | car - accident - driver - scene - crash | 20 | 31_car_accident_driver_scene |
| 32 | price - london - house - home - property | 20 | 32_price_london_house_home |
| 33 | al - qaeda - yemen - us - yemeni | 20 | 33_al_qaeda_yemen_us |
| 34 | mrs - police - murder - greaves - mr | 20 | 34_mrs_police_murder_greaves |
| 35 | per - cent - people - age - average | 19 | 35_per_cent_people_age |
| 36 | philpott - court - berry - husband - dewani | 18 | 36_philpott_court_berry_husband |
| 37 | facebook - photo - user - instagram - cuddle | 17 | 37_facebook_photo_user_instagram |
| 38 | vaccine - meningitis - disease - flu - princeton | 17 | 38_vaccine_meningitis_disease_flu |
| 39 | bear - lion - gorilla - cub - zoo | 16 | 39_bear_lion_gorilla_cub |
| 40 | brain - drug - alzheimers - memory - patient | 16 | 40_brain_drug_alzheimers_memory |
| 41 | prince - royal - queen - duchess - duke | 16 | 41_prince_royal_queen_duchess |
| 42 | boat - ship - river - vessel - ferry | 15 | 42_boat_ship_river_vessel |
| 43 | china - chinese - chinas - organ - hong | 14 | 43_china_chinese_chinas_organ |
| 44 | egypt - election - egyptian - mubarak - protest | 13 | 44_egypt_election_egyptian_mubarak |
| 45 | mexico - mexican - cartel - mexicos - drug | 13 | 45_mexico_mexican_cartel_mexicos |
| 46 | cia - assange - snowden - us - interrogation | 13 | 46_cia_assange_snowden_us |
| 47 | police - hartman - hore - store - maitua | 13 | 47_police_hartman_hore_store |
| 48 | israeli - israel - palestinian - gaza - hamas | 12 | 48_israeli_israel_palestinian_gaza |
| 49 | pension - tax - scheme - energy - cent | 12 | 49_pension_tax_scheme_energy |
| 50 | council - neighbour - village - site - shed | 12 | 50_council_neighbour_village_site |
| 51 | occupy - protester - york - cosby - mayor | 11 | 51_occupy_protester_york_cosby |
| 52 | mould - allergic - allergy - reaction - hand | 11 | 52_mould_allergic_allergy_reaction |
| 53 | boko - haram - nigeria - sudan - isis | 11 | 53_boko_haram_nigeria_sudan |
| 54 | disaster - building - tsunami - people - quake | 11 | 54_disaster_building_tsunami_people |
| 55 | castro - sloot - der - ariel - aruba | 11 | 55_castro_sloot_der_ariel |
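
The labels in the overview follow the pattern of a topic ID followed by the topic's first four keywords, joined by underscores, and the frequencies add up to the 3000 training documents. As a small worked example, the sketch below splits a label back into its parts and converts a frequency into a share of the training set. The helper names `parse_label` and `share_of_documents` are hypothetical and only serve this illustration.

```python
TOTAL_DOCUMENTS = 3000  # "Number of training documents" from the overview above


def parse_label(label: str) -> tuple[int, list[str]]:
    """Split a label such as '0_league_player_cup_goal' into (topic ID, keywords)."""
    topic_id, _, keywords = label.partition("_")
    return int(topic_id), keywords.split("_")


def share_of_documents(frequency: int, total: int = TOTAL_DOCUMENTS) -> float:
    """Return the percentage of training documents in a topic, rounded to one decimal."""
    return round(100 * frequency / total, 1)


print(parse_label("0_league_player_cup_goal"))   # (0, ['league', 'player', 'cup', 'goal'])
print(parse_label("-1_said_one_police_people"))  # (-1, ['said', 'one', 'police', 'people'])
print(share_of_documents(1070))                  # 35.7
```

By this measure, topic 0 (football) covers roughly a third of the training documents, while the outlier topic -1 holds only 10 of them.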

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
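
The hyperparameters listed above correspond to keyword arguments of the `BERTopic` constructor. The following sketch shows how an untrained model with the same settings might be created. It is not a reproduction recipe: this card does not state the embedding model or the UMAP and HDBSCAN settings, so retraining with these values alone would not necessarily yield the same topics. The function name `build_untrained_model` is illustrative.

```python
# Values copied from the "Training hyperparameters" list above.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Create a fresh BERTopic instance with the listed settings (requires bertopic)."""
    from bertopic import BERTopic  # imported lazily so HYPERPARAMS can be inspected without it

    return BERTopic(**HYPERPARAMS)


# Example usage, assuming `docs` is your own list of strings:
#   model = build_untrained_model()
#   topics, probabilities = model.fit_transform(docs)
```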

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
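
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the versions listed above. The sketch below uses only the standard library for that check. The distribution names in `PINNED` are the PyPI package names assumed to correspond to each entry (for example `umap-learn` for UMAP), and the helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from "Framework versions"; keys are assumed PyPI distribution names.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Map each package whose installed version differs to (listed, installed)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Example usage:
#   for name, (wanted, found) in find_mismatches().items():
#       print(f"{name}: card lists {wanted}, installed {found}")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one recorded on this card.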