
cnn_dailymail_22457_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

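from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/cnn_dailymail_22457_3000_1500_train")

# Return a DataFrame with one row per topic
topic_model.get_topic_info()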
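
Beyond inspecting the topic table, the loaded model can also assign topics to new text. The following is a minimal sketch, not part of the original card: `format_keywords` and `describe_documents` are hypothetical helper names, and the sketch assumes the Hub repository references an embedding model that `BERTopic.load` can resolve (otherwise one would need to be passed to `load`).

```python
DEFAULT_REPO = "KingKazma/cnn_dailymail_22457_3000_1500_train"


def format_keywords(words_with_scores, n=5):
    """Join the top-n topic words in the 'a - b - c' style of the topic table."""
    return " - ".join(word for word, _score in words_with_scores[:n])


def describe_documents(docs, repo_id=DEFAULT_REPO):
    """Return (document, topic_id, keywords) for each input document."""
    # Imported lazily so format_keywords works without BERTopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    # transform() returns the predicted topic IDs and their probabilities.
    topics, _probs = topic_model.transform(docs)

    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() returns (word, score) pairs for a known topic ID.
        words = topic_model.get_topic(int(topic_id)) or []
        results.append((doc, int(topic_id), format_keywords(words)))
    return results
```

Calling `describe_documents(["The striker scored twice in the cup final."])` downloads the model on first use. A topic ID of -1 means the document was treated as an outlier.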

Topic overview

  • Number of topics: 49 (48 topics plus the outlier topic -1)
  • Number of training documents: 3000

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - year - people - police | 10 | -1_said_one_year_people |
| 0 | league - player - club - game - cup | 1050 | 0_league_player_club_game |
| 1 | said - syria - government - iraq - islamic | 317 | 1_said_syria_government_iraq |
| 2 | obama - president - house - state - republican | 140 | 2_obama_president_house_state |
| 3 | cancer - hospital - baby - treatment - child | 122 | 3_cancer_hospital_baby_treatment |
| 4 | google - apple - tablet - car - device | 84 | 4_google_apple_tablet_car |
| 5 | fashion - dress - hair - look - woman | 78 | 5_fashion_dress_hair_look |
| 6 | police - officer - shooting - said - shot | 66 | 6_police_officer_shooting_said |
| 7 | film - movie - show - actor - comedy | 65 | 7_film_movie_show_actor |
| 8 | murder - death - said - home - police | 55 | 8_murder_death_said_home |
| 9 | mr - labour - minister - mp - blair | 52 | 9_mr_labour_minister_mp |
| 10 | storm - water - weather - ice - rain | 51 | 10_storm_water_weather_ice |
| 11 | shark - bear - turtle - crocodile - bird | 50 | 11_shark_bear_turtle_crocodile |
| 12 | flight - plane - passenger - airport - pilot | 49 | 12_flight_plane_passenger_airport |
| 13 | house - property - home - per - room | 49 | 13_house_property_home_per |
| 14 | drug - police - court - stealing - robbery | 40 | 14_drug_police_court_stealing |
| 15 | police - murder - mr - court - clavell | 36 | 15_police_murder_mr_court |
| 16 | games - gold - olympic - race - sport | 34 | 16_games_gold_olympic_race |
| 17 | student - school - teacher - said - cardosa | 34 | 17_student_school_teacher_said |
| 18 | country - minister - energy - cent - greece | 32 | 18_country_minister_energy_cent |
| 19 | golf - mcilroy - course - round - ryder | 31 | 19_golf_mcilroy_course_round |
| 20 | police - harris - abuse - allegation - officer | 30 | 20_police_harris_abuse_allegation |
| 21 | ebola - virus - africa - health - liberia | 29 | 21_ebola_virus_africa_health |
| 22 | chinese - china - cable - bo - beijing | 28 | 22_chinese_china_cable_bo |
| 23 | federer - tennis - murray - wimbledon - match | 28 | 23_federer_tennis_murray_wimbledon |
| 24 | dog - animal - dogs - owner - simmons | 26 | 24_dog_animal_dogs_owner |
| 25 | cent - per - woman - men - pickens | 23 | 25_cent_per_woman_men |
| 26 | ship - boat - rescue - water - sea | 23 | 26_ship_boat_rescue_water |
| 27 | hamilton - race - rosberg - mercedes - formula | 22 | 27_hamilton_race_rosberg_mercedes |
| 28 | galaxy - planet - universe - earth - telescope | 22 | 28_galaxy_planet_universe_earth |
| 29 | russian - russia - putin - ukraine - moscow | 22 | 29_russian_russia_putin_ukraine |
| 30 | pakistan - pakistani - karachi - taliban - anwar | 22 | 30_pakistan_pakistani_karachi_taliban |
| 31 | korea - north - korean - south - kim | 21 | 31_korea_north_korean_south |
| 32 | car - driver - train - accident - cope | 21 | 32_car_driver_train_accident |
| 33 | food - fruit - taste - cake - cream | 20 | 33_food_fruit_taste_cake |
| 34 | painting - art - auction - artist - gallery | 20 | 34_painting_art_auction_artist |
| 35 | base - drone - soldier - afghan - us | 19 | 35_base_drone_soldier_afghan |
| 36 | weight - fat - eating - healthy - size | 18 | 36_weight_fat_eating_healthy |
| 37 | mafia - wine - money - fraud - court | 18 | 37_mafia_wine_money_fraud |
| 38 | aguilar - bravo - brewer - rambold - court | 18 | 38_aguilar_bravo_brewer_rambold |
| 39 | missing - search - found - family - disappeared | 17 | 39_missing_search_found_family |
| 40 | juarez - quezada - mexico - mexican - cartel | 15 | 40_juarez_quezada_mexico_mexican |
| 41 | knicks - lin - chicago - blackhawks - game | 15 | 41_knicks_lin_chicago_blackhawks |
| 42 | duchess - prince - kate - royal - william | 15 | 42_duchess_prince_kate_royal |
| 43 | price - supermarket - asda - shop - food | 14 | 43_price_supermarket_asda_shop |
| 44 | school - child - pupil - teacher - xxx | 14 | 44_school_child_pupil_teacher |
| 45 | nhs - patient - ae - hospital - staff | 13 | 45_nhs_patient_ae_hospital |
| 46 | zsa - francesca - rhodes - vongtau - gabor | 12 | 46_zsa_francesca_rhodes_vongtau |
| 47 | medal - war - bomb - graf - vc | 10 | 47_medal_war_bomb_graf |
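
The labels in the table appear to follow the pattern topic ID, underscore, then the top four keywords joined by underscores, and the frequencies sum to the 3000 training documents. The sketch below illustrates both points on a few rows copied from the table. `parse_label` and `topic_share` are hypothetical helper names, and the parser assumes no keyword itself contains an underscore.

```python
TRAINING_DOCS = 3000  # "Number of training documents" from the overview

# A few (label, frequency) rows copied from the topic table.
ROWS = [
    ("-1_said_one_year_people", 10),
    ("0_league_player_club_game", 1050),
    ("1_said_syria_government_iraq", 317),
    ("2_obama_president_house_state", 140),
]


def parse_label(label):
    """Split a label such as '0_league_player_club_game' into (id, words)."""
    topic_id, *words = label.split("_")
    return int(topic_id), words


def topic_share(frequency, total=TRAINING_DOCS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


for label, frequency in ROWS:
    topic_id, words = parse_label(label)
    print(f"topic {topic_id:>2}: {topic_share(frequency):6.1%}  {', '.join(words)}")
```

Read this way, topic 0 (football) alone covers 35% of the training documents, while only 10 documents were left in the outlier topic -1.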

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
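
The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch, they could be passed as shown below. `TRAINING_PARAMS` and `build_topic_model` are hypothetical names. This would not by itself reproduce the published model, because the card does not list the embedding model, the UMAP and HDBSCAN settings, or the exact training documents.

```python
# Values copied from the "Training hyperparameters" list.
TRAINING_PARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model(params=None):
    """Create an untrained BERTopic instance with the listed settings."""
    # Imported lazily so the parameter dictionary is usable on its own.
    from bertopic import BERTopic

    return BERTopic(**(params or TRAINING_PARAMS))
```

A model built this way still has to be trained, for example with `build_topic_model().fit_transform(docs)` on a list of document strings.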

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
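
When loading the model in a different environment, it can help to compare the installed package versions with the ones listed above. The sketch below is one possible check using only the standard library. `CARD_VERSIONS`, `installed_version`, and `version_mismatches` are hypothetical names, and the dictionary keys are the PyPI distribution names assumed to correspond to the listed frameworks (for example `umap-learn` for UMAP).

```python
from importlib import metadata

# Versions copied from the "Framework versions" list, keyed by PyPI name.
CARD_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected=None, lookup=installed_version):
    """Map each differing package to (version on the card, installed version)."""
    expected = CARD_VERSIONS if expected is None else expected
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

An empty result from `version_mismatches()` means the environment matches the card. A mismatch does not necessarily prevent the model from loading.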