cnn_dailymail_6789_3000_1500_train

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

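from bertopic import BERTopic

# Download the trained model from the Hugging Face Hub
topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_3000_1500_train")

# Overview of every topic: ID, document count, and name
topic_model.get_topic_info()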
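Beyond `get_topic_info()`, a loaded model can be queried per topic or used to assign topics to unseen text. The sketch below wraps two BERTopic methods: `get_topic()`, which returns `(word, score)` pairs for a topic (and `False` for an unknown topic ID), and `transform()`, which returns the predicted topics and their probabilities. The helper names `top_keywords` and `assign_topics` are hypothetical; they take the model as an argument so they can be tried without downloading it.

```python
from typing import Any, List, Sequence, Tuple


def top_keywords(topic_model: Any, topic_id: int, n: int = 5) -> List[str]:
    """Return the first `n` keywords of a topic, or [] if the topic does not exist."""
    # get_topic() yields (word, score) pairs and returns False for unknown topic IDs.
    representation = topic_model.get_topic(topic_id) or []
    return [word for word, _score in representation[:n]]


def assign_topics(topic_model: Any, docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by the model."""
    # transform() returns (topics, probabilities); only the topic IDs are kept here.
    topics, _probs = topic_model.transform(list(docs))
    return list(zip(docs, topics))
```

With the model loaded as above, `top_keywords(topic_model, 0)` should return the keywords listed for topic 0 in the topic overview. `transform()` has to embed the new documents first, so the first call to `assign_topics` may also download the underlying embedding model.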

Topic overview

  • Number of topics: 54 (including the outlier topic -1)
  • Number of training documents: 3000
| Topic ID | Topic Keywords | Topic Frequency (documents) | Label |
|---|---|---|---|
| -1 | said - people - one - police - year | 10 | -1_said_people_one_police |
| 0 | player - league - cup - club - game | 1072 | 0_player_league_cup_club |
| 1 | police - said - death - murder - found | 291 | 1_police_said_death_murder |
| 2 | obama - president - republicans - house - republican | 152 | 2_obama_president_republicans_house |
| 3 | labour - mr - cameron - minister - prime | 98 | 3_labour_mr_cameron_minister |
| 4 | hospital - baby - surgery - heart - doctor | 77 | 4_hospital_baby_surgery_heart |
| 5 | iphone - apple - user - device - phone | 74 | 5_iphone_apple_user_device |
| 6 | doll - fashion - look - collection - like | 69 | 6_doll_fashion_look_collection |
| 7 | syria - isis - syrian - iraq - iraqi | 46 | 7_syria_isis_syrian_iraq |
| 8 | pakistan - taliban - al - drone - afghanistan | 45 | 8_pakistan_taliban_al_drone |
| 9 | food - restaurant - menu - burger - coffee | 43 | 9_food_restaurant_menu_burger |
| 10 | car - driver - vehicle - crash - driving | 41 | 10_car_driver_vehicle_crash |
| 11 | space - tower - car - airport - nasa | 40 | 11_space_tower_car_airport |
| 12 | property - house - home - apartment - room | 40 | 12_property_house_home_apartment |
| 13 | school - rape - sexual - student - sex | 36 | 13_school_rape_sexual_student |
| 14 | nfl - rice - quarterback - said - coach | 36 | 14_nfl_rice_quarterback_said |
| 15 | music - album - song - miley - cnn | 33 | 15_music_album_song_miley |
| 16 | olympic - gold - olympics - athlete - world | 33 | 16_olympic_gold_olympics_athlete |
| 17 | zoo - bear - tian - elephant - ivory | 33 | 17_zoo_bear_tian_elephant |
| 18 | flight - plane - aircraft - pilot - airport | 32 | 18_flight_plane_aircraft_pilot |
| 19 | flu - bacteria - vaccine - health - disease | 31 | 19_flu_bacteria_vaccine_health |
| 20 | dog - animal - pet - cat - dogs | 30 | 20_dog_animal_pet_cat |
| 21 | school - education - exam - child - degree | 30 | 21_school_education_exam_child |
| 22 | kenya - kenyan - mall - said - nairobi | 28 | 22_kenya_kenyan_mall_said |
| 23 | cent - per - price - cadbury - christmas | 27 | 23_cent_per_price_cadbury |
| 24 | french - france - sarkozy - hollande - minister | 26 | 24_french_france_sarkozy_hollande |
| 25 | russian - ukraine - russia - putin - ukrainian | 25 | 25_russian_ukraine_russia_putin |
| 26 | iran - nuclear - iranian - israel - irans | 24 | 26_iran_nuclear_iranian_israel |
| 27 | film - bond - novel - the - cnn | 24 | 27_film_bond_novel_the |
| 28 | lava - fire - snow - pahoa - volcano | 24 | 28_lava_fire_snow_pahoa |
| 29 | drug - mexican - chavez - cartel - said | 23 | 29_drug_mexican_chavez_cartel |
| 30 | ship - vessel - captain - crew - coast | 23 | 30_ship_vessel_captain_crew |
| 31 | snowden - us - intelligence - information - gebregeorgis | 23 | 31_snowden_us_intelligence_information |
| 32 | match - wimbledon - federer - final - open | 22 | 32_match_wimbledon_federer_final |
| 33 | chinese - china - beijing - hong - protester | 21 | 33_chinese_china_beijing_hong |
| 34 | jury - white - ferguson - police - said | 21 | 34_jury_white_ferguson_police |
| 35 | weather - temperature - rain - warm - park | 21 | 35_weather_temperature_rain_warm |
| 36 | prince - royal - william - princess - queen | 20 | 36_prince_royal_william_princess |
| 37 | weight - fat - diet - gym - size | 19 | 37_weight_fat_diet_gym |
| 38 | golf - mcilroy - round - pga - championship | 19 | 38_golf_mcilroy_round_pga |
| 39 | hamilton - race - rosberg - prix - button | 19 | 39_hamilton_race_rosberg_prix |
| 40 | north - kim - korean - korea - koreas | 18 | 40_north_kim_korean_korea |
| 41 | human - found - fossil - ancient - fish | 18 | 41_human_found_fossil_ancient |
| 42 | climate - change - global - energy - wind | 17 | 42_climate_change_global_energy |
| 43 | school - teacher - pupil - schools - ofsted | 17 | 43_school_teacher_pupil_schools |
| 44 | ebola - virus - health - outbreak - liberia | 17 | 44_ebola_virus_health_outbreak |
| 45 | whale - nyad - shark - swim - beach | 17 | 45_whale_nyad_shark_swim |
| 46 | money - kallakis - foster - court - wines | 15 | 46_money_kallakis_foster_court |
| 47 | painting - art - portrait - auction - artist | 14 | 47_painting_art_portrait_auction |
| 48 | solar - planet - sun - bubble - earth | 14 | 48_solar_planet_sun_bubble |
| 49 | tsarnaev - oswald - boston - marathon - kennedy | 14 | 49_tsarnaev_oswald_boston_marathon |
| 50 | patient - care - va - hospital - patients | 14 | 50_patient_care_va_hospital |
| 51 | love - woman - im - relationship - men | 13 | 51_love_woman_im_relationship |
| 52 | marijuana - alcohol - drug - hangover - liver | 11 | 52_marijuana_alcohol_drug_hangover |
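In this table, each Label is the topic ID followed by that topic's first four keywords, joined with underscores, and topic -1 is BERTopic's outlier topic for documents not assigned to any cluster. The sketch below reproduces that naming rule and checks that the listed frequencies account for all 3000 training documents. The function name `make_label` is illustrative, and the frequency list is copied from the table in Topic ID order.

```python
from typing import Sequence

# Topic frequencies copied from the table above, in order of Topic ID (-1 to 52).
TOPIC_FREQUENCIES = [
    10, 1072, 291, 152, 98, 77, 74, 69, 46, 45,
    43, 41, 40, 40, 36, 36, 33, 33, 33, 32,
    31, 30, 30, 28, 27, 26, 25, 24, 24, 24,
    23, 23, 23, 22, 21, 21, 21, 20, 19, 19,
    19, 18, 18, 17, 17, 17, 17, 15, 14, 14,
    14, 14, 13, 11,
]


def make_label(topic_id: int, keywords: Sequence[str], n_words: int = 4) -> str:
    """Build a label of the form '<id>_<kw1>_<kw2>_<kw3>_<kw4>'."""
    return "_".join([str(topic_id), *keywords[:n_words]])


# Each training document is assigned exactly one topic (the outlier topic -1 included),
# so there should be 54 counts that sum to the 3000 training documents.
assert len(TOPIC_FREQUENCIES) == 54
assert sum(TOPIC_FREQUENCIES) == 3000
```

For example, `make_label(-1, ["said", "people", "one", "police", "year"])` returns `"-1_said_people_one_police"`, matching the first row.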

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
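The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. A minimal sketch for creating an untrained model with the same settings follows. It assumes the components the card does not list (embedding model, dimensionality reduction, clustering, vectorizer) were left at BERTopic's defaults, which the card does not confirm, so retraining is not guaranteed to reproduce these 54 topics. `HYPERPARAMS` and `build_model` are hypothetical names.

```python
# Hyperparameters as listed on this model card.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an untrained BERTopic model with the card's hyperparameters."""
    # Imported inside the function so this snippet loads without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Training would then look like `topics, probs = build_model().fit_transform(docs)`, where `docs` stands for your own list of texts; the card does not include the original 3000 training documents. Because `calculate_probabilities` is `True`, `probs` holds a probability for every topic per document rather than only for the assigned topic.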

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
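If the model fails to load or behaves differently, comparing the local environment against these versions is a reasonable first check. The sketch below does this with the standard library's `importlib.metadata`. The mapping from the names above to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption, a mismatch is not necessarily a problem, and the helper names are hypothetical.

```python
import sys
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions listed on this model card, keyed by (assumed) PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}
EXPECTED_PYTHON = "3.10.6"


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from the card to what was found."""
    mismatches = {}
    for name, wanted in expected.items():
        found = get_version(name)
        if found != wanted:
            mismatches[name] = found  # None means the package is not installed
    return mismatches


def python_version() -> str:
    """Return the running interpreter version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])
```

Running `find_mismatches(EXPECTED_VERSIONS)` returns an empty dict when every listed package matches, and `python_version()` can be compared with `EXPECTED_PYTHON`.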