
xsum_108_50000_25000_test

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

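from bertopic import BERTopic

# Download the saved model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/xsum_108_50000_25000_test")

# Return a DataFrame with one row per topic
topic_model.get_topic_info()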
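
Beyond `get_topic_info()`, a loaded model can also be queried one topic at a time. The helper below is a minimal sketch, assuming `get_topic` returns a list of `(word, score)` pairs, or `False` for an unknown topic ID, as it does in recent BERTopic releases. `top_keywords` is a hypothetical name used for illustration and is not part of the library.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return up to n keywords for one topic of a loaded BERTopic model.

    top_keywords is an illustrative helper, not a BERTopic function.
    """
    # get_topic yields (word, score) pairs, or False when the ID is unknown
    pairs = topic_model.get_topic(topic_id)
    if not pairs:
        return []
    return [word for word, _score in pairs[:n]]


# Example (requires bertopic and network access to the Hugging Face Hub):
#   from bertopic import BERTopic
#   topic_model = BERTopic.load("KingKazma/xsum_108_50000_25000_test")
#   print(top_keywords(topic_model, 5))
```

Passing the model in as an argument keeps the helper independent of how the model was loaded.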

Topic overview

  • Number of topics: 84
  • Number of training documents: 11334
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - mr - people - would - year | 5 | -1_said_mr_people_would
0 | win - goal - game - league - foul | 4854 | 0_win_goal_game_league
1 | police - court - said - officer - mr | 1637 | 1_police_court_said_officer
2 | labour - party - eu - election - vote | 872 | 2_labour_party_eu_election
3 | health - care - nhs - patient - cancer | 341 | 3_health_care_nhs_patient
4 | olympic - sport - race - gold - medal | 325 | 4_olympic_sport_race_gold
5 | cricket - england - wicket - test - captain | 278 | 5_cricket_england_wicket_test
6 | animal - dog - bird - whale - specie | 206 | 6_animal_dog_bird_whale
7 | bridge - rail - council - said - transport | 199 | 7_bridge_rail_council_said
8 | school - education - student - teacher - university | 193 | 8_school_education_student_teacher
9 | bank - rate - growth - economy - market | 181 | 9_bank_rate_growth_economy
10 | syria - syrian - iraq - iran - force | 145 | 10_syria_syrian_iraq_iran
11 | energy - industry - wind - electricity - company | 119 | 11_energy_industry_wind_electricity
12 | film - actress - star - actor - character | 80 | 12_film_actress_star_actor
13 | president - boko - african - haram - mr | 79 | 13_president_boko_african_haram
14 | fire - blaze - service - smoke - said | 79 | 14_fire_blaze_service_smoke
15 | trump - mr - republican - trumps - president | 75 | 15_trump_mr_republican_trumps
16 | music - album - song - band - singer | 70 | 16_music_album_song_band
17 | race - hamilton - f1 - mercedes - lap | 68 | 17_race_hamilton_f1_mercedes
18 | space - earth - planet - solar - orbit | 61 | 18_space_earth_planet_solar
19 | lifeboat - rnli - beach - coastguard - rescue | 55 | 19_lifeboat_rnli_beach_coastguard
20 | flood - flooding - water - weather - rain | 55 | 20_flood_flooding_water_weather
21 | fight - boxing - champion - joshua - ali | 54 | 21_fight_boxing_champion_joshua
22 | plane - aircraft - flight - passenger - pilot | 54 | 22_plane_aircraft_flight_passenger
23 | earthquake - quake - flood - people - water | 53 | 23_earthquake_quake_flood_people
24 | russian - russia - ukraine - putin - ukrainian | 49 | 24_russian_russia_ukraine_putin
25 | murray - match - wimbledon - tennis - konta | 47 | 25_murray_match_wimbledon_tennis
26 | bitcoin - security - talktalk - data - tor | 44 | 26_bitcoin_security_talktalk_data
27 | round - birdie - bogey - par - shot | 41 | 27_round_birdie_bogey_par
28 | ireland - dup - sinn - northern - party | 39 | 28_ireland_dup_sinn_northern
29 | maduro - venezuela - president - venezuelan - opposition | 36 | 29_maduro_venezuela_president_venezuelan
30 | yn - ar - yr - ei - wedi | 36 | 30_yn_ar_yr_ei
31 | painting - art - gallery - portrait - museum | 34 | 31_painting_art_gallery_portrait
32 | unsupported - updated - bst - playback - media | 33 | 32_unsupported_updated_bst_playback
33 | migrant - eu - asylum - turkey - germany | 31 | 33_migrant_eu_asylum_turkey
34 | stone - cave - discovery - site - tree | 30 | 34_stone_cave_discovery_site
35 | parade - poppy - flag - jesus - statue | 30 | 35_parade_poppy_flag_jesus
36 | drug - cannabis - drugs - heroin - cocaine | 27 | 36_drug_cannabis_drugs_heroin
37 | church - pope - bishop - vatican - cardinal | 27 | 37_church_pope_bishop_vatican
38 | greek - greece - bailout - eurozone - bank | 27 | 38_greek_greece_bailout_eurozone
39 | nama - ireland - northern - cerberus - irish | 26 | 39_nama_ireland_northern_cerberus
40 | prison - prisoner - prisons - justice - turing | 25 | 40_prison_prisoner_prisons_justice
41 | radio - show - bbc - series - programme | 24 | 41_radio_show_bbc_series
42 | fifa - blatter - platini - fifas - football | 23 | 42_fifa_blatter_platini_fifas
43 | tesco - sale - store - supermarket - customer | 23 | 43_tesco_sale_store_supermarket
44 | china - taiwan - chinese - hong - taiwans | 22 | 44_china_taiwan_chinese_hong
45 | afghan - taliban - afghanistan - mansour - mullah | 22 | 45_afghan_taliban_afghanistan_mansour
46 | council - local - funding - government - authority | 22 | 46_council_local_funding_government
47 | nsa - encryption - cia - snowden - us | 21 | 47_nsa_encryption_cia_snowden
48 | ice - glacier - temperature - ocean - climate | 21 | 48_ice_glacier_temperature_ocean
49 | osullivan - world - snooker - beat - champion | 21 | 49_osullivan_world_snooker_beat
50 | book - prize - novel - author - award | 20 | 50_book_prize_novel_author
51 | auschwitz - jews - holocaust - camp - winton | 20 | 51_auschwitz_jews_holocaust_camp
52 | samsung - apple - phone - company - battery | 19 | 52_samsung_apple_phone_company
53 | picture - image - pictures - please - submit | 19 | 53_picture_image_pictures_please
54 | korea - north - korean - missile - koreas | 19 | 54_korea_north_korean_missile
55 | pension - worker - pay - work - hour | 19 | 55_pension_worker_pay_work
56 | pen - fillon - le - macron - mr | 18 | 56_pen_fillon_le_macron
57 | paris - eaw - french - attack - suspect | 18 | 57_paris_eaw_french_attack
58 | content - app - tv - digital - apple | 18 | 58_content_app_tv_digital
59 | israel - israeli - palestinians - palestinian - gaza | 17 | 59_israel_israeli_palestinians_palestinian
60 | housing - affordable - rent - homelessness - government | 17 | 60_housing_affordable_rent_homelessness
61 | prince - queen - birthday - duke - royal | 17 | 61_prince_queen_birthday_duke
62 | australia - australian - asylum - visa - abbott | 15 | 62_australia_australian_asylum_visa
63 | tax - spending - cut - osborne - fiscal | 15 | 63_tax_spending_cut_osborne
64 | updated - 2017 - bst - last - gmt | 14 | 64_updated_2017_bst_last
65 | refugee - uk - child - vulnerable - refugees | 12 | 65_refugee_uk_child_vulnerable
66 | ebola - sierra - leone - outbreak - liberia | 12 | 66_ebola_sierra_leone_outbreak
67 | shah - ahmed - mosque - muslims - prophet | 11 | 67_shah_ahmed_mosque_muslims
68 | broadband - 4g - ee - customer - internet | 11 | 68_broadband_4g_ee_customer
69 | pistorius - steenkamp - toilet - door - reeva | 10 | 69_pistorius_steenkamp_toilet_door
70 | eu - uk - population - migrant - trade | 9 | 70_eu_uk_population_migrant
71 | australia - marriage - turnbull - katter - samesex | 9 | 71_australia_marriage_turnbull_katter
72 | sugar - gin - sabmiller - inbev - ab | 8 | 72_sugar_gin_sabmiller_inbev
73 | suu - kyi - rohingya - rakhine - myanmar | 8 | 73_suu_kyi_rohingya_rakhine
74 | nadeau - field - aircraft - cordon - accidents | 8 | 74_nadeau_field_aircraft_cordon
75 | abortion - ireland - law - unborn - case | 8 | 75_abortion_ireland_law_unborn
76 | homosexuality - tor - homosexual - law - gay | 7 | 76_homosexuality_tor_homosexual_law
77 | castro - cuba - cuban - fidel - havana | 7 | 77_castro_cuba_cuban_fidel
78 | china - samsung - firm - business - cheil | 7 | 78_china_samsung_firm_business
79 | event - festival - technology - campsite - interactive | 6 | 79_event_festival_technology_campsite
80 | vw - volkswagen - production - emission - carmaker | 6 | 80_vw_volkswagen_production_emission
81 | mohammed - gjolla - sheriff - nca - terrorism | 6 | 81_mohammed_gjolla_sheriff_nca
82 | tb - tuberculosis - disease - badger - zoonotic | 5 | 82_tb_tuberculosis_disease_badger
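
The frequency column makes the size imbalance between topics easy to quantify. As a small sketch that uses only figures copied from the overview (the variable and function names are illustrative), the share of training documents per topic can be computed like this:

```python
# Frequencies copied from the topic overview; only the three largest
# topics and the outlier topic (-1) are listed here for brevity.
TOTAL_DOCUMENTS = 11334
FREQUENCIES = {
    -1: 5,
    0: 4854,
    1: 1637,
    2: 872,
}


def share(topic_id):
    """Percentage of training documents assigned to a topic."""
    return 100 * FREQUENCIES[topic_id] / TOTAL_DOCUMENTS


for topic_id in sorted(FREQUENCIES):
    print(f"topic {topic_id:>2}: {share(topic_id):.1f}%")

# Combined share of the three largest topics
top_three = sum(share(t) for t in (0, 1, 2))
print(f"topics 0-2 combined: {top_three:.1f}%")
```

Topic 0 alone covers roughly 43% of the training documents, and topics 0 to 2 together cover about 65%, so most of the remaining topics describe small slices of the corpus.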

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
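
As a rough sketch, the values above map directly onto keyword arguments of the `BERTopic` constructor. The card does not state the embedding model, the UMAP or HDBSCAN settings, or the exact training documents, so a model built this way should not be expected to reproduce the topics listed here. `HYPERPARAMETERS` and `build_untrained_model` are hypothetical names.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Create a fresh, unfitted BERTopic instance with these settings."""
    # Imported lazily so the dictionary can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_untrained_model().fit_transform(docs)` on a list of strings would then train a new model with the same top-level settings.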

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
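
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. This is a minimal sketch using the standard library only. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list, keyed by the
# assumed PyPI distribution name of each package.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def find_mismatches(expected=EXPECTED_VERSIONS):
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

An empty result from `find_mismatches()` means every listed package is installed at the listed version. A differing version is not necessarily a problem, only a place to look first if loading fails.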