
xsum_123_3000_1500_train

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

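from bertopic import BERTopic

# Download the fitted model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/xsum_123_3000_1500_train")

# DataFrame summarising every topic, including its ID (Topic), size (Count) and name (Name)
topic_model.get_topic_info()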

Topic overview

  • Number of topics: 47 (IDs -1 to 45, where -1 collects outlier documents)
  • Number of training documents: 3000
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - mr - police - people - would | 5 | -1_said_mr_police_people
0 | win - game - half - foul - league | 1132 | 0_win_game_half_foul
1 | eu - labour - party - would - uk | 591 | 1_eu_labour_party_would
2 | athlete - sport - gold - olympic - medal | 149 | 2_athlete_sport_gold_olympic
3 | nhs - health - care - patient - hospital | 104 | 3_nhs_health_care_patient
4 | growth - price - market - sale - economy | 84 | 4_growth_price_market_sale
5 | president - mr - government - maduro - rousseff | 71 | 5_president_mr_government_maduro
6 | crash - police - hospital - road - driver | 58 | 6_crash_police_hospital_road
7 | murray - match - set - tennis - seed | 46 | 7_murray_match_set_tennis
8 | syrian - us - syria - rebel - force | 45 | 8_syrian_us_syria_rebel
9 | school - education - pupil - schools - child | 41 | 9_school_education_pupil_schools
10 | animal - zoo - wildlife - bird - specie | 40 | 10_animal_zoo_wildlife_bird
11 | film - actor - star - series - drama | 38 | 11_film_actor_star_series
12 | abuse - court - sexual - police - victim | 38 | 12_abuse_court_sexual_police
13 | trump - mr - clinton - republican - president | 31 | 13_trump_mr_clinton_republican
14 | fire - blaze - building - service - firefighters | 31 | 14_fire_blaze_building_service
15 | suu - party - mr - government - election | 29 | 15_suu_party_mr_government
16 | china - korea - chinese - south - north | 29 | 16_china_korea_chinese_south
17 | album - band - song - music - best | 25 | 17_album_band_song_music
18 | ms - heard - court - death - said | 24 | 18_ms_heard_court_death
19 | wales - welsh - said - train - government | 23 | 19_wales_welsh_said_train
20 | road - police - death - seen - found | 23 | 20_road_police_death_seen
21 | passenger - crew - sea - boat - aircraft | 23 | 21_passenger_crew_sea_boat
22 | russian - ukraine - russia - mr - ukrainian | 22 | 22_russian_ukraine_russia_mr
23 | fight - joshua - title - khan - boxing | 22 | 23_fight_joshua_title_khan
24 | samsung - phone - app - android - user | 20 | 24_samsung_phone_app_android
25 | earthquake - particle - nepal - building - mars | 19 | 25_earthquake_particle_nepal_building
26 | highways - traffic - dartford - council - road | 18 | 26_highways_traffic_dartford_council
27 | vettel - hamilton - lap - race - alonso | 18 | 27_vettel_hamilton_lap_race
28 | park - building - visitor - festival - visitscotland | 16 | 28_park_building_visitor_festival
29 | site - council - street - project - plan | 15 | 29_site_council_street_project
30 | abdeslam - paris - attack - belgian - salah | 15 | 30_abdeslam_paris_attack_belgian
31 | virus - ebola - disease - hiv - sierra | 14 | 31_virus_ebola_disease_hiv
32 | security - data - attack - cyber - malware | 14 | 32_security_data_attack_cyber
33 | dog - dogs - stray - pet - owner | 14 | 33_dog_dogs_stray_pet
34 | birdie - pga - bogey - woods - open | 13 | 34_birdie_pga_bogey_woods
35 | man - police - wearing - incident - anyone | 13 | 35_man_police_wearing_incident
36 | energy - pipeline - waste - renewables - electricity | 13 | 36_energy_pipeline_waste_renewables
37 | silence - bishop - belfast - people - attended | 11 | 37_silence_bishop_belfast_people
38 | painting - art - work - artist - exhibition | 11 | 38_painting_art_work_artist
39 | eyre - gaunt - lyttle - peter - court | 10 | 39_eyre_gaunt_lyttle_peter
40 | crime - police - force - constable - chief | 9 | 40_crime_police_force_constable
41 | flood - river - rain - louisiana - flooded | 9 | 41_flood_river_rain_louisiana
42 | charity - abuse - yentob - porn - batmanghelidjh | 7 | 42_charity_abuse_yentob_porn
43 | india - nidar - gun - yrf - film | 6 | 43_india_nidar_gun_yrf
44 | driving - stirling - winn - fraser - road | 6 | 44_driving_stirling_winn_fraser
45 | boko - haram - shekau - militant - monguno | 5 | 45_boko_haram_shekau_militant
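
The Topic Frequency values sum to 3,000, the number of training documents, so each value is a document count. The minimal Python sketch below hard-codes a few rows copied from the table, converts counts into shares of the corpus, and rebuilds the Label column, which appears to consist of the topic ID followed by the first four keywords. The helper names `label` and `share` are illustrative only and are not part of the BERTopic API.

```python
# A few (topic ID, keywords, frequency) rows copied from the topic table
ROWS = [
    (-1, ["said", "mr", "police", "people", "would"], 5),
    (0, ["win", "game", "half", "foul", "league"], 1132),
    (1, ["eu", "labour", "party", "would", "uk"], 591),
    (45, ["boko", "haram", "shekau", "militant", "monguno"], 5),
]

N_DOCS = 3000  # "Number of training documents" from the overview


def label(topic_id, keywords, n_words=4):
    """Rebuild a Label value: the topic ID joined with its first n_words keywords."""
    return f"{topic_id}_" + "_".join(keywords[:n_words])


def share(frequency, n_docs=N_DOCS):
    """Fraction of the training documents assigned to one topic."""
    return frequency / n_docs


for topic_id, words, freq in ROWS:
    print(f"{label(topic_id, words):35} {share(freq):6.2%}")
```

On these rows topic 0 alone accounts for about 37.7% of the documents and topic 1 for 19.7%, while the outlier topic -1 holds only 5 documents.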

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
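
Each hyperparameter listed above is a keyword argument of the `BERTopic` constructor. The sketch below collects them in a dict and creates an unfitted model with the same settings. It will not reproduce this exact model: the card does not record the embedding model, any custom UMAP/HDBSCAN settings (UMAP is stochastic unless its `random_state` is fixed), or the training documents themselves. `build_topic_model` and `docs` are hypothetical names, and the assumption that `docs` are XSum articles is inferred only from the model name.

```python
# Hyperparameters as listed in the model card
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model(params=HYPERPARAMS):
    """Create an unfitted BERTopic instance with the card's hyperparameters."""
    # Imported lazily so the settings can be inspected without BERTopic installed
    from bertopic import BERTopic

    return BERTopic(**params)


# Hypothetical usage, where `docs` is a list of strings you supply yourself:
# topic_model = build_topic_model()
# topics, probs = topic_model.fit_transform(docs)
```

Because `calculate_probabilities` is True, `fit_transform` returns a full document-by-topic probability matrix as `probs`, which costs extra time and memory on large corpora.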

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
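
The card pins the dependency versions used at training time but not the BERTopic version itself. The small Python check below compares those pins with the locally installed packages through the standard-library `importlib.metadata` module. It assumes the PyPI distribution names shown in `PINNED` (for example `umap-learn` for UMAP); the helper names are illustrative.

```python
import sys
from importlib import metadata

# Versions listed in the model card, keyed by (assumed) PyPI distribution name
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def installed_version(dist):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def version_report(pinned=PINNED):
    """Map each distribution to (pinned version, installed version, exact match)."""
    report = {}
    for dist, wanted in pinned.items():
        have = installed_version(dist)
        report[dist] = (wanted, have, have == wanted)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(card lists 3.10.12)")
    for dist, (wanted, have, ok) in version_report().items():
        status = "OK" if ok else "DIFF"
        print(f"{dist:25} pinned {wanted:8} installed {have or 'missing':10} {status}")
```

Exact matches are not strictly required to load the model, but large gaps (for example a different major NumPy version) are a common source of loading errors for pickled models.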