
xsum_55555_3000_1500_test

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_55555_3000_1500_test")

topic_model.get_topic_info()
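
Beyond inspecting the topic table, a loaded model can also assign topics to unseen text. The following is a minimal sketch rather than part of the original card: `summarize_assignments` and `load_published_model` are hypothetical helper names. It assumes the published checkpoint can embed new documents; if the embedding model was not saved with it, one may need to be passed to `BERTopic.load` via its `embedding_model` argument. Only `transform` and `get_topic` are used.

```python
def summarize_assignments(topic_model, docs, top_n=5):
    """Return (document, topic_id, top keywords) for each input document."""
    # transform() returns the predicted topic ids and, optionally, probabilities.
    topics, _probabilities = topic_model.transform(docs)
    rows = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() yields (word, score) pairs; it may return False for an unknown id.
        words = topic_model.get_topic(int(topic_id)) or []
        keywords = [word for word, _score in words[:top_n]]
        rows.append((doc, int(topic_id), keywords))
    return rows


def load_published_model():
    """Load the model from the Hugging Face Hub (requires network access)."""
    # Imported lazily so the helper above can be used without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic.load("KingKazma/xsum_55555_3000_1500_test")
```

Calling `summarize_assignments(load_published_model(), ["Police said a man appeared in court on Monday."])` would then return one tuple per document. A topic id of -1 means the document was treated as an outlier.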

Topic overview

  • Number of topics: 26 (including the outlier topic -1)
  • Number of training documents: 1500
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - mr - also - would - people | 5 | -1_said_mr_also_would
0 | police - said - mr - court - heard | 716 | 0_police_said_mr_court
1 | syria - turkey - syrian - military - said | 112 | 1_syria_turkey_syrian_military
2 | foul - win - kick - half - shot | 72 | 2_foul_win_kick_half
3 | growth - year - bank - business - economy | 68 | 3_growth_year_bank_business
4 | council - said - building - development - new | 63 | 4_council_said_building_development
5 | england - cricket - captain - test - wicket | 48 | 5_england_cricket_captain_test
6 | league - club - season - loan - transfer | 42 | 6_league_club_season_loan
7 | sport - gold - world - athlete - olympic | 38 | 7_sport_gold_world_athlete
8 | film - music - best - star - song | 36 | 8_film_music_best_star
9 | party - labour - mr - leader - said | 33 | 9_party_labour_mr_leader
10 | ireland - wales - leinster - rugby - player | 32 | 10_ireland_wales_leinster_rugby
11 | care - nhs - hospital - patient - said | 27 | 11_care_nhs_hospital_patient
12 | road - crash - police - collision - car | 26 | 12_road_crash_police_collision
13 | dog - animal - greyhound - racing - owner | 23 | 13_dog_animal_greyhound_racing
14 | ship - beach - said - lifeguard - rnli | 22 | 14_ship_beach_said_lifeguard
15 | school - education - child - council - said | 20 | 15_school_education_child_council
16 | wales - bill - welsh - labour - assembly | 19 | 16_wales_bill_welsh_labour
17 | eu - uk - european - europe - referendum | 18 | 17_eu_uk_european_europe
18 | fire - blaze - bus - flame - said | 18 | 18_fire_blaze_bus_flame
19 | mr - president - besigye - maduro - election | 16 | 19_mr_president_besigye_maduro
20 | race - froome - stage - second - lap | 13 | 20_race_froome_stage_second
21 | rail - train - rmt - scotrail - transport | 10 | 21_rail_train_rmt_scotrail
22 | planet - earth - electron - theory - mars | 10 | 22_planet_earth_electron_theory
23 | ryder - cup - tour - pga - mcilroy | 7 | 23_ryder_cup_tour_pga
24 | email - lazar - fbi - guccifer - ferizi | 6 | 24_email_lazar_fbi_guccifer
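
The frequencies in the table can be turned into shares of the training set to show how unevenly the documents are spread across topics. The short sketch below copies the counts from the table; `TOPIC_FREQUENCIES` and `topic_shares` are illustrative names, not part of BERTopic.

```python
# Topic frequencies copied from the table above (topic id -> document count).
TOPIC_FREQUENCIES = {
    -1: 5, 0: 716, 1: 112, 2: 72, 3: 68, 4: 63, 5: 48, 6: 42, 7: 38,
    8: 36, 9: 33, 10: 32, 11: 27, 12: 26, 13: 23, 14: 22, 15: 20,
    16: 19, 17: 18, 18: 18, 19: 16, 20: 13, 21: 10, 22: 10, 23: 7, 24: 6,
}


def topic_shares(frequencies):
    """Map each topic id to its fraction of all documents."""
    total = sum(frequencies.values())
    return {topic_id: count / total for topic_id, count in frequencies.items()}


shares = topic_shares(TOPIC_FREQUENCIES)
total_documents = sum(TOPIC_FREQUENCIES.values())

print(f"documents: {total_documents}")          # documents: 1500
print(f"topic 0 share: {shares[0]:.1%}")        # topic 0 share: 47.7%
print(f"outlier share: {shares[-1]:.1%}")       # outlier share: 0.3%
```

The counts sum to the 1500 training documents reported above. Topic 0 alone covers almost half of them, while only 5 documents fall into the outlier topic -1.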

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
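
For reference, the hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows how a model with the same settings might be configured; `HYPERPARAMETERS` and `build_topic_model` are hypothetical names. The card does not list the embedding model or the UMAP and HDBSCAN settings used, so these are left at the library defaults, which is an assumption.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an untrained BERTopic instance with the settings listed above."""
    # Imported lazily so the configuration can be inspected without BERTopic installed.
    from bertopic import BERTopic

    # Sub-models (embedding, UMAP, HDBSCAN) fall back to BERTopic's defaults here.
    return BERTopic(**HYPERPARAMETERS)
```

An instance built this way would still need to be fitted, for example with `build_topic_model().fit_transform(docs)` on a list of document strings. Results will generally not match the published model exactly because UMAP is stochastic.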

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
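
Loading a saved model in an environment with different library versions can sometimes cause problems, so it may help to compare the local installation against the list above. The following sketch uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions about how the packages were installed, and `CARD_VERSIONS` and `compare_versions` are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def compare_versions(expected):
    """Return {package: (card_version, installed_version or None)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point for troubleshooting.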