xsum_22457_3000_1500_validation

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/xsum_22457_3000_1500_validation")

topic_model.get_topic_info()
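Beyond inspecting the topic table, a loaded model can also assign topics to unseen text. The following is a minimal sketch rather than the author's workflow: `assign_topics` and `load_and_assign` are hypothetical helper names, and it assumes the saved model can resolve its embedding model at load time. It relies on BERTopic's `transform`, which returns a pair of topics and probabilities.

```python
from typing import List, Sequence, Tuple


def assign_topics(topic_model, docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by `topic_model`.

    `topic_model` is expected to expose BERTopic's `transform(docs)`,
    which returns a (topics, probabilities) pair.
    """
    topics, _probabilities = topic_model.transform(list(docs))
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def load_and_assign(docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Hypothetical helper: load this card's model, then assign topics."""
    # Imported lazily so assign_topics itself has no hard dependency.
    from bertopic import BERTopic

    topic_model = BERTopic.load("KingKazma/xsum_22457_3000_1500_validation")
    return assign_topics(topic_model, docs)
```

A returned topic ID of -1 means the document was treated as an outlier rather than placed in one of the numbered topics.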

Topic overview

  • Number of topics: 26 (topics 0 to 24, plus the outlier topic -1)
  • Number of training documents: 1500
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | said - people - would - one - year | 5 | -1_said_people_would_one
0 | said - police - court - mr - heard | 646 | 0_said_police_court_mr
1 | labour - party - mr - scotland - vote | 242 | 1_labour_party_mr_scotland
2 | race - olympic - gold - team - medal | 56 | 2_race_olympic_gold_team
3 | president - un - mr - south - said | 51 | 3_president_un_mr_south
4 | united - foul - half - kick - win | 48 | 4_united_foul_half_kick
5 | price - bank - rose - share - said | 44 | 5_price_bank_rose_share
6 | attack - taliban - militant - killed - said | 41 | 6_attack_taliban_militant_killed
7 | care - health - nhs - hospital - patient | 32 | 7_care_health_nhs_hospital
8 | england - cricket - wicket - test - ball | 27 | 8_england_cricket_wicket_test
9 | specie - tiger - bird - said - breeding | 27 | 9_specie_tiger_bird_said
10 | rugby - wales - player - coach - world | 27 | 10_rugby_wales_player_coach
11 | celtic - league - season - game - rangers | 26 | 11_celtic_league_season_game
12 | album - music - song - show - singer | 26 | 12_album_music_song_show
13 | open - round - world - play - american | 25 | 13_open_round_world_play
14 | school - education - schools - said - child | 24 | 14_school_education_schools_said
15 | film - best - actor - star - actress | 21 | 15_film_best_actor_star
16 | eu - uk - brexit - trade - would | 21 | 16_eu_uk_brexit_trade
17 | data - us - internet - said - information | 21 | 17_data_us_internet_said
18 | league - transfer - season - club - appearance | 20 | 18_league_transfer_season_club
19 | parking - council - said - road - ringgo | 19 | 19_parking_council_said_road
20 | trump - mr - clinton - republican - president | 15 | 20_trump_mr_clinton_republican
21 | water - supply - affected - flooding - customer | 12 | 21_water_supply_affected_flooding
22 | fifa - corruption - scala - also - president | 12 | 22_fifa_corruption_scala_also
23 | testimonial - match - tevez - united - player | 6 | 23_testimonial_match_tevez_united
24 | hiv - outbreak - disease - kong - hong | 6 | 24_hiv_outbreak_disease_kong
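The frequency column can be checked against the stated number of training documents, and it shows how unevenly the documents are spread across topics. The sketch below copies the frequencies from the table above. `topic_share` is a hypothetical helper introduced only for illustration.

```python
# Topic frequencies copied from the table above, keyed by topic ID.
TOPIC_FREQUENCIES = {
    -1: 5, 0: 646, 1: 242, 2: 56, 3: 51, 4: 48, 5: 44, 6: 41, 7: 32,
    8: 27, 9: 27, 10: 27, 11: 26, 12: 26, 13: 25, 14: 24, 15: 21,
    16: 21, 17: 21, 18: 20, 19: 19, 20: 15, 21: 12, 22: 12, 23: 6, 24: 6,
}
TRAINING_DOCUMENTS = 1500  # "Number of training documents" from the overview


def topic_share(topic_id: int) -> float:
    """Fraction of the training documents assigned to `topic_id`."""
    return TOPIC_FREQUENCIES[topic_id] / TRAINING_DOCUMENTS


total = sum(TOPIC_FREQUENCIES.values())
print(total)                                      # 1500
print(f"{topic_share(0):.1%}")                    # 43.1%
print(f"{topic_share(0) + topic_share(1):.1%}")   # 59.2%
```

The two largest topics together cover more than half of the training documents, while only 5 documents fall into the outlier topic.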

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
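Each hyperparameter listed above corresponds to a keyword argument of the `BERTopic` constructor. The sketch below shows that mapping. It is not a reproduction recipe: the card does not state the embedding model or the UMAP and HDBSCAN settings, so a model built this way would not necessarily match the published one. `build_topic_model` is a hypothetical helper name.

```python
# Hyperparameters as listed on this card.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an untrained BERTopic instance with the card's settings."""
    # Imported lazily so the settings can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_topic_model().fit_transform(docs)` on a list of strings would then train a new model with these settings.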

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
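When loading a saved model, version differences between the training and loading environments can matter. The sketch below compares the versions listed above with what is installed locally, using only the standard library. The distribution names (for example `umap-learn` for UMAP and `scikit-learn` for Scikit-Learn) are assumptions about how the listed frameworks are named on PyPI, and `find_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions from this card, keyed by the assumed PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}


def find_mismatches(
    expected: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing package to (expected, installed); None if missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Running `find_mismatches(EXPECTED_VERSIONS)` returns an empty dictionary when the local environment matches the card exactly.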