cnn_dailymail_108_3000_1500_test

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_test")

topic_model.get_topic_info()
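
Once loaded, the model can also be queried one topic at a time. The following is a minimal sketch and not part of the original card: `top_words` and `demo` are hypothetical helper names, and the code assumes that `get_topic` returns a list of (word, score) pairs, or a falsy value for an unknown topic ID, as it does in recent BERTopic releases. `demo` downloads the model from the Hub, so it is defined here but not called.

```python
def top_words(topic_model, topic_id, n=5):
    """Return up to n keywords for topic_id, or [] if the topic is unknown."""
    words = topic_model.get_topic(topic_id)
    if not words:
        # Assumption: BERTopic returns a falsy value for a missing topic ID.
        return []
    return [word for word, _score in words[:n]]


def demo():
    # Imported lazily so top_words can be used without BERTopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_test")
    print(top_words(topic_model, 1))
```

Calling `demo()` should print the leading keywords of topic 1, which the topic overview lists as a football topic.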

Topic overview

  • Number of topics: 29 (including the outlier topic -1)
  • Number of training documents: 1500
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | said - one - year - time - people | 11 | -1_said_one_year_time |
| 0 | police - said - told - school - court | 384 | 0_police_said_told_school |
| 1 | game - liverpool - league - goal - season | 249 | 1_game_liverpool_league_goal |
| 2 | said - attack - group - police - people | 91 | 2_said_attack_group_police |
| 3 | people - said - planet - mountain - mile | 85 | 3_people_said_planet_mountain |
| 4 | baby - family - mother - said - cancer | 77 | 4_baby_family_mother_said |
| 5 | labour - mr - miliband - tax - leader | 54 | 5_labour_mr_miliband_tax |
| 6 | shark - crocodile - fish - animal - water | 49 | 6_shark_crocodile_fish_animal |
| 7 | chelsea - arsenal - mourinho - hazard - league | 37 | 7_chelsea_arsenal_mourinho_hazard |
| 8 | united - manchester - city - van - league | 36 | 8_united_manchester_city_van |
| 9 | masters - round - woods - group - tournament | 31 | 9_masters_round_woods_group |
| 10 | model - fashion - dress - woman - look | 30 | 10_model_fashion_dress_woman |
| 11 | food - sugar - restaurant - water - vitamin | 29 | 11_food_sugar_restaurant_water |
| 12 | race - hamilton - rosberg - grand - prix | 29 | 12_race_hamilton_rosberg_grand |
| 13 | madrid - ronaldo - real - goal - barcelona | 29 | 13_madrid_ronaldo_real_goal |
| 14 | england - cricket - test - cook - benaud | 28 | 14_england_cricket_test_cook |
| 15 | clinton - president - obama - hillary - said | 27 | 15_clinton_president_obama_hillary |
| 16 | property - house - home - market - apartment | 25 | 16_property_house_home_market |
| 17 | fight - mayweather - pacquiao - manny - bout | 25 | 17_fight_mayweather_pacquiao_manny |
| 18 | apple - watch - price - per - cent | 24 | 18_apple_watch_price_per |
| 19 | celtic - rangers - game - scottish - deila | 24 | 19_celtic_rangers_game_scottish |
| 20 | dog - animal - owner - council - dogs | 21 | 20_dog_animal_owner_council |
| 21 | prince - royal - harry - queen - baby | 21 | 21_prince_royal_harry_queen |
| 22 | film - actor - downey - interview - show | 17 | 22_film_actor_downey_interview |
| 23 | hotel - flight - mile - island - room | 16 | 23_hotel_flight_mile_island |
| 24 | bayern - guardiola - porto - dortmund - munich | 14 | 24_bayern_guardiola_porto_dortmund |
| 25 | wedding - gabriel - noah - roxy - sandra | 13 | 25_wedding_gabriel_noah_roxy |
| 26 | saracens - bosch - penalty - kick - rugby | 12 | 26_saracens_bosch_penalty_kick |
| 27 | deal - summer - club - interest - contract | 12 | 27_deal_summer_club_interest |
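
The topic frequencies are easier to compare as shares of the training set. The sketch below is an illustration added to the card, not output from the model: the counts are copied from the topic overview, the variable names are hypothetical, and treating topic -1 as the outlier bucket follows the usual BERTopic convention.

```python
# Topic frequencies copied from the topic overview (topic ID -> document count).
topic_counts = {
    -1: 11, 0: 384, 1: 249, 2: 91, 3: 85, 4: 77, 5: 54, 6: 49, 7: 37,
    8: 36, 9: 31, 10: 30, 11: 29, 12: 29, 13: 29, 14: 28, 15: 27, 16: 25,
    17: 25, 18: 24, 19: 24, 20: 21, 21: 21, 22: 17, 23: 16, 24: 14,
    25: 13, 26: 12, 27: 12,
}

# Total number of documents across all topics, including the outlier topic.
total = sum(topic_counts.values())

# Fraction of the training documents assigned to each topic.
shares = {tid: count / total for tid, count in topic_counts.items()}

print(f"documents: {total}")
print(f"outlier share (topic -1): {shares[-1]:.2%}")
print(f"two largest topics combined: {shares[0] + shares[1]:.1%}")
```

The counts add up to the 1500 training documents, and topics 0 and 1 together cover roughly 42% of them, so the remaining 26 regular topics are comparatively small.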

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
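
The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. As a hedged sketch, assuming a BERTopic release that accepts all of these arguments, they could be passed as follows; `HYPERPARAMS` and `build_model` are hypothetical names introduced for illustration.

```python
# Hyperparameters copied from the list above.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model():
    """Create an unfitted BERTopic model with the settings above."""
    # Imported lazily: requires `pip install -U bertopic`.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `build_model().fit_transform(docs)` on a list of strings would train a new model. The card does not list the embedding, UMAP, or HDBSCAN settings, so a retrained model will not necessarily reproduce the topics shown above.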

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
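
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. This is a minimal sketch using only the standard library; the PyPI distribution names (for example `umap-learn` for UMAP) are assumptions, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(expected):
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Report every package that is missing or at a different version.
for name, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {installed}")
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point when debugging.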