blbooksgenre_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/blbooksgenre_topics")

topic_model.get_topic_info()
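
Individual topics can also be inspected one at a time. The following is a minimal sketch rather than part of the original card: `format_keywords` and `show_topic` are hypothetical helper names, and it assumes that the "Topic Keywords" column in the overview lists each topic's leading words joined with " - ". `BERTopic.get_topic` returns a list of (word, score) pairs for a topic ID.

```python
def format_keywords(word_scores, n=5):
    """Join the first n words of a topic in the 'a - b - c' style.

    word_scores is a list of (word, score) pairs, the shape returned
    by BERTopic.get_topic for an existing topic.
    """
    return " - ".join(word for word, _score in word_scores[:n])


def show_topic(topic_id=0, model_id="davanstrien/blbooksgenre_topics"):
    """Load the model from the Hub and return one topic's keywords.

    Hypothetical helper: needs bertopic installed and network access.
    """
    # Imported lazily so format_keywords works without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    return format_keywords(topic_model.get_topic(topic_id))
```

Calling `show_topic(0)` should give a string comparable to the keywords listed for topic 0 in the overview, although the exact words depend on the saved model.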

Topic overview

  • Number of topics: 57
  • Number of training documents: 43752
Overview of all topics (columns: Topic ID, Topic Keywords, Topic Frequency, Label):
-1 poems - novel - poem - prose - book 11 -1_poems_novel_poem_prose
0 poems - poem - poetry - poets - poetical 18624 0_poems_poem_poetry_poets
1 novel - author - poem - heir - tales 4698 1_novel_author_poem_heir
2 ireland - dublin - scotland - irish - edinburgh 3576 2_ireland_dublin_scotland_irish
3 geography - geographical - maps - map - history 3104 3_geography_geographical_maps_map
4 shakespeare - acts - prose - comedy - theatre 1377 4_shakespeare_acts_prose_comedy
5 county - counties - pennsylvania - hampshire - history 1089 5_county_counties_pennsylvania_hampshire
6 france - spain - europe - pyrenees - paris 990 6_france_spain_europe_pyrenees
7 sailing - nautical - maritime - boat - voyages 986 7_sailing_nautical_maritime_boat
8 antiquity - greeks - rome - romans - greece 744 8_antiquity_greeks_rome_romans
9 illustrations - drawings - pencil - drawn - sketches 631 9_illustrations_drawings_pencil_drawn
10 africa - transvaal - cape - zululand - african 610 10_africa_transvaal_cape_zululand
11 egypt - egyptians - cairo - sinai - egyptian 610 11_egypt_egyptians_cairo_sinai
12 england - britain - british - george - english 570 12_england_britain_british_george
13 california - alaska - regions - tour - states 546 13_california_alaska_regions_tour
14 italia - italy - sicily - italian - italians 491 14_italia_italy_sicily_italian
15 crimean - crimea - turkey - turks - russia 481 15_crimean_crimea_turkey_turks
16 mexico - rio - honduras - colombia - panama 433 16_mexico_rio_honduras_colombia
17 wales - maoriland - otago - zealand - auckland 423 17_wales_maoriland_otago_zealand
18 waterloo - poem - battle - napoleon - battles 405 18_waterloo_poem_battle_napoleon
19 mining - mineralogy - minerals - metallurgy - metals 396 19_mining_mineralogy_minerals_metallurgy
20 history - america - states - historical - american 377 20_history_america_states_historical
21 geology - geological - geologists - cambrian - fossils 305 21_geology_geological_geologists_cambrian
22 quebec - scotia - canadas - ontario - province 204 22_quebec_scotia_canadas_ontario
23 rambles - ramble - south - lands - scrambles 194 23_rambles_ramble_south_lands
24 edition - second - series - third - revised 159 24_edition_second_series_third
25 rudge - barnaby - hutton - rivers - osborne 149 25_rudge_barnaby_hutton_rivers
26 memorials - anniversary - memorial - london - address 134 26_memorials_anniversary_memorial_london
27 railway - railways - railroad - railroads - railroadiana 115 27_railway_railways_railroad_railroads
28 forest - foresters - woods - trees - forestalled 112 28_forest_foresters_woods_trees
29 philosophy - humanity - philosophie - moralities - conscience 97 29_philosophy_humanity_philosophie_moralities
30 gazetteer - geography - geographical - dictionary - topographical 96 30_gazetteer_geography_geographical_dictionary
31 goldsmith - goldsmiths - novel - writings - epistle 93 31_goldsmith_goldsmiths_novel_writings
32 regulations - members - committees - rules - committee 89 32_regulations_members_committees_rules
33 odes - poems - poem - ode - hymno 87 33_odes_poems_poem_ode
34 doctor - doctors - physician - patients - physicians 79 34_doctor_doctors_physician_patients
35 geography - schools - longmans - colleges - school 77 35_geography_schools_longmans_colleges
36 juan - juana - sequel - carlos - genista 63 36_juan_juana_sequel_carlos
37 sporting - sports - sport - sportsmans - rugby 56 37_sporting_sports_sport_sportsmans
38 detective - detectives - crime - policeman - city 52 38_detective_detectives_crime_policeman
39 blanc - mont - blanche - montserrat - montacute 47 39_blanc_mont_blanche_montserrat
40 jack - jacks - jackdaw - house - author 46 40_jack_jacks_jackdaw_house
41 dutch - netherlands - holland - dutchman - dutchesse 43 41_dutch_netherlands_holland_dutchman
42 spider - spiders - adventure - web - webs 35 42_spider_spiders_adventure_web
43 madrasiana - madras - malabar - mysore - district 31 43_madrasiana_madras_malabar_mysore
44 doncaster - 1835 - gazette - 1862 - 1868 31 44_doncaster_1835_gazette_1862
45 lays - lay - land - empire - sea 28 45_lays_lay_land_empire
46 cyprus - syria - palestine - island - asia 28 46_cyprus_syria_palestine_island
47 gipsies - gipsy - snakes - encyclopaedia - bunyan 20 47_gipsies_gipsy_snakes_encyclopaedia
48 abydos - bride - turkish - marriage - euphrosyne 18 48_abydos_bride_turkish_marriage
49 derby - castleton - buxton - matlock - nottingham 16 49_derby_castleton_buxton_matlock
50 corsair - tale - carlo - mystery - monte 16 50_corsair_tale_carlo_mystery
51 bushman - bushranger - bushrangers - australian - novel 13 51_bushman_bushranger_bushrangers_australian
52 months - italy - weeks - six - france 12 52_months_italy_weeks_six
53 kitty - kittys - catspaw - catriona - father 12 53_kitty_kittys_catspaw_catriona
54 lighthouses - lighthouse - beacons - lights - lighting 12 54_lighthouses_lighthouse_beacons_lights
55 balfour - kidnapped - balfouriana - memoirs - adventures 11 55_balfour_kidnapped_balfouriana_memoirs
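
The Label column follows a regular pattern (topic ID, then the leading keywords, joined by underscores), and the frequencies can be read as shares of the 43752 training documents. The sketch below illustrates both with three rows copied from the overview; `parse_label` and `share` are hypothetical helper names, not BERTopic functions.

```python
TOTAL_DOCS = 43752  # "Number of training documents" from the overview

# (frequency, label) pairs copied from the first three rows of the overview.
ROWS = [
    (11, "-1_poems_novel_poem_prose"),
    (18624, "0_poems_poem_poetry_poets"),
    (4698, "1_novel_author_poem_heir"),
]


def parse_label(label):
    """Split a label such as '0_poems_poem' into (0, ['poems', 'poem'])."""
    topic_id, _, keywords = label.partition("_")
    return int(topic_id), keywords.split("_")


def share(frequency, total=TOTAL_DOCS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


for frequency, label in ROWS:
    topic_id, keywords = parse_label(label)
    print(f"topic {topic_id}: {share(frequency):.1%} of documents, {keywords}")
```

Topic -1 is the ID BERTopic uses for outlier documents, so it is usually read separately from the numbered topics.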

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 57
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
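
As a rough illustration, the values above map directly onto keyword arguments of the `BERTopic` constructor. This is a sketch under assumptions: `build_model` is a hypothetical helper, and the card does not list the embedding model or the UMAP and HDBSCAN settings, so library defaults are used here and the result is not guaranteed to reproduce the published topics.

```python
# Hyperparameters copied from the list above.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 57,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_model(hyperparams=HYPERPARAMS):
    """Create an untrained BERTopic instance with the listed settings.

    Hypothetical helper: the embedding model, UMAP and HDBSCAN
    components are left at the library defaults (an assumption).
    """
    # Imported lazily so the dictionary can be inspected without bertopic.
    from bertopic import BERTopic

    return BERTopic(**hyperparams)
```

Training would then be a call such as `build_model().fit_transform(docs)`, where `docs` is a list of strings from the training dataset.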

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
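
Loading a saved topic model can be sensitive to library versions, so it may help to compare the local environment with the versions listed above. The sketch below uses only the standard library; `compare_versions` is a hypothetical helper, and the keys are the PyPI distribution names assumed to correspond to the listed frameworks (for example, `umap-learn` for UMAP).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(expected=EXPECTED):
    """Return {name: (expected, installed or None, matches)} per package."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment differs from the one used for training.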