
ArXiv

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("OSN2/ArXiv")

# Show a table with one row per topic
topic_model.get_topic_info()
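
Once loaded, the model can also assign topics to unseen text. The sketch below is one possible way to do that and is not something this card documents: `summarize_assignments` and `load_and_summarize` are hypothetical helper names, and the sketch assumes the Hub repository stores a reference to its embedding model so that `transform` can embed new documents (if it does not, an `embedding_model` argument would likely need to be passed to `BERTopic.load`).

```python
def summarize_assignments(topic_model, docs, top_k=5):
    """Return (document, topic_id, top keywords) for each document."""
    # transform() returns the predicted topic IDs and, optionally, probabilities.
    topics, _ = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        # get_topic() yields (word, score) pairs; fall back to an empty list
        # if the model does not know the topic ID.
        words = topic_model.get_topic(int(topic_id)) or []
        results.append((doc, int(topic_id), [w for w, _ in words[:top_k]]))
    return results


def load_and_summarize(docs):
    """Hypothetical wrapper; needs bertopic installed and network access."""
    # Imported lazily so the helper above has no hard dependency.
    from bertopic import BERTopic

    topic_model = BERTopic.load("OSN2/ArXiv")
    return summarize_assignments(topic_model, docs)
```

Calling `load_and_summarize(["Easy weeknight pizza dough recipe"])` would return one tuple per input document. A topic ID of -1 means BERTopic treated the document as an outlier.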

Topic overview

  • Number of topics: 171
  • Number of training documents: 12693
Overview of all topics. Each row lists, in order: Topic ID, Topic Keywords (separated by " - "), Topic Frequency, and Label.
-1 the - and - to - of - in 15 -1_the_and_to_of
0 recipe - food - recipes - pizza - salad 3814 0_recipe_food_recipes_pizza
1 trump - election - law - the - that 849 1_trump_election_law_the
2 anaysa - fashion - pants - swimwear - sneakers 393 2_anaysa_fashion_pants_swimwear
3 arsenal - liverpool - rugby - match - haaland 382 3_arsenal_liverpool_rugby_match
4 weather - bengal - storm - west - snow 271 4_weather_bengal_storm_west
5 crypto - bitcoin - cryptocurrency - gaming - trading 172 5_crypto_bitcoin_cryptocurrency_gaming
6 her - she - was - on - related 143 6_her_she_was_on
7 420m - dog - animal - animals - dogs 138 7_420m_dog_animal_animals
8 god - lord - prayer - jesus - church 127 8_god_lord_prayer_jesus
9 cars - sale - used - under - for 119 9_cars_sale_used_under
10 pro - vivo - v23 - phone - google 117 10_pro_vivo_v23_phone
11 news - iptv - tv - interview - latest 110 11_news_iptv_tv_interview
12 art - museum - artists - artist - of 108 12_art_museum_artists_artist
13 my - nephews - nieces - poetry - love 107 13_my_nephews_nieces_poetry
14 film - review - his - as - but 102 14_film_review_his_as
15 bike - helmet - bikes - mountain - pilots 98 15_bike_helmet_bikes_mountain
16 hair - bite - steel - care - haircut 97 16_hair_bite_steel_care
17 police - rhonda - mcdowell - was - said 90 17_police_rhonda_mcdowell_was
18 property - room - bedrooms - bedroom - home 86 18_property_room_bedrooms_bedroom
19 ukraine - russia - russian - putin - news 86 19_ukraine_russia_russian_putin
20 business - jobs - income - data - part 86 20_business_jobs_income_data
21 vaccinated - vaccine - covid - va - unvaccinated 84 21_vaccinated_vaccine_covid_va
22 music - band - students - orchestra - tickets 83 22_music_band_students_orchestra
23 workout - abs - workouts - fitness - exercise 83 23_workout_abs_workouts_fitness
24 school - teachers - dmc - 804 - children 83 24_school_teachers_dmc_804
25 women - robotics - bali - spanish - lutheran 82 25_women_robotics_bali_spanish
26 lima - tourism - parks - urban - our 79 26_lima_tourism_parks_urban
27 godzilla - movies - movie - spider - marvel 77 27_godzilla_movies_movie_spider
28 fishing - backpacks - fish - packs - swimming 74 28_fishing_backpacks_fish_packs
29 yoga - stretching - kru - nidra - oct 74 29_yoga_stretching_kru_nidra
30 researchers - species - of - the - university 72 30_researchers_species_of_the
31 wholesale - market - saree - delhi - software 71 31_wholesale_market_saree_delhi
32 skin - acne - cream - blackheads - whitening 70 32_skin_acne_cream_blackheads
33 rodents - pets - pest - dogs - animals 70 33_rodents_pets_pest_dogs
34 books - book - salinger - fiction - literary 67 34_books_book_salinger_fiction
35 class - pst - exams - preparation - test 66 35_class_pst_exams_preparation
36 5g - airlines - bsnl - flight - network 64 36_5g_airlines_bsnl_flight
37 treetops - dementia - children - people - barbara 62 37_treetops_dementia_children_people
38 lottery - thai - thailand - lotto - win 62 38_lottery_thai_thailand_lotto
39 wedding - weddings - survival - gift - day 61 39_wedding_weddings_survival_gift
40 quantum - solar - energy - material - light 61 40_quantum_solar_energy_material
41 beauty - makeup - products - sephora - skin 60 41_beauty_makeup_products_sephora
42 games - xbox - game - solitaire - free 60 42_games_xbox_game_solitaire
43 insurance - insurers - insurer - company - aig 59 43_insurance_insurers_insurer_company
44 green - saf - haiti - industry - solar 58 44_green_saf_haiti_industry
45 diet - meat - foods - plant - body 55 45_diet_meat_foods_plant
46 edinburgh - tour - royal - travel - castle 55 46_edinburgh_tour_royal_travel
47 horses - horse - friesian - goëngamieden - post 54 47_horses_horse_friesian_goëngamieden
48 your - you - mental - health - anal 51 48_your_you_mental_health
49 weight - obesity - loss - lose - fat 51 49_weight_obesity_loss_lose
50 estate - real - property - home - you 50 50_estate_real_property_home
51 camping - surfing - guess - landmark - lego 50 51_camping_surfing_guess_landmark
52 dorm - sex - birthday - my - joy 50 52_dorm_sex_birthday_my
53 covid - 19 - vaccinated - vaccine - cases 50 53_covid_19_vaccinated_vaccine
54 spain - morocco - gas - energy - industry 49 54_spain_morocco_gas_energy
55 gardening - garden - grow - plants - fertilizer 49 55_gardening_garden_grow_plants
56 tenant - transport - apartments - department - condos 49 56_tenant_transport_apartments_department
57 cricket - england - engw - indw - vs 48 57_cricket_england_engw_indw
58 trump - election - party - votes - former 48 58_trump_election_party_votes
59 tesla - marine - electric - musk - ev 47 59_tesla_marine_electric_musk
60 surf - surfing - ski - swimming - lessons 47 60_surf_surfing_ski_swimming
61 disabled - disability - thailand - scholarship - scholarships 47 61_disabled_disability_thailand_scholarship
62 programming - udemy - svelte - language - courses 44 62_programming_udemy_svelte_language
63 diy - ideas - desk - wood - woodworking 43 63_diy_ideas_desk_wood
64 wrestling - pearson - tiga - wwe - nfl 43 64_wrestling_pearson_tiga_wwe
65 smart - gadgets - appliances - home - kitchen 42 65_smart_gadgets_appliances_home
66 experiments - fu - kung - xxxtentacion - copyright 40 66_experiments_fu_kung_xxxtentacion
67 job - small - businesses - hiring - business 40 67_job_small_businesses_hiring
68 hiv - health - care - hospital - hospice 40 68_hiv_health_care_hospital
69 he - was - it - empire - movie 38 69_he_was_it_empire
70 beat - type - ringtone - lofi - beats 37 70_beat_type_ringtone_lofi
71 castellvi - marines - marine - corps - county 37 71_castellvi_marines_marine_corps
72 casino - xbox - game - games - poker 37 72_casino_xbox_game_games
73 bellanaijaweddings - bride - handmadepaper - weddingplanner - makeup 36 73_bellanaijaweddings_bride_handmadepaper_weddingplanner
74 music - jsem - bushcraft - se - festival 36 74_music_jsem_bushcraft_se
75 gemini - tarot - horoscope - september - pisces 35 75_gemini_tarot_horoscope_september
76 career - husni - magazines - magazine - employees 35 76_career_husni_magazines_magazine
77 his - film - movie - review - but 34 77_his_film_movie_review
78 gps - aircraft - trucks - vehicles - electric 34 78_gps_aircraft_trucks_vehicles
79 raya - merch - magazines - cards - kongamidyearshoppingfestival 34 79_raya_merch_magazines_cards
80 baby - she - birth - says - women 34 80_baby_she_birth_says
81 covid - 19 - uk - health - interventions 33 81_covid_19_uk_health
82 climate - gore - dm - eastman - change 33 82_climate_gore_dm_eastman
83 buhari - anambra - apc - anyim - chief 32 83_buhari_anambra_apc_anyim
84 orchestra - hotel - janice - chicago - symphony 31 84_orchestra_hotel_janice_chicago
85 ramen - pierre - soulz - magic - westfieldcarousel 31 85_ramen_pierre_soulz_magic
86 interior - design - home - decorate - bedroom 30 86_interior_design_home_decorate
87 hindi - movie - explained - hollywood - lankybox 30 87_hindi_movie_explained_hollywood
88 xbox - playstation - game - card - console 30 88_xbox_playstation_game_card
89 insurance - car - policy - feener - policyworld 30 89_insurance_car_policy_feener
90 share - nepal - stock - market - analysis 29 90_share_nepal_stock_market
91 marketing - content - strategy - cart - your 28 91_marketing_content_strategy_cart
92 songs - kids - song - rhymes - hindi 28 92_songs_kids_song_rhymes
93 tax - cd - money - itr - 401 27 93_tax_cd_money_itr
94 inflation - housing - prices - chorley - hydrow 27 94_inflation_housing_prices_chorley
95 venkat - spectre - spending - attacks - intel 26 95_venkat_spectre_spending_attacks
96 band - grammys - recording - musical - doo 26 96_band_grammys_recording_musical
97 drawing - draw - art - mandala - painting 26 97_drawing_draw_art_mandala
98 shop - insurance - design - restaurant - food 26 98_shop_insurance_design_restaurant
99 kamran - feride - iqiyi - drama - selim 26 99_kamran_feride_iqiyi_drama
100 poetry - prize - mondaymotivation - publication - apologize 26 100_poetry_prize_mondaymotivation_publication
101 jobs - tcs - part - job - work 25 101_jobs_tcs_part_job
102 card - credit - rewards - cash - tracking 25 102_card_credit_rewards_cash
103 vlog - vlogs - dexerto - video - blog 25 103_vlog_vlogs_dexerto_video
104 brother - 5½ - burge - poetry - thank 25 104_brother_5½_burge_poetry
105 anime - manga - disney - animes - recap 25 105_anime_manga_disney_animes
106 fox - news - msnbc - biden - business 25 106_fox_news_msnbc_biden
107 thoreau - wildness - maldives - malé - wildlife 24 107_thoreau_wildness_maldives_malé
108 condo - minutes - rent - condominium - เช 24 108_condo_minutes_rent_condominium
109 freshworks - sales - requirements - job - development 24 109_freshworks_sales_requirements_job
110 insurance - management - property - company - loans 24 110_insurance_management_property_company
111 aew - wrestling - highlights - esports - impact 23 111_aew_wrestling_highlights_esports
112 ctv - cbc - x000d - news - bridge 23 112_ctv_cbc__x000d__news
113 ukrainian - music - lyatoshynsky - solos - concert 23 113_ukrainian_music_lyatoshynsky_solos
114 abc - ladzinski - campaign - carlton - news 23 114_abc_ladzinski_campaign_carlton
115 gaming - pc - headset - byte - cosmic 23 115_gaming_pc_headset_byte
116 climate - environmental - noaa - literacy - education 23 116_climate_environmental_noaa_literacy
117 game - players - sonic - its - the 22 117_game_players_sonic_its
118 olympic - olympics - chen - biles - medal 22 118_olympic_olympics_chen_biles
119 loans - loan - student - paying - naira 22 119_loans_loan_student_paying
120 nail - art - nails - compilation - acrylic 22 120_nail_art_nails_compilation
121 peppa - pig - wolfoo - nguyen - favorite 21 121_peppa_pig_wolfoo_nguyen
122 jazz - music - blues - heat - waves 21 122_jazz_music_blues_heat
123 rónán - march - composer - lyricist - tickets 21 123_rónán_march_composer_lyricist
124 olympic - beijing - olympics - china - athletes 21 124_olympic_beijing_olympics_china
125 smoking - breakover - smokers - heart - hind 21 125_smoking_breakover_smokers_heart
126 pets - animals - pet - panda - dog 21 126_pets_animals_pet_panda
127 cycling - gcn - bike - feroce - wheels 21 127_cycling_gcn_bike_feroce
128 musique - proposée - libre - par - la 21 128_musique_proposée_libre_par
129 male - girlfriend - roseanne - unagi - twohill 20 129_male_girlfriend_roseanne_unagi
130 gymnastics - moana - always - drugs - week 20 130_gymnastics_moana_always_drugs
131 musk - gambling - twitter - elon - deduction 20 131_musk_gambling_twitter_elon
132 lichfield - google - sat - stoke - mon 20 132_lichfield_google_sat_stoke
133 reasonable - greenhouse - accommodation - robots - ai 20 133_reasonable_greenhouse_accommodation_robots
134 icebox - maxo - theme - kream - koo 19 134_icebox_maxo_theme_kream
135 whio - ong - ang - canal - birds 19 135_whio_ong_ang_canal
136 codyfight - tattooing - brothers - marriage - extreme 19 136_codyfight_tattooing_brothers_marriage
137 nuro - gm - vehicle - vehicles - electric 19 137_nuro_gm_vehicle_vehicles
138 kcs - railroads - cn - rail - stb 19 138_kcs_railroads_cn_rail
139 strengths - music - grief - leisure - life 19 139_strengths_music_grief_leisure
140 drones - drone - uae - missile - dhabi 19 140_drones_drone_uae_missile
141 massage - dubai - jumeirah - japanese - oil 18 141_massage_dubai_jumeirah_japanese
142 bowl - super - bengals - bet - rams 18 142_bowl_super_bengals_bet
143 pension - 9news - pensions - pay - tax 18 143_pension_9news_pensions_pay
144 dog - toy - pet - supplies - toys 18 144_dog_toy_pet_supplies
145 english - travellers - students - course - syllabus 18 145_english_travellers_students_course
146 mentoring - cbs - mentor - mentors - teachers 18 146_mentoring_cbs_mentor_mentors
147 picnic - park - blankets - basket - acompañantes 18 147_picnic_park_blankets_basket
148 orig - 99 - amazon - prime - dollar 18 148_orig_99_amazon_prime
149 primary - english - genetics - wilanów - education 18 149_primary_english_genetics_wilanów
150 hardin - film - he - she - oscar 17 150_hardin_film_he_she
151 laptop - gaming - alienware - laptops - hp 17 151_laptop_gaming_alienware_laptops
152 ufc - tmz - owens - onlyfans - tonight 17 152_ufc_tmz_owens_onlyfans
153 basketball - vs - varsity - darien - canaan 17 153_basketball_vs_varsity_darien
154 workers - hanford - state - law - doe 17 154_workers_hanford_state_law
155 cdl - freight - broker - logistics - eldt 17 155_cdl_freight_broker_logistics
156 builders - connell - brenton - firm - wage 17 156_builders_connell_brenton_firm
157 bookstore - easter - my - menger - eastershelfie 16 157_bookstore_easter_my_menger
158 prince - royal - duke - charles - queen 16 158_prince_royal_duke_charles
159 ดตามเราได - จำก - มหาชน - voicetv - oppday 16 159_ดตามเราได_จำก_มหาชน_voicetv
160 nba - trades - stream - espn - live 16 160_nba_trades_stream_espn
161 school - students - science - brandon - twig 16 161_school_students_science_brandon
162 morning - sleep - your - kaplan - routine 16 162_morning_sleep_your_kaplan
163 kat - author - desires - louise - charmaine 16 163_kat_author_desires_louise
164 movie - recapped - uche - academia - dizzyeight 16 164_movie_recapped_uche_academia
165 awka - religion - suspects - anambra - echeng 15 165_awka_religion_suspects_anambra
166 wrc - f1 - rally - championship - formula1 15 166_wrc_f1_rally_championship
167 hillstream - algae - scape - goby - aquarium 15 167_hillstream_algae_scape_goby
168 skin - filler - touche - éclat - dermal 15 168_skin_filler_touche_éclat
169 pets - cats - hopkins - cat - niblo 15 169_pets_cats_hopkins_cat
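
In the rows above, each label appears to be the topic ID followed by that topic's first four keywords, joined with underscores. The following minimal sketch splits a label back into those parts. `parse_label` is a hypothetical helper, not part of BERTopic, and it assumes keywords contain no underscores of their own; empty fragments (as in label 112) are dropped.

```python
def parse_label(label):
    """Split a label such as '0_recipe_food_recipes_pizza' into (id, keywords)."""
    # The ID is everything before the first underscore; this also handles '-1'.
    topic_id, _, rest = label.partition("_")
    # Drop empty fragments caused by consecutive underscores.
    keywords = [word for word in rest.split("_") if word]
    return int(topic_id), keywords


print(parse_label("0_recipe_food_recipes_pizza"))
print(parse_label("-1_the_and_to_of"))
```

Because the keyword boundaries are inferred from underscores alone, the keyword lists returned by `topic_model.get_topic()` are the more reliable source when the loaded model is available.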

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
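
As a rough illustration, the values above map onto keyword arguments of the `BERTopic` constructor. The sketch below assumes a BERTopic release that accepts the `zeroshot_*` arguments. It leaves out `language: None` on purpose: that value may indicate that a custom embedding model was supplied at training time, and this card does not say which one, so the sketch falls back to the library default. `CARD_HYPERPARAMETERS` and `build_untrained_model` are hypothetical names.

```python
# Hyperparameters copied from this card ("language: None" is omitted, see above).
CARD_HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model():
    """Create a fresh, untrained BERTopic instance with the card's settings."""
    # Imported lazily so the dictionary above can be used without bertopic.
    from bertopic import BERTopic

    return BERTopic(**CARD_HYPERPARAMETERS)
```

A model built this way would still have to be fitted on documents with `fit_transform`, and without the original training data and embedding model it should not be expected to reproduce the topics listed in this card.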

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.5
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.36.0
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
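
If loading the model fails or behaves unexpectedly, comparing the local environment against the versions listed above can help narrow down the cause. The sketch below uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions, `find_version_mismatches` is a hypothetical helper, and the Python version itself is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from this card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.5",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.36.0",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def find_version_mismatches(expected=CARD_VERSIONS):
    """Return {package: (expected, installed)} for every differing package."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_version_mismatches().items():
    print(f"{name}: card lists {wanted}, found {installed}")
```

A mismatch is not necessarily a problem; the list only records the environment the model was trained in.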