
chat_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

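from bertopic import BERTopic

# Download the fitted topic model from the Hugging Face Hub
topic_model = BERTopic.load("davanstrien/chat_topics")

# Return a DataFrame with one row per topic
topic_model.get_topic_info()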
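
Once loaded, the model can also assign topics to new text. The following is a minimal sketch, assuming the repository provides everything BERTopic needs to embed new documents at inference time. `describe_assignments` is a hypothetical helper name; `transform` and `get_topic` are BERTopic methods.

```python
def describe_assignments(topic_model, docs):
    """Assign each document to a topic and list that topic's top keywords.

    `topic_model` is expected to be a loaded BERTopic instance; only its
    `transform` and `get_topic` methods are used here.
    """
    topics, _probs = topic_model.transform(docs)
    results = []
    for doc, topic_id in zip(docs, topics):
        topic_id = int(topic_id)
        # get_topic returns (word, score) pairs, or False for an unknown topic ID
        words = topic_model.get_topic(topic_id) or []
        results.append((doc, topic_id, [word for word, _score in words]))
    return results
```

For example, `describe_assignments(topic_model, ["How do I schedule a cron job?"])` returns one `(document, topic ID, keywords)` tuple per input document. The topic assigned to any given sentence depends on the embedding model, so no output is shown here.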

Topic overview

  • Number of topics: 75
  • Number of training documents: 63530
Overview of all topics:
Topic ID Topic Keywords Topic Frequency Label
-1 provide - using - information - sure - help 26 -1_provide_using_information_sure
0 openai - ai - chatgpt - assistant - language 7837 Generative AI
1 anytime - welcome - assistance - helpful - thank 1342 1_anytime_welcome_assistance_helpful
2 quantum - particle - physics - particles - relativity 778 Physics
3 story - lived - life - novel - felt 569 3_story_lived_life_novel
4 letter - sincerely - regards - email - dear 516 4_letter_sincerely_regards_email
5 rust - haskell - programming - java - languages 504 programming
6 css - html - style - div - js 494 web programming
7 linux - ubuntu - debian - fedora - install 440 7_linux_ubuntu_debian_fedora
8 recipe - bake - ingredients - baking - dough 425 8_recipe_bake_ingredients_baking
9 websocket - json - socket - api - discord 425 9_websocket_json_socket_api
10 communism - capitalism - marx - economic - economy 424 10_communism_capitalism_marx_economic
11 dog - pet - breed - breeds - pets 408 11_dog_pet_breed_breeds
12 philosophy - theological - philosophical - beliefs - consciousness 394 12_philosophy_theological_philosophical_beliefs
13 git - github - repository - software - commit 381 13_git_github_repository_software
14 music - songs - musical - lyrics - song 370 14_music_songs_musical_lyrics
15 devops - development - developers - industry - develop 323 15_devops_development_developers_industry
16 pythagorean - hypotenuse - triangle - math - sqrt 302 16_pythagorean_hypotenuse_triangle_math
17 eu - europe - economy - economic - war 291 17_eu_europe_economy_economic
18 sleep - asleep - bedtime - procrastination - depression 280 18_sleep_asleep_bedtime_procrastination
19 kramer - seinfeld - jerry - cafe - elaine 279 19_kramer_seinfeld_jerry_cafe
20 printing - prints - printer - print - printers 276 20_printing_prints_printer_print
21 influenza - flu - panic - symptoms - medical 251 21_influenza_flu_panic_symptoms
22 chess - chessboard - practice - strategy - learn 242 22_chess_chessboard_practice_strategy
23 algorithm - primes - array - integers - python 240 23_algorithm_primes_array_integers
24 youtube - viewers - media - google - streaming 240 24_youtube_viewers_media_google
25 poison - chemicals - powder - turpentine - smoke 226 25_poison_chemicals_powder_turpentine
26 monday - sunday - count_weekend_days - calendar - dates 216 26_monday_sunday_count_weekend_days_calendar
27 colors - colour - color - pigments - blue 208 27_colors_colour_color_pigments
28 roman - attila - rome - empire - warfare 205 28_roman_attila_rome_empire
29 investing - investments - investment - stocks - financial 204 29_investing_investments_investment_stocks
30 vocabulary - wordle - words - scrabble - word 201 30_vocabulary_wordle_words_scrabble
31 planets - sun - earth - planet - pluto 198 31_planets_sun_earth_planet
32 renewable - solar - electricity - energy - electrical 190 32_renewable_solar_electricity_energy
33 pygame - ball_radius - draw - circle - canvas 181 33_pygame_ball_radius_draw_circle
34 fishing - fish - boat - hiking - camping 176 34_fishing_fish_boat_hiking
35 gpus - gpu - motherboard - cpu - hardware 162 35_gpus_gpu_motherboard_cpu
36 hvac - remodeling - energy - kwh - housing 159 36_hvac_remodeling_energy_kwh
37 database - graphql - databases - postgresql - sql 159 37_database_graphql_databases_postgresql
38 información - significado - cómo - como - sistemas 158 38_información_significado_cómo_como
39 motherboard - pcie - gpu - bios - computer 153 39_motherboard_pcie_gpu_bios
40 crops - produce - planting - peppers - plants 148 40_crops_produce_planting_peppers
41 paintings - art - modernist - artists - modern 148 41_paintings_art_modernist_artists
42 workout - exercises - dumbbells - dumbbell - exercise 147 42_workout_exercises_dumbbells_dumbbell
43 climate - warming - pollution - environmental - emissions 142 43_climate_warming_pollution_environmental
44 coffee - espresso - brewing - tea - beans 137 44_coffee_espresso_brewing_tea
45 velocity - drag - acceleration - density - formula 132 45_velocity_drag_acceleration_density
46 woodchuck - woodchucks - units - kilogram - kilograms 130 46_woodchuck_woodchucks_units_kilogram
47 ascii - glyphs - hiragana - art - font 129 47_ascii_glyphs_hiragana_art
48 guitars - guitar - strings - guitarists - instrument 127 48_guitars_guitar_strings_guitarists
49 tallest - buildings - building - burj - khalifa 114 49_tallest_buildings_building_burj
50 flat - earth - curvature - spherical - tectonic 111 50_flat_earth_curvature_spherical
51 essay - awareness - understanding - being - be 102 51_essay_awareness_understanding_being
52 portals - ender - portal - obsidian - netherite 102 52_portals_ender_portal_obsidian
53 android - apple - phones - devices - vehicles 101 53_android_apple_phones_devices
54 fasting - dietary - diet - eating - metabolic 101 54_fasting_dietary_diet_eating
55 meditation - relief - pain - health - nociception 99 55_meditation_relief_pain_health
56 weather - forecast - forecasts - raining - precipitation 95 56_weather_forecast_forecasts_raining
57 president - presidents - presidency - constitution - biden 94 57_president_presidents_presidency_constitution
58 no - nope - yes - not - maybe 94 58_no_nope_yes_not
59 peregrine - airspeed - falcon - speed - bird 90 59_peregrine_airspeed_falcon_speed
60 crontab - cron - myscript - script - bash 83 60_crontab_cron_myscript_script
61 youtuber - streamer - ceo - musk - founder 83 61_youtuber_streamer_ceo_musk
62 layovers - flights - circumnavigate - layover - travel 83 62_layovers_flights_circumnavigate_layover
63 keyboards - keyboard - switches - qwerty - types 83 63_keyboards_keyboard_switches_qwerty
64 file_path_in_dir1 - file_path1 - csv_file - file_path_in_dir2 - file_path2 80 64_file_path_in_dir1_file_path1_csv_file_file_path_in_dir2
65 pele - maradona - lebron - ronaldo - nba 76 65_pele_maradona_lebron_ronaldo
66 alopecia - hairstyles - hairstyle - hair - scalp 66 66_alopecia_hairstyles_hairstyle_hair
67 nginx - docker - kubernetes - proxy_pass - nodeport 65 67_nginx_docker_kubernetes_proxy_pass
68 directories - directory - sudo - filesystem - folders 62 68_directories_directory_sudo_filesystem
69 gps - map - geocaching - maps - armenia 52 69_gps_map_geocaching_maps
70 meiosis - mitosis - fertilization - reproduction - ovulation 51 70_meiosis_mitosis_fertilization_reproduction
71 colleges - admissions - universities - campus - university 43 71_colleges_admissions_universities_campus
72 unicorns - unicorn - pony - ponies - mythical 32 72_unicorns_unicorn_pony_ponies
73 superpowers - abilities - superhero - superhuman - powers 28 73_superpowers_abilities_superhero_superhuman
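
The Label column mixes two kinds of values. Most rows follow the pattern of the topic ID plus the first four keywords joined by underscores, which is BERTopic's default naming; a few rows (for example "Generative AI" and "Physics") appear to carry custom labels instead. Topic -1 is the bucket BERTopic uses for outlier documents. The sketch below parses one row of the overview and checks which kind of label it has. The helper names are hypothetical, and the parser assumes single-word keywords, which matches the `n_gram_range` of (1, 1) listed under the training hyperparameters.

```python
def default_label(topic_id, keywords, n_words=4):
    """Rebuild a default-style label: topic ID plus the leading keywords."""
    return "_".join([str(topic_id), *keywords[:n_words]])


def parse_row(row):
    """Split one overview row into (topic_id, keywords, frequency, label)."""
    parts = row.strip().split(" - ")
    # The first chunk holds the topic ID and the first keyword
    topic_id, first = parts[0].split(" ", 1)
    # The last chunk holds the final keyword, the frequency, and the label
    last, frequency, label = parts[-1].split(" ", 2)
    keywords = [first, *parts[1:-1], last]
    return int(topic_id), keywords, int(frequency), label


def has_custom_label(row):
    """True when the row's label differs from the default-style label."""
    topic_id, keywords, _frequency, label = parse_row(row)
    return label != default_label(topic_id, keywords)
```

Applied to the rows above, `has_custom_label` is true for topic 0 ("Generative AI") and false for topic 1, whose label is the default-style `1_anytime_welcome_assistance_helpful`.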

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 20
  • n_gram_range: (1, 1)
  • nr_topics: 75
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
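
Each item in this list corresponds to a keyword argument of the `BERTopic` constructor. The sketch below shows how an unfitted model with the same settings could be created. It is not a reproduction recipe: the card does not name the embedding model or any custom sub-models (dimensionality reduction, clustering, vectorizer) used in training, and `language: None` may indicate that a custom embedding model was supplied. `HYPERPARAMETERS` and `build_untrained_model` are hypothetical names.

```python
# Values copied from the hyperparameter list above
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 20,
    "n_gram_range": (1, 1),
    "nr_topics": 75,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_untrained_model(**overrides):
    """Create an unfitted BERTopic instance with the listed settings.

    Keyword overrides (for example embedding_model=...) replace or extend
    the listed values.
    """
    # Imported lazily so the settings dict is usable without BERTopic installed
    from bertopic import BERTopic

    return BERTopic(**{**HYPERPARAMETERS, **overrides})
```

Calling `build_untrained_model().fit_transform(docs)` on a comparable corpus would train a new model with these settings, but the resulting topics will differ from the table above unless the same data, embedding model, and sub-models are used.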

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
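
Loading a saved model in an environment with different library versions can sometimes cause warnings or incompatibilities, so it can help to compare the local installation against this list first. The following is a minimal sketch using only the standard library. The dictionary keys are the assumed PyPI distribution names for each framework (for instance `umap-learn` for UMAP), and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above; keys are assumed PyPI distribution names
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def compare_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

An empty result means every listed package is installed at exactly the listed version. A mismatch is not necessarily a problem; it only flags where the environment differs from the one used for training.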
