moderation-topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

```bash
pip install -U bertopic
```

You can use the model as follows:

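```python
from bertopic import BERTopic

# Download the fitted topic model from the Hugging Face Hub
topic_model = BERTopic.load("jaimevera1107/moderation-topics")

# One row per topic: ID, document count and name (e.g. 0_suicide_nssi_tendency_recent)
print(topic_model.get_topic_info())
```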
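Because the topics describe moderation categories, a natural next step is to assign one of them to new text. The helper below is a minimal sketch, not part of this model card: `label_documents` is a hypothetical name, and it assumes that `transform` can embed new documents with an embedding model saved alongside this BERTopic model, and that the loaded model exposes its topic names through the `topic_labels_` dictionary.

```python
from typing import Any, Sequence


def label_documents(topic_model: Any, docs: Sequence[str]) -> list[str]:
    """Assign each document to a topic and return that topic's name.

    Hypothetical helper: relies on BERTopic's `transform`, which returns
    (predicted_topics, probabilities), and on the `topic_labels_` dict that
    maps topic IDs to names such as "2_drug_reference_purposes_substances".
    """
    topics, _ = topic_model.transform(list(docs))
    # Fall back to the raw ID (e.g. -1 for outliers) if no name is stored
    return [topic_model.topic_labels_.get(topic, str(topic)) for topic in topics]
```

With `topic_model` loaded as above, `label_documents(topic_model, ["Selling cheap counterfeit watches, DM me"])` returns one topic name per input document.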

Topic overview

  • Number of topics: 94
  • Number of training documents: 1403
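These two figures can be cross-checked against the Topic Frequency column of the topic table. The short script below copies that column by hand and confirms that the 94 counts add up to exactly 1,403, so every training document is counted in one of the listed topics. This is consistent with the table not containing BERTopic's outlier topic (-1); the constant names are illustrative.

```python
# Topic Frequency column of the topic table, in topic-ID order (0-93)
FREQUENCIES = [
    40, 33, 32, 31, 31, 31, 30, 28, 27, 27,  # topics 0-9
    26, 26, 26, 25, 25, 24, 24, 24, 23, 23,  # topics 10-19
    23, 22, 21, 20, 20, 19, 18, 18, 18, 18,  # topics 20-29
    17, 17, 17, 16, 16, 16, 15, 15, 15, 14,  # topics 30-39
    14, 14, 14, 14, 14, 13, 13, 12, 12, 12,  # topics 40-49
    12, 12, 12, 12, 11, 11, 11, 11, 11, 11,  # topics 50-59
    10, 10, 10, 9, 9, 9, 9, 9, 9, 9,  # topics 60-69
    8, 8, 8, 8, 8, 8, 7, 7, 7, 7,  # topics 70-79
    7, 7, 6, 6, 6, 6, 5, 5, 5, 5,  # topics 80-89
    5, 5, 5, 4,  # topics 90-93
]

N_TOPICS = 94
N_DOCUMENTS = 1403

assert len(FREQUENCIES) == N_TOPICS
# Every training document falls into one of the 94 listed topics
assert sum(FREQUENCIES) == N_DOCUMENTS
print(f"{len(FREQUENCIES)} topics, {sum(FREQUENCIES)} documents")  # 94 topics, 1403 documents
```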
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| 0 | suicide - nssi - tendency - recent - self | 40 | 0_suicide_nssi_tendency_recent |
| 1 | exposed - minimal - sexualized - possessing - performs | 33 | 1_exposed_minimal_sexualized_possessing |
| 2 | drug - reference - purposes - substances - substance | 32 | 2_drug_reference_purposes_substances |
| 3 | regulated - consumption - tobacco - relate - associate | 31 | 3_regulated_consumption_tobacco_relate |
| 4 | male - region - pubic - exposure - nipple | 31 | 4_male_region_pubic_exposure |
| 5 | testing - wildlife - endangered - poaching - hunting | 31 | 5_testing_wildlife_endangered_poaching |
| 6 | nudity - fine - implied - documentaries - indigenous | 30 | 6_nudity_fine_implied_documentaries |
| 7 | text - language - pickup - textual - texts | 28 | 7_text_language_pickup_textual |
| 8 | fighting - incitement - violent - reactive - event | 27 | 8_fighting_incitement_violent_reactive |
| 9 | hate - ideology - hateful - based - disability | 27 | 9_hate_ideology_hateful_based |
| 10 | sensual - pleasure - demonstration - objectification - dialogue | 26 | 10_sensual_pleasure_demonstration_objectification |
| 11 | detailing - stimulation - fetishism - allusion - adults | 26 | 11_detailing_stimulation_fetishism_allusion |
| 12 | pornography - vulgarity - website - tapes - softcore | 26 | 12_pornography_vulgarity_website_tapes |
| 13 | lead - highly - is - imitable - professionals | 25 | 13_lead_highly_is_imitable |
| 14 | brand - code - csam - qr - multiple | 25 | 14_brand_code_csam_qr |
| 15 | expressions - dance - performing - performances - express | 24 | 15_expressions_dance_performing_performances |
| 16 | intellectual - copyright - copyrighted - stolen - cover | 24 | 16_intellectual_copyright_copyrighted_stolen |
| 17 | slur - slurs - designation - remarks - status | 24 | 17_slur_slurs_designation_remarks |
| 18 | undressing - striptease - process - panties - voyeuristic | 23 | 18_undressing_striptease_process_panties |
| 19 | workplace - peeping - upskirting - tom - coercion | 23 | 19_workplace_peeping_upskirting_tom |
| 20 | hostility - degradation - statement - discriminatory - characteristics | 23 | 20_hostility_degradation_statement_discriminatory |
| 21 | low - quality - organic - host - grow | 22 | 21_low_quality_organic_host |
| 22 | terrorist - terrorism - recruitment - organizations - international | 21 | 22_terrorist_terrorism_recruitment_organizations |
| 23 | spam - jump - makeup - scary - scare | 20 | 23_spam_jump_makeup_scary |
| 24 | firearms - ammunition - explosive - explosives - weapons | 20 | 24_firearms_ammunition_explosive_explosives |
| 25 | culturally - appropriate - wear - protected - not | 19 | 25_culturally_appropriate_wear_protected |
| 26 | disturbing - cannibalism - disgusting - coverage - anatomy | 18 | 26_disturbing_cannibalism_disgusting_coverage |
| 27 | homicide - mutilated - death - accident - torture | 18 | 27_homicide_mutilated_death_accident |
| 28 | privacy - invasion - surveillance - espionage - confidential | 18 | 28_privacy_invasion_surveillance_espionage |
| 29 | age - requirement - signals - identifiers - admission | 18 | 29_age_requirement_signals_identifiers |
| 30 | framing - gaze - angles - piercings - camera | 17 | 30_framing_gaze_angles_piercings |
| 31 | stalking - doxing - lists - encourage - addresses | 17 | 31_stalking_doxing_lists_encourage |
| 32 | damage - destruction - property - arson - vandalism | 17 | 32_damage_destruction_property_arson |
| 33 | eating - disorders - disorder - eat - loss | 16 | 33_eating_disorders_disorder_eat |
| 34 | bullying - statements - cyberbullying - vulnerable - users | 16 | 34_bullying_statements_cyberbullying_vulnerable |
| 35 | scams - frauds - scamming - schemes - fraudulent | 16 | 35_scams_frauds_scamming_schemes |
| 36 | criminal - crime - criminals - gang - burglary | 15 | 36_criminal_crime_criminals_gang |
| 37 | identifiable - data - personally - reveal - others | 15 | 37_identifiable_data_personally_reveal |
| 38 | work - sex - prostitution - workers - escort | 15 | 38_work_sex_prostitution_workers |
| 39 | conspiracy - theories - disinformation - baseless - current | 14 | 39_conspiracy_theories_disinformation_baseless |
| 40 | consensual - recording - blackmail - intention - displaying | 14 | 40_consensual_recording_blackmail_intention |
| 41 | child - featuring - pedophilic - defense - intimate | 14 | 41_child_featuring_pedophilic_defense |
| 42 | polarization - opposing - social - incite - deepen | 14 | 42_polarization_opposing_social_incite |
| 43 | pedophilia - grooming - normalization - predators - normalizing | 14 | 43_pedophilia_grooming_normalization_predators |
| 44 | platforms - direction - ads - third - party | 14 | 44_platforms_direction_ads_third |
| 45 | products - items - enhancement - grafitication - demonstrations | 13 | 45_products_items_enhancement_grafitication |
| 46 | possession - consuming - drinking - tobacco - smoking | 13 | 46_possession_consuming_drinking_tobacco |
| 47 | credible - threats - menacing - aggressive - plans | 12 | 47_credible_threats_menacing_aggressive |
| 48 | hacking - malware - phishing - ransomware - hacks | 12 | 48_hacking_malware_phishing_ransomware |
| 49 | proxy - lgbtq - bully - harassment - trolling | 12 | 49_proxy_lgbtq_bully_harassment |
| 50 | going - live - 13 - 18 - u18 | 12 | 50_going_live_13_18 |
| 51 | unintentionally - genitalia - animals - pornographic - bestiality | 12 | 51_unintentionally_genitalia_animals_pornographic |
| 52 | artificial - traffic - way - methods - generate | 12 | 52_artificial_traffic_way_methods |
| 53 | slaughter - mutilation - humans - dead - animal | 12 | 53_slaughter_mutilation_humans_dead |
| 54 | goods - gangs - organized - counterfeit - illicit | 11 | 54_goods_gangs_organized_counterfeit |
| 55 | gambling - betting - cheating - game - devices | 11 | 55_gambling_betting_cheating_game |
| 56 | trafficking - forced - coerced - traded - function | 11 | 56_trafficking_forced_coerced_traded |
| 57 | unsolicited - messages - favors - requests - advances | 11 | 57_unsolicited_messages_favors_requests |
| 58 | blood - gore - shock - bloodshed - value | 11 | 58_blood_gore_shock_bloodshed |
| 59 | victim - abduction - vehicle - motor - glorification | 11 | 59_victim_abduction_vehicle_motor |
| 60 | inappropriate - kiss - sexualizes - objectifies - towards | 10 | 60_inappropriate_kiss_sexualizes_objectifies |
| 61 | toddlers - infants - unintentional - touch - abdomen | 10 | 61_toddlers_infants_unintentional_touch |
| 62 | traditional - traditions - sacred - cultural - misappropriation | 10 | 62_traditional_traditions_sacred_cultural |
| 63 | nuclear - weapon - peaceful - advocating - energy | 9 | 63_nuclear_weapon_peaceful_advocating |
| 64 | exploiting - child - marriage - exploitation - labor | 9 | 64_exploiting_child_marriage_exploitation |
| 65 | impersonation - famous - figure - slandering - profiles | 9 | 65_impersonation_famous_figure_slandering |
| 66 | defamation - someones - defamatory - allegations - businesses | 9 | 66_defamation_someones_defamatory_allegations |
| 67 | recipes - creating - may - tools - instructions | 9 | 67_recipes_creating_may_tools |
| 68 | election - interference - campaigns - misinformation - political | 9 | 68_election_interference_campaigns_misinformation |
| 69 | claims - expertise - apocalypse - authority - media | 9 | 69_claims_expertise_apocalypse_authority |
| 70 | featuring - nude - partial - implied - depictions | 8 | 70_featuring_nude_partial_implied |
| 71 | operations - police - military - enforcement - law | 8 | 71_operations_police_military_enforcement |
| 72 | tax - laundering - crimes - money - ponzi | 8 | 72_tax_laundering_crimes_money |
| 73 | cosmetic - surgery - procedures - diy - unlicensed | 8 | 73_cosmetic_surgery_procedures_diy |
| 74 | subject - optical - innuendos - illusion - suggestive | 8 | 74_subject_optical_innuendos_illusion |
| 75 | bodies - fantasy - lifeless - accident - fictional | 8 | 75_bodies_fantasy_lifeless_accident |
| 76 | controversial - constructive - politics - issues - discussion | 7 | 76_controversial_constructive_politics_issues |
| 77 | kissing - lip - only - greeting - as | 7 | 77_kissing_lip_only_greeting |
| 78 | pirated - plagiarism - incites - glorifies - first | 7 | 78_pirated_plagiarism_incites_glorifies |
| 79 | mental - conditions - health - mocks - stigmatization | 7 | 79_mental_conditions_health_mocks |
| 80 | daredevil - reckless - precautions - risking - caution | 7 | 80_daredevil_reckless_precautions_risking |
| 81 | pranks - intentions - cybersecurity - harmful - targeted | 7 | 81_pranks_intentions_cybersecurity_harmful |
| 82 | dark - web - underground - marketplaces - glorifies | 6 | 82_dark_web_underground_marketplaces |
| 83 | vax - anti - medical - false - misinformation | 6 | 83_vax_anti_medical_false |
| 84 | sports - danger - adventures - stunts - professional | 6 | 84_sports_danger_adventures_stunts |
| 85 | environmental - pollution - experiments - ecosystems - natural | 6 | 85_environmental_pollution_experiments_ecosystems |
| 86 | incest - incestuous - taboo - themes - discussion | 5 | 86_incest_incestuous_taboo_themes |
| 87 | neglect - child - endangerment - abuse - physical | 5 | 87_neglect_child_endangerment_abuse |
| 88 | radicalization - extremist - extremism - views - propaganda | 5 | 88_radicalization_extremist_extremism_views |
| 89 | waste - bodily - excretion - unsanitary - images | 5 | 89_waste_bodily_excretion_unsanitary |
| 90 | emotional - psychological - mind - gaslighting - relationships | 5 | 90_emotional_psychological_mind_gaslighting |
| 91 | solicitation - offer - request - prostitution - act | 5 | 91_solicitation_offer_request_prostitution |
| 92 | elderly - elders - elder - neglect - against | 5 | 92_elderly_elders_elder_neglect |
| 93 | education - terms - term - relating - general | 4 | 93_education_terms_term_relating |
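The Label column appears to be derived mechanically from the keywords: the topic ID followed by the first four keywords, joined by underscores (topic 50, `going - live - 13 - 18 - u18`, becomes `50_going_live_13_18`). This presumably matches the names in the `Name` column of `get_topic_info()`, but that correspondence is an assumption. The sketch below reproduces the format and parses it back, which makes it easy to find every topic that mentions a given keyword; `make_label`, `parse_label` and `topics_with_keyword` are illustrative helpers, not BERTopic functions.

```python
def make_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Join the topic ID and its first `n_words` keywords with underscores."""
    return "_".join([str(topic_id), *keywords[:n_words]])


def parse_label(label: str) -> tuple[int, list[str]]:
    """Split a label back into (topic_id, keywords).

    Assumes no keyword contains an underscore, which holds for this table.
    """
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def topics_with_keyword(labels: list[str], word: str) -> list[int]:
    """Return the IDs of all topics whose label contains `word` as a keyword."""
    return [topic_id for topic_id, keywords in map(parse_label, labels) if word in keywords]


labels = [
    "3_regulated_consumption_tobacco_relate",
    "46_possession_consuming_drinking_tobacco",
    "2_drug_reference_purposes_substances",
]
print(make_label(50, ["going", "live", "13", "18", "u18"]))  # 50_going_live_13_18
print(topics_with_keyword(labels, "tobacco"))  # [3, 46]
```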

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
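To fit a new model on your own documents with the same settings, the listed values can be passed straight to the `BERTopic` constructor. The sketch below is an assumption-laden starting point, not the author's training script: the card does not state the embedding, UMAP, HDBSCAN or vectorizer configuration, so those are left at BERTopic's defaults, and since no random seed is listed, the resulting topics are unlikely to match this model exactly. `HYPERPARAMS` and `build_topic_model` are hypothetical names.

```python
# Hyperparameters exactly as listed in this model card
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an unfitted BERTopic model with the listed hyperparameters.

    Every component the card does not describe (embedding model, UMAP,
    HDBSCAN, vectorizer) is left at BERTopic's defaults; this is an assumption.
    """
    # Imported lazily so the settings dict can be inspected without BERTopic installed
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

Calling `topics, probs = build_topic_model().fit_transform(docs)`, where `docs` is your own list of strings, then fits the model and returns one topic ID per document.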

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.4
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.24.0
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12