
label_model_merged

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

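from bertopic import BERTopic

# Download the fitted model from the Hugging Face Hub
topic_model = BERTopic.load("davanstrien/label_model_merged")

# One row per topic with its ID, size (number of documents) and name
print(topic_model.get_topic_info())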
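The keywords for a single topic can be read with `get_topic()`. The helper below is a minimal sketch; the name `top_keywords` is hypothetical and not part of BERTopic. It relies only on `get_topic()` returning a list of `(word, weight)` tuples for a known topic ID and `False` for an unknown one.

```python
def top_keywords(topic_model, topic_id, n=5):
    """Return up to n keywords for one topic, highest-weighted first.

    Assumes BERTopic's get_topic() contract: a list of (word, weight)
    tuples for a known topic ID, and False for an unknown one.
    """
    words = topic_model.get_topic(topic_id)
    if not words:  # unknown topic ID (False) or empty representation
        return []
    return [word for word, _weight in words[:n]]
```

With the model loaded as above, `top_keywords(topic_model, 11)` would be expected to return the five keywords listed for topic 11 in the topic overview (terrier, snake, dog, bear, wolf).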

Topic overview

  • Number of topics: 247 (including the outlier topic -1)
  • Number of training documents: 14986
Overview of all topics. Topic -1 is BERTopic's outlier topic: it collects documents that were not assigned to any other topic.
Columns: Topic ID, Topic Keywords (top five, separated by " - "), Topic Frequency (number of training documents assigned to the topic), Label
-1 pre - roll - heavy - farm - health 5 -1_pre_roll_heavy_farm
0 label_1 label_2 - label_0 label_1 label_2 - label_1 - label_0 label_1 - label_2 1386 0_label_1 label_2_label_0 label_1 label_2_label_1_label_0 label_1
1 label_1 label_2 label_3 - label_3 label_4 label_5 - label_4 label_5 - label_2 label_3 label_4 - label_5 1042 1_label_1 label_2 label_3_label_3 label_4 label_5_label_4 label_5_label_2 label_3 label_4
2 negative positive - positive negative - negative - positive - target 803 2_negative positive_positive negative_negative_positive
3 loc misc org - misc org - loc misc - misc - org loc 652 3_loc misc org_misc org_loc misc_misc
4 neutral positive - neutral - positive negative - negative - positive 509 4_neutral positive_neutral_positive negative_negative
5 label_0 - country - city - label_1 - label_0 label_1 357 5_label_0_country_city_label_1
6 contradiction - entailment - neutral - - 351 6_contradiction_entailment_neutral_
7 label_0 - positive - - - 335 7_label_0_positive__
8 99 - - - - 327 8_99___
9 label_1 label_2 label_3 - label_2 label_3 label_4 - label_3 label_4 - label_2 label_3 - label_4 302 9_label_1 label_2 label_3_label_2 label_3 label_4_label_3 label_4_label_2 label_3
10 entailment - true - child - related - non 257 10_entailment_true_child_related
11 terrier - snake - dog - bear - wolf 245 11_terrier_snake_dog_bear
12 loc misc org - loc misc - misc org - misc - org loc 240 12_loc misc org_loc misc_misc org_misc
13 label_5 label_6 label_7 - label_6 label_7 - label_4 label_5 label_6 - label_5 label_6 - label_7 231 13_label_5 label_6 label_7_label_6 label_7_label_4 label_5 label_6_label_5 label_6
14 calendar - greeting - weather - transfer - calculator 229 14_calendar_greeting_weather_transfer
15 label_1 label_2 label_3 - label_2 label_3 - label_3 - label_1 label_2 - label_0 label_1 label_2 226 15_label_1 label_2 label_3_label_2 label_3_label_3_label_1 label_2
16 delete - unrelated - bad - related - rel 207 16_delete_unrelated_bad_related
17 label_12 label_13 label_14 - label_11 label_12 label_13 - label_13 label_14 - label_12 label_13 - label_10 label_11 label_12 172 17_label_12 label_13 label_14_label_11 label_12 label_13_label_13 label_14_label_12 label_13
18 loc org - org loc - org - loc - loc loc 166 18_loc org_org loc_org_loc
19 left - right - stop - yes - zero 130 19_left_right_stop_yes
20 label_6 label_60 label_61 - label_60 label_61 - label_60 label_61 label_62 - label_62 label_63 - label_59 label_6 label_60 123 20_label_6 label_60 label_61_label_60 label_61_label_60 label_61 label_62_label_62 label_63
21 unrelated - - - - 117 21_unrelated___
22 forest - industrial - river - transport - disaster 110 22_forest_industrial_river_transport
23 label_4 label_5 label_6 - label_5 label_6 - label_6 - label_1 label_2 label_3 - label_3 label_4 label_5 107 23_label_4 label_5 label_6_label_5 label_6_label_6_label_1 label_2 label_3
24 question - quantity - - - 106 24_question_quantity__
25 healthy - leaf - rust - plant - mildew 103 25_healthy_leaf_rust_plant
26 disease - blood - bio - healthy - sexual 100 26_disease_blood_bio_healthy
27 work - group - corporation - person product - product 92 27_work_group_corporation_person product
28 surprise anger - sadness surprise - fear joy - anger fear - joy love 80 28_surprise anger_sadness surprise_fear joy_anger fear
29 duplicate - common - non - - 78 29_duplicate_common_non_
30 steak - hamburger - restaurant - pizza - joint 76 30_steak_hamburger_restaurant_pizza
31 room - service - transport - product - forest 74 31_room_service_transport_product
32 dis - - - - 74 32_dis___
33 - - - - 73 33____
34 loc org - org - date - loc - set 70 34_loc org_org_date_loc
35 label_17 label_18 label_19 - label_18 label_19 - label_18 label_19 label_2 - label_19 label_2 - label_16 label_17 label_18 70 35_label_17 label_18 label_19_label_18 label_19_label_18 label_19 label_2_label_19 label_2
36 03 - 02 - second - - 65 36_03_02_second_
37 anger fear - joy love - surprise - joy - love 65 37_anger fear_joy love_surprise_joy
38 real - true - image - news - 64 38_real_true_image_news
39 - - - - 63 39____
40 pos - neg - neu - - 62 40_pos_neg_neu_
41 45 - 30 - 55 - 35 - 10 61 41_45_30_55_35
42 ge - wifi - na - alpha - fan 61 42_ge_wifi_na_alpha
43 label_1 label_10 label_11 - label_10 label_11 - label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 61 43_label_1 label_10 label_11_label_10 label_11_label_8 label_9 label_0_label_9 label_0 label_1
44 event - group - corporation - person product - product 61 44_event_group_corporation_person product
45 label_19 label_2 label_20 - label_2 label_20 - label_20 - label_18 label_19 label_2 - label_18 label_19 60 45_label_19 label_2 label_20_label_2 label_20_label_20_label_18 label_19 label_2
46 fear happy - sad - happy - disgust fear - angry 58 46_fear happy_sad_happy_disgust fear
47 battery - volume - chinese - juice - socks 58 47_battery_volume_chinese_juice
48 prep - nn - bio - cc - pro 56 48_prep_nn_bio_cc
49 good - poor - ok - great - bad 56 49_good_poor_ok_great
50 date - city - fur - day - ar 54 50_date_city_fur_day
51 15 - 18 19 20 - 19 20 - 17 18 19 - 18 19 54 51_15_18 19 20_19 20_17 18 19
52 menu - price - num - - 52 52_menu_price_num_
53 common - fat - loose - small - sugar 52 53_common_fat_loose_small
54 append_ - replace_ - append_ append_ - replace_ replace_ - append_ append_ append_ 49 54_append__replace__append_ append__replace_ replace_
55 append_ - replace_ - append_ append_ - replace_ replace_ - append_ append_ append_ 48 55_append__replace__append_ append__replace_ replace_
56 animals - flying - tech - dance - tiger 48 56_animals_flying_tech_dance
57 self - question - neutral - yes - greeting 47 57_self_question_neutral_yes
58 mt - cv - tr - tm - drug 47 58_mt_cv_tr_tm
59 organization person - location organization - organization - location - person 46 59_organization person_location organization_organization_location
60 - - - - 45 60____
61 joy - anger - sadness - sad - happy 44 61_joy_anger_sadness_sad
62 daisy - tulip - rose - - 43 62_daisy_tulip_rose_
63 positive - negative - neutral - neutral positive - positive negative 42 63_positive_negative_neutral_neutral positive
64 windows - pm - 21 - office - 20 42 64_windows_pm_21_office
65 label_14 label_15 - label_13 label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 42 65_label_14 label_15_label_13 label_14 label_15_label_15_label_12 label_13 label_14
66 position - statement - lead - request - study 42 66_position_statement_lead_request
67 business - news - entertainment - tech - sport 41 67_business_news_entertainment_tech
68 hate - speech - language - reporting - non 41 68_hate_speech_language_reporting
69 bd - nan - id - bg - 41 69_bd_nan_id_bg
70 cream - burger - carrot - ice cream - salad 41 70_cream_burger_carrot_ice cream
71 human - machine - ai - artificial - art 40 71_human_machine_ai_artificial
72 open - high - tie - abstract - button 40 72_open_high_tie_abstract
73 label_23 label_24 label_25 - label_24 label_25 - label_22 label_23 label_24 - label_23 label_24 - label_21 label_22 label_23 40 73_label_23 label_24 label_25_label_24 label_25_label_22 label_23 label_24_label_23 label_24
74 label_8 label_9 label_0 - label_9 label_0 label_1 - label_9 label_0 - label_7 label_8 label_9 - label_8 label_9 39 74_label_8 label_9 label_0_label_9 label_0 label_1_label_9 label_0_label_7 label_8 label_9
75 cat - dog - cats - dogs - drinking 39 75_cat_dog_cats_dogs
76 org org - loc loc - org - misc - loc 38 76_org org_loc loc_org_misc
77 airplane - deer - bird - ship - frog 38 77_airplane_deer_bird_ship
78 label_32 label_33 label_34 - label_33 label_34 - label_31 label_32 label_33 - label_32 label_33 - label_30 label_31 label_32 38 78_label_32 label_33 label_34_label_33 label_34_label_31 label_32 label_33_label_32 label_33
79 true - - - - 38 79_true___
80 family - sports - music - related - health 38 80_family_sports_music_related
81 star - positive - negative - amazon - negative positive 37 81_star_positive_negative_amazon
82 hospital - unknown - description - material - pad 37 82_hospital_unknown_description_material
83 threat - hate - reward - quality - content 36 83_threat_hate_reward_quality
84 music - speech - instrument - engine - wind 35 84_music_speech_instrument_engine
85 closure - annual - statement - issues - reward 35 85_closure_annual_statement_issues
86 adp - aux - sconj - pron - noun 35 86_adp_aux_sconj_pron
87 experience - location - skill - address - result 35 87_experience_location_skill_address
88 - - - - 34 88____
89 test - train - risk - non - high 34 89_test_train_risk_non
90 samoyed - corgi - husky - golden retriever - golden 34 90_samoyed_corgi_husky_golden retriever
91 unk - zero - 10 - 12 13 14 - 13 14 15 33 91_unk_zero_10_12 13 14
92 non - neutral - ok - lead - 33 92_non_neutral_ok_lead
93 normal - covid - virus - regular - disorder 33 93_normal_covid_virus_regular
94 test - help - app - risk - joke 32 94_test_help_app_risk
95 replace_ - append_ - replace_ replace_ - append_ append_ - replace_ replace_ replace_ 32 95_replace__append__replace_ replace__append_ append_
96 disease - issues - pressure - drug - blood 31 96_disease_issues_pressure_drug
97 women - casual - sexual - individual - use 31 97_women_casual_sexual_individual
98 address - balance - code - second - currency 30 98_address_balance_code_second
99 hate - non - neutral - - 30 99_hate_non_neutral_
100 normal - cell - large - clean - healthy 29 100_normal_cell_large_clean
101 neutral - se - - - 29 101_neutral_se__
102 male - female - hair - skin - men 29 102_male_female_hair_skin
103 title - page - section - abstract - table 28 103_title_page_section_abstract
104 number - gender - case - person - fin 28 104_number_gender_case_person
105 man - bird - flower - long - double 28 105_man_bird_flower_long
106 contradiction - entailment - neutral - - 28 106_contradiction_entailment_neutral_
107 non - - - - 27 107_non___
108 tim - fac - org - pro - loc 27 108_tim_fac_org_pro
109 lincoln - jaguar - visual - audio - sony 27 109_lincoln_jaguar_visual_audio
110 statement - info - check - ad - news 27 110_statement_info_check_ad
111 ben - ext - root - exp - loc 26 111_ben_ext_root_exp
112 yes - - - - 26 112_yes___
113 queen - jack - king - south - war 26 113_queen_jack_king_south
114 - - - - 26 114____
115 ft - cardinal - act - loc - loc loc 25 115_ft_cardinal_act_loc
116 bio - chemical - food - - 25 116_bio_chemical_food_
117 ft - cardinal - act - loc - loc misc org 25 117_ft_cardinal_act_loc
118 metric - task - - - 25 118_metric_task__
119 email - age - patient - zip - organization 25 119_email_age_patient_zip
120 ent - im - ru - mat - art 25 120_ent_im_ru_mat
121 ex - pt - galaxy - moon - 8888 24 121_ex_pt_galaxy_moon
122 - - - - 24 122____
123 neu - sad - dis - joy - 24 123_neu_sad_dis_joy
124 label_122 - label_121 - label_120 - label_123 - label_119 24 124_label_122_label_121_label_120_label_123
125 mixed - positive - negative - neutral positive - neutral 24 125_mixed_positive_negative_neutral positive
126 date event - percent person - quantity - money - percent 24 126_date event_percent person_quantity_money
127 fear joy - sadness surprise - surprise - joy - sadness 24 127_fear joy_sadness surprise_surprise_joy
128 disgust - sadness surprise - joy love - surprise - joy 24 128_disgust_sadness surprise_joy love_surprise
129 magnet - motor - hello - undefined - start 24 129_magnet_motor_hello_undefined
130 loc loc - loc - pers - hi - en 24 130_loc loc_loc_pers_hi
131 event - pers - fac - pro - loc org 24 131_event_pers_fac_pro
132 disorder - body - patient - age - disease 23 132_disorder_body_patient_age
133 happiness - fear - anger disgust - disgust - sadness 23 133_happiness_fear_anger disgust_disgust
134 control - la - social - sin - civil 23 134_control_la_social_sin
135 label_98 label_99 - label_97 label_98 label_99 - label_97 label_98 - label_95 label_96 - label_96 label_97 label_98 23 135_label_98 label_99_label_97 label_98 label_99_label_97 label_98_label_95 label_96
136 greek - chinese - italian - japanese - dutch 23 136_greek_chinese_italian_japanese
137 clean - - - - 23 137_clean___
138 protein - chemical - cell - - 22 138_protein_chemical_cell_
139 treatment - disease - location organization - organization person - organization 22 139_treatment_disease_location organization_organization person
140 institution - tools - org - loc - organization 22 140_institution_tools_org_loc
141 statement - question - - - 22 141_statement_question__
142 period - question - noun - number - 21 142_period_question_noun_number
143 regular - - - - 21 143_regular___
144 rna - - - - 21 144_rna___
145 rs - - - - 21 145_rs___
146 address - id - job - email - country 21 146_address_id_job_email
147 neg - neu - good - - 21 147_neg_neu_good_
148 label_122 label_123 - label_123 - label_122 - label_121 - label_120 20 148_label_122 label_123_label_123_label_122_label_121
149 drink - tea - wine - coffee - soft 20 149_drink_tea_wine_coffee
150 miscellaneous - organization - percent - money - percent person 20 150_miscellaneous_organization_percent_money
151 description - invoice - zip - state - city 20 151_description_invoice_zip_state
152 sports - tech - business - sport - 20 152_sports_tech_business_sport
153 ok - vin - rl - ft - year 20 153_ok_vin_rl_ft
154 healthy - - - - 20 154_healthy___
155 association - event - ticket - disaster - map 20 155_association_event_ticket_disaster
156 10 11 - 10 11 12 - 11 12 - 11 - 11 12 13 19 156_10 11_10 11 12_11 12_11
157 noun num pron - num pron propn - num pron - pron propn punct - pron propn 19 157_noun num pron_num pron propn_num pron_pron propn punct
158 cell - organ - organism - multi - tissue 18 158_cell_organ_organism_multi
159 02 - ent - express - act - delete 18 159_02_ent_express_act
160 sym verb adj - verb adj adp - intj noun num - det intj noun - adj adp adv 18 160_sym verb adj_verb adj adp_intj noun num_det intj noun
161 12 - - - - 18 161_12___
162 org org - org - drug - - 18 162_org org_org_drug_
163 short - long - sl - ac - pad 18 163_short_long_sl_ac
164 plastic - paper - glass - metal - sheet 18 164_plastic_paper_glass_metal
165 ii - blank - iii - vi - et 17 165_ii_blank_iii_vi
166 normal - virus - desert - smoke - pressure 17 166_normal_virus_desert_smoke
167 skill - skills - - - 17 167_skill_skills__
168 protein - rna - cell - line - type 17 168_protein_rna_cell_line
169 korean - russian - dutch - french - thai 17 169_korean_russian_dutch_french
170 rainbow - rain - snow - color - green 17 170_rainbow_rain_snow_color
171 company - role - institution - skill - loc org 16 171_company_role_institution_skill
172 exp - pp - intj - punc - prep 16 172_exp_pp_intj_punc
173 key - menu - - - 16 173_key_menu__
174 adult - young - child - male - female 16 174_adult_young_child_male
175 normal - - - - 16 175_normal___
176 mask - bright - sharp - head - normal 16 176_mask_bright_sharp_head
177 anger disgust fear - anger disgust - disgust fear - disgust - surprise anger 16 177_anger disgust fear_anger disgust_disgust fear_disgust
178 - - - - 16 178____
179 objective - non - neutral - - 16 179_objective_non_neutral_
180 cr - sd - db - - 16 180_cr_sd_db_
181 - - - - 16 181____
182 label_29 label_3 label_30 - label_27 label_28 label_29 - label_26 label_27 label_28 - label_28 label_29 label_3 - label_29 label_3 16 182_label_29 label_3 label_30_label_27 label_28 label_29_label_26 label_27 label_28_label_28 label_29 label_3
183 test - - - - 15 183_test___
184 good - bad - non - - 15 184_good_bad_non_
185 local - por - art - da - em 15 185_local_por_art_da
186 label_122 label_123 - label_97 label_98 label_99 - label_97 label_98 - label_96 label_97 label_98 - label_98 label_99 15 186_label_122 label_123_label_97 label_98 label_99_label_97 label_98_label_96 label_97 label_98
187 prod - loc - evt - misc - org org 15 187_prod_loc_evt_misc
188 invoice - email - form - letter - report 15 188_invoice_email_form_letter
189 end - head - cross - - 15 189_end_head_cross_
190 target - instrument - opinion - question - price 15 190_target_instrument_opinion_question
191 unrelated - support - - - 15 191_unrelated_support__
192 ru - pl - bg - en - es 14 192_ru_pl_bg_en
193 road - good - bike - - 14 193_road_good_bike_
194 human - organism - plants - - 14 194_human_organism_plants_
195 label_7 label_8 label_9 - label_8 label_9 - label_0 label_1 label_10 - label_1 label_10 - label_10 14 195_label_7 label_8 label_9_label_8 label_9_label_0 label_1 label_10_label_1 label_10
196 replace_ - append_ - replace_ replace_ - append_ append_ - replace_ replace_ replace_ 13 196_replace__append__replace_ replace__append_ append_
197 brand - company - tm - color - item 13 197_brand_company_tm_color
198 pro - neutral - russian - support - attack 13 198_pro_neutral_russian_support
199 18 19 20 - 19 20 - 23 - 17 18 19 - 21 13 199_18 19 20_19 20_23_17 18 19
200 crime - pers - time - book - day 13 200_crime_pers_time_book
201 neutral - positive - negative - positive negative - neutral positive 13 201_neutral_positive_negative_positive negative
202 - - - - 13 202____
203 chemical - disease - bio - - 13 203_chemical_disease_bio_
204 angry - happy - sad - neutral - 60 12 204_angry_happy_sad_neutral
205 organisation - task - country - location - product 12 205_organisation_task_country_location
206 iv - iii - vi - ii - unknown 12 206_iv_iii_vi_ii
207 neutral - risk - - - 12 207_neutral_risk__
208 container - id - type - person - number 12 208_container_id_type_person
209 target - - - - 12 209_target___
210 pop - metal - country - song - rock 12 210_pop_metal_country_song
211 email - os - language - method - function 12 211_email_os_language_method
212 contradiction - non - entailment - - 12 212_contradiction_non_entailment_
213 background - objective - method - result - 12 213_background_objective_method_result
214 convertible - cab - type - series - martin 12 214_convertible_cab_type_series
215 public - smoking - drinking - ambiguous - non 12 215_public_smoking_drinking_ambiguous
216 rust - - - - 12 216_rust___
217 persian - mr - man - flying - ghost 12 217_persian_mr_man_flying
218 quote - yes - middle - request - 12 218_quote_yes_middle_request
219 text - mixed - - - 12 219_text_mixed__
220 punc - prep - digit - latin - conj 12 220_punc_prep_digit_latin
221 panda - air - mr - ticket - little 12 221_panda_air_mr_ticket
222 - - - - 12 222____
223 sym verb adj - intj noun num - verb adj adp - cconj det intj - aux cconj det 12 223_sym verb adj_intj noun num_verb adj adp_cconj det intj
224 healthy - tomato - plant - pepper - spot 11 224_healthy_tomato_plant_pepper
225 sony - lg - tv - galaxy - monitor 11 225_sony_lg_tv_galaxy
226 new - city - mid - location - south 11 226_new_city_mid_location
227 space - - - - 11 227_space___
228 cloud - racing - motorcycle - boy - bus 11 228_cloud_racing_motorcycle_boy
229 punc - zero - pers - neg - reflex 11 229_punc_zero_pers_neg
230 energy - arts - high - systems - computer 11 230_energy_arts_high_systems
231 dis - ad - media - site - plant 11 231_dis_ad_media_site
232 world - tech - business - sports - female 11 232_world_tech_business_sports
233 sadness - anger - anger fear - joy - fear 10 233_sadness_anger_anger fear_joy
234 neg - adj - sym - propn - num 10 234_neg_adj_sym_propn
235 bulldog - cat - husky - pug - corgi 9 235_bulldog_cat_husky_pug
236 - - - - 8 236____
237 origin - quote - actor - opinion - language 7 237_origin_quote_actor_opinion
238 na - nb - nc - neu - ng 7 238_na_nb_nc_neu
239 - - - - 7 239____
240 ci - aa - joy - im - ip 7 240_ci_aa_joy_im
241 - - - - 6 241____
242 skill - email - address - grade - language 6 242_skill_email_address_grade
243 sexual - threat - christian - hate - male 6 243_sexual_threat_christian_hate
244 transmission - wind - tower - pole - 6 244_transmission_wind_tower_pole
245 label_14 label_15 - label_13 label_14 label_15 - label_15 - label_12 label_13 label_14 - label_11 label_12 label_13 6 245_label_14 label_15_label_13 label_14 label_15_label_15_label_12 label_13 label_14

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
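To train a new model with the same settings, these values can be passed to the `BERTopic` constructor. This is a sketch under assumptions: the card does not record the embedding model or the UMAP, HDBSCAN and vectorizer settings, so `embedding_model` has to be chosen by you and everything else falls back to BERTopic's defaults, which may differ from what was used here. Since BERTopic uses `language` only to pick a default embedding model, `language: None` likely indicates that an embedding model was supplied explicitly. `CARD_PARAMS` and `build_topic_model` are hypothetical names.

```python
# Hyperparameters as recorded in this model card.
CARD_PARAMS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model(embedding_model):
    """Create an unfitted BERTopic model with the card's hyperparameters.

    `embedding_model` is an assumption: the card does not say which
    embedding model was used. Components not listed in the card
    (UMAP, HDBSCAN, vectorizer) use BERTopic's defaults.
    """
    # Imported lazily so CARD_PARAMS can be inspected without BERTopic installed
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **CARD_PARAMS)
```

For example, `build_topic_model("all-MiniLM-L6-v2")` (a hypothetical choice of Sentence-Transformers model) followed by `fit_transform(docs)` on a list of strings returns a tuple of per-document topic assignments and probabilities.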

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.11
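To check whether a local environment matches these versions, a small stdlib-only script can compare them with what is installed. The keys are the standard PyPI distribution names (`umap-learn` for UMAP, `scikit-learn`, `sentence-transformers`); `CARD_VERSIONS`, `version_mismatches` and `requirements_pins` are hypothetical helpers. The card does not list the BERTopic version itself, so it is not checked.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by PyPI distribution name.
CARD_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}
CARD_PYTHON = (3, 10, 11)


def version_mismatches():
    """Return {package: (card_version, installed_version or None)} for differing packages."""
    mismatches = {}
    for dist, expected in CARD_VERSIONS.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != expected:
            mismatches[dist] = (expected, installed)
    return mismatches


def requirements_pins():
    """Render the card's versions as requirements.txt lines."""
    return [f"{dist}=={v}" for dist, v in CARD_VERSIONS.items()]


if __name__ == "__main__":
    if sys.version_info[:3] != CARD_PYTHON:
        print(f"Python {sys.version.split()[0]} differs from the card's 3.10.11")
    for dist, (expected, installed) in version_mismatches().items():
        print(f"{dist}: card {expected}, installed {installed}")
```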