Add BERTopic model
- README.md +322 -0
- config.json +15 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,322 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# xsum_6789_50000_25000_v1_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model that uses `sentence-transformers/all-MiniLM-L6-v2` embeddings.
BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

## Usage

To use this model, install BERTopic:

```bash
pip install -U bertopic
```

Then load the model and inspect its topics:

```python
from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/xsum_6789_50000_25000_v1_train")

# DataFrame summarizing every topic (ID, document count, name, ...)
topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 255 (including the outlier topic -1)
* Number of training documents: 50000

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - mr - police - people - would | 5 | -1_said_mr_police_people |
| 0 | league - goal - win - game - foul | 24458 | 0_league_goal_win_game |
| 1 | labour - eu - party - vote - referendum | 7343 | 1_labour_eu_party_vote |
| 2 | olympic - athlete - race - sport - gold | 1358 | 2_olympic_athlete_race_sport |
| 3 | cricket - wicket - england - test - batsman | 1144 | 3_cricket_wicket_england_test |
| 4 | school - education - teacher - pupil - student | 781 | 4_school_education_teacher_pupil |
| 5 | rail - transport - rmt - train - bridge | 482 | 5_rail_transport_rmt_train |
| 6 | nhs - care - health - patient - hospital | 477 | 6_nhs_care_health_patient |
| 7 | boko - haram - president - african - africa | 471 | 7_boko_haram_president_african |
| 8 | syrian - syria - assad - rebel - iraq | 448 | 8_syrian_syria_assad_rebel |
| 9 | fire - blaze - smoke - firefighter - rescue | 345 | 9_fire_blaze_smoke_firefighter |
| 10 | murray - wimbledon - tennis - slam - seed | 290 | 10_murray_wimbledon_tennis_slam |
| 11 | film - actor - star - movie - award | 266 | 11_film_actor_star_movie |
| 12 | dup - sinn - fin - ireland - northern | 258 | 12_dup_sinn_fin_ireland |
| 13 | fight - boxing - champion - title - fury | 257 | 13_fight_boxing_champion_title |
| 14 | crash - road - collision - driver - car | 247 | 14_crash_road_collision_driver |
| 15 | mercedes - hamilton - f1 - race - rosberg | 238 | 15_mercedes_hamilton_f1_race |
| 16 | coastguard - lifeboat - rescue - boat - rnli | 235 | 16_coastguard_lifeboat_rescue_boat |
| 17 | china - chinese - hong - kong - chinas | 230 | 17_china_chinese_hong_kong |
| 18 | ukraine - russian - russia - ukrainian - putin | 221 | 18_ukraine_russian_russia_ukrainian |
| 19 | taliban - pakistan - afghan - pakistani - afghanistan | 218 | 19_taliban_pakistan_afghan_pakistani |
| 20 | mcilroy - golf - birdie - open - par | 218 | 20_mcilroy_golf_birdie_open |
| 21 | dog - animal - dogs - cat - rspca | 208 | 21_dog_animal_dogs_cat |
| 22 | data - security - nsa - computer - malware | 207 | 22_data_security_nsa_computer |
| 23 | cancer - patient - treatment - disease - cell | 198 | 23_cancer_patient_treatment_disease |
| 24 | maduro - venezuela - mexico - morales - president | 198 | 24_maduro_venezuela_mexico_morales |
| 25 | energy - climate - gas - wind - carbon | 197 | 25_energy_climate_gas_wind |
| 26 | sexual - indecent - court - assault - victim | 182 | 26_sexual_indecent_court_assault |
| 27 | sale - store - retail - tesco - retailer | 174 | 27_sale_store_retail_tesco |
| 28 | marriage - church - bishop - samesex - cardinal | 172 | 28_marriage_church_bishop_samesex |
| 29 | album - song - music - band - chart | 170 | 29_album_song_music_band |
| 30 | apple - samsung - phone - technology - mobile | 163 | 30_apple_samsung_phone_technology |
| 31 | trump - republican - clinton - republicans - mr | 156 | 31_trump_republican_clinton_republicans |
| 32 | yn - ar - wedi - ei - mae | 148 | 32_yn_ar_wedi_ei |
| 33 | ebola - virus - vaccine - outbreak - zika | 148 | 33_ebola_virus_vaccine_outbreak |
| 34 | planet - particle - earth - space - universe | 143 | 34_planet_particle_earth_space |
| 35 | flood - flooding - water - weather - rain | 139 | 35_flood_flooding_water_weather |
| 36 | migrant - refugee - asylum - hungary - migrants | 138 | 36_migrant_refugee_asylum_hungary |
| 37 | korea - north - korean - kim - missile | 130 | 37_korea_north_korean_kim |
| 38 | paris - french - attack - france - police | 122 | 38_paris_french_attack_france |
| 39 | memorial - war - battle - soldier - regiment | 120 | 39_memorial_war_battle_soldier |
| 40 | plane - flight - aircraft - pilot - airport | 112 | 40_plane_flight_aircraft_pilot |
| 41 | man - det - incident - wearing - police | 111 | 41_man_det_incident_wearing |
| 42 | art - painting - artist - exhibition - gallery | 109 | 42_art_painting_artist_exhibition |
| 43 | growth - rate - economy - inflation - bank | 106 | 43_growth_rate_economy_inflation |
| 44 | prison - prisoner - prisons - offender - prisoners | 104 | 44_prison_prisoner_prisons_offender |
| 45 | bank - banking - barclays - hsbc - rbs | 103 | 45_bank_banking_barclays_hsbc |
| 46 | earthquake - quake - nepal - magnitude - hurricane | 101 | 46_earthquake_quake_nepal_magnitude |
| 47 | shooting - police - officer - black - gun | 100 | 47_shooting_police_officer_black |
| 48 | greece - greek - eurozone - bailout - greeces | 99 | 48_greece_greek_eurozone_bailout |
| 49 | housing - price - property - house - home | 94 | 49_housing_price_property_house |
| 50 | morsi - egypt - brotherhood - egyptian - cairo | 93 | 50_morsi_egypt_brotherhood_egyptian |
| 51 | airport - heathrow - runway - flight - gatwick | 92 | 51_airport_heathrow_runway_flight |
| 52 | murder - arrested - suspicion - custody - postmortem | 92 | 52_murder_arrested_suspicion_custody |
| 53 | zoo - tiger - animal - elephant - rhino | 90 | 53_zoo_tiger_animal_elephant |
| 54 | festival - event - music - edinburgh - organiser | 90 | 54_festival_event_music_edinburgh |
| 55 | book - novel - author - prize - writer | 89 | 55_book_novel_author_prize |
| 56 | snooker - osullivan - frame - world - gerwen | 88 | 56_snooker_osullivan_frame_world |
| 57 | unsupported - updated - playback - device - media | 87 | 57_unsupported_updated_playback_device |
| 58 | india - indias - delhi - indian - woman | 86 | 58_india_indias_delhi_indian |
| 59 | trust - death - care - hospital - baby | 86 | 59_trust_death_care_hospital |
| 60 | bbc - licence - s4c - fee - wales | 84 | 60_bbc_licence_s4c_fee |
| 61 | prince - queen - royal - duchess - duke | 78 | 61_prince_queen_royal_duchess |
| 62 | index - benchmark - nikkei - chinas - growth | 78 | 62_index_benchmark_nikkei_chinas |
| 63 | abuse - police - child - sexual - exploitation | 77 | 63_abuse_police_child_sexual |
| 64 | belfast - ira - finucane - murder - family | 76 | 64_belfast_ira_finucane_murder |
| 65 | council - site - development - building - regeneration | 74 | 65_council_site_development_building |
| 66 | obesity - sugar - food - obese - drink | 73 | 66_obesity_sugar_food_obese |
| 67 | murder - court - heard - knife - trial | 72 | 67_murder_court_heard_knife |
| 68 | steel - tata - talbot - plant - port | 71 | 68_steel_tata_talbot_plant |
| 69 | bird - wildlife - birds - rspb - conservation | 67 | 69_bird_wildlife_birds_rspb |
| 70 | drug - cannabis - heroin - drugs - marijuana | 64 | 70_drug_cannabis_heroin_drugs |
| 71 | pen - macron - fillon - le - french | 61 | 71_pen_macron_fillon_le |
| 72 | sp - nasdaq - dow - rose - index | 61 | 72_sp_nasdaq_dow_rose |
| 73 | israel - israeli - palestinian - palestinians - hamas | 59 | 73_israel_israeli_palestinian_palestinians |
| 74 | ftse - shares - share - pound - index | 59 | 74_ftse_shares_share_pound |
| 75 | updated - gmt - 2017 - bst - last | 57 | 75_updated_gmt_2017_bst |
| 76 | broadband - bt - ofcom - openreach - customer | 57 | 76_broadband_bt_ofcom_openreach |
| 77 | vw - emission - car - volkswagen - diesel | 57 | 77_vw_emission_car_volkswagen |
| 78 | iran - nuclear - irans - iranian - rouhani | 56 | 78_iran_nuclear_irans_iranian |
| 79 | cushnahan - nama - ireland - northern - ni | 55 | 79_cushnahan_nama_ireland_northern |
| 80 | alcohol - drinking - drink - wine - minimum | 53 | 80_alcohol_drinking_drink_wine |
| 81 | fraud - money - court - judge - account | 53 | 81_fraud_money_court_judge |
| 82 | pope - vatican - francis - church - catholic | 53 | 82_pope_vatican_francis_church |
| 83 | pollution - air - emission - nitrogen - no2 | 52 | 83_pollution_air_emission_nitrogen |
| 84 | pokemon - game - console - nintendo - vr | 52 | 84_pokemon_game_console_nintendo |
| 85 | driver - road - camera - driving - speed | 52 | 85_driver_road_camera_driving |
| 86 | waste - recycling - bag - plastic - food | 52 | 86_waste_recycling_bag_plastic |
| 87 | farc - peace - eln - rebel - colombian | 50 | 87_farc_peace_eln_rebel |
| 88 | berlusconi - pp - rajoy - spains - catalan | 50 | 88_berlusconi_pp_rajoy_spains |
| 89 | thailand - thai - king - yingluck - thailands | 49 | 89_thailand_thai_king_yingluck |
| 90 | quantum - computer - machine - computing - ai | 46 | 90_quantum_computer_machine_computing |
| 91 | kosovo - bosnian - serbia - serb - srebrenica | 45 | 91_kosovo_bosnian_serbia_serb |
| 92 | drug - cannabis - cocaine - drugs - court | 45 | 92_drug_cannabis_cocaine_drugs |
| 93 | rousseff - petrobras - temer - brazils - corruption | 45 | 93_rousseff_petrobras_temer_brazils |
| 94 | yemen - houthis - hadi - houthi - saudi | 44 | 94_yemen_houthis_hadi_houthi |
| 95 | tax - budget - chancellor - cut - spending | 44 | 95_tax_budget_chancellor_cut |
| 96 | train - tram - driver - raib - rail | 44 | 96_train_tram_driver_raib |
| 97 | fbi - comey - trump - clinton - email | 44 | 97_fbi_comey_trump_clinton |
| 98 | drone - aircraft - drones - aviation - unmanned | 43 | 98_drone_aircraft_drones_aviation |
| 99 | smoking - tobacco - cigarette - ecigarettes - smoker | 42 | 99_smoking_tobacco_cigarette_ecigarettes |
| 100 | hillsborough - disaster - liverpool - 1989 - crush | 41 | 100_hillsborough_disaster_liverpool_1989 |
| 101 | council - local - cut - budget - tax | 40 | 101_council_local_cut_budget |
| 102 | google - facebook - user - video - search | 40 | 102_google_facebook_user_video |
| 103 | syria - islamic - family - son - iraq | 39 | 103_syria_islamic_family_son |
| 104 | missing - search - body - police - seen | 38 | 104_missing_search_body_police |
| 105 | airline - airbus - airlines - aer - boeing | 38 | 105_airline_airbus_airlines_aer |
| 106 | car - psa - vehicle - gm - battery | 36 | 106_car_psa_vehicle_gm |
| 107 | fish - salmon - fishing - water - fishery | 36 | 107_fish_salmon_fishing_water |
| 108 | oil - gas - decommissioning - field - sea | 36 | 108_oil_gas_decommissioning_field |
| 109 | policing - police - constable - officer - spa | 35 | 109_policing_police_constable_officer |
| 110 | fire - cladding - grenfell - tower - block | 34 | 110_fire_cladding_grenfell_tower |
| 111 | nuclear - reactor - fukushima - plant - radiation | 33 | 111_nuclear_reactor_fukushima_plant |
| 112 | tree - woodland - trees - oak - forest | 32 | 112_tree_woodland_trees_oak |
| 113 | milk - dairy - farmer - farmers - farming | 32 | 113_milk_dairy_farmer_farmers |
| 114 | abortion - woman - termination - ireland - northern | 32 | 114_abortion_woman_termination_ireland |
| 115 | whale - dolphin - whales - sperm - orca | 32 | 115_whale_dolphin_whales_sperm |
| 116 | nauru - australia - asylum - australian - seeker | 31 | 116_nauru_australia_asylum_australian |
| 117 | driving - clarke - car - causing - crash | 31 | 117_driving_clarke_car_causing |
| 118 | stolen - police - bike - haldane - robbery | 31 | 118_stolen_police_bike_haldane |
| 119 | meat - horsemeat - milk - food - product | 31 | 119_meat_horsemeat_milk_food |
| 120 | wage - living - pay - minimum - worker | 30 | 120_wage_living_pay_minimum |
| 121 | belfast - flag - parade - parades - loyalist | 30 | 121_belfast_flag_parade_parades |
| 122 | terrorism - arrested - arrest - suspicion - police | 29 | 122_terrorism_arrested_arrest_suspicion |
| 123 | manchester - protest - ford - police - london | 29 | 123_manchester_protest_ford_police |
| 124 | uber - driver - taxi - ubers - kalanick | 28 | 124_uber_driver_taxi_ubers |
| 125 | calais - camp - migrant - jungle - asylum | 28 | 125_calais_camp_migrant_jungle |
| 126 | music - streaming - spotify - album - artist | 28 | 126_music_streaming_spotify_album |
| 127 | childrens - ofsted - child - council - improvement | 28 | 127_childrens_ofsted_child_council |
| 128 | erdogan - turkish - turkey - coup - istanbul | 28 | 128_erdogan_turkish_turkey_coup |
| 129 | cuba - cuban - castro - cubans - havana | 28 | 129_cuba_cuban_castro_cubans |
| 130 | libya - gaddafi - libyan - tripoli - gaddafis | 27 | 130_libya_gaddafi_libyan_tripoli |
| 131 | oil - barrel - opec - price - saudi | 26 | 131_oil_barrel_opec_price |
| 132 | trident - nuclear - submarine - renewal - defence | 26 | 132_trident_nuclear_submarine_renewal |
| 133 | pistorius - steenkamp - reeva - toilet - intruder | 26 | 133_pistorius_steenkamp_reeva_toilet |
| 134 | transgender - gay - marriage - law - samesex | 26 | 134_transgender_gay_marriage_law |
| 135 | space - astronaut - peake - tim - iss | 25 | 135_space_astronaut_peake_tim |
| 136 | pte - inquest - lcpl - cpl - soldier | 25 | 136_pte_inquest_lcpl_cpl |
| 137 | cox - jo - mp - batley - mrs | 24 | 137_cox_jo_mp_batley |
| 138 | jackpot - lottery - ticket - camelot - prize | 24 | 138_jackpot_lottery_ticket_camelot |
| 139 | pottery - roman - excavation - stone - site | 24 | 139_pottery_roman_excavation_stone |
| 140 | wikipedia - woman - women - makeup - female | 24 | 140_wikipedia_woman_women_makeup |
| 141 | energy - price - supplier - customer - gas | 23 | 141_energy_price_supplier_customer |
| 142 | dinosaur - specimen - fossil - neanderthals - museum | 23 | 142_dinosaur_specimen_fossil_neanderthals |
| 143 | yamaha - rossi - marquez - lorenzo - ducati | 23 | 143_yamaha_rossi_marquez_lorenzo |
| 144 | execution - death - drug - lethal - executions | 22 | 144_execution_death_drug_lethal |
| 145 | tesla - car - selfdriving - vehicle - autonomous | 22 | 145_tesla_car_selfdriving_vehicle |
| 146 | famine - drought - somalia - food - aid | 22 | 146_famine_drought_somalia_food |
| 147 | inquiry - abuse - survivor - goddard - inquirys | 21 | 147_inquiry_abuse_survivor_goddard |
| 148 | mh370 - plane - search - flight - ocean | 21 | 148_mh370_plane_search_flight |
| 149 | coin - museum - hoard - treasure - ring | 21 | 149_coin_museum_hoard_treasure |
| 150 | assange - wikileaks - extradition - embassy - assanges | 21 | 150_assange_wikileaks_extradition_embassy |
| 151 | ride - alton - smiler - towers - merlin | 20 | 151_ride_alton_smiler_towers |
| 152 | fm - radio - tv - freedom - medium | 19 | 152_fm_radio_tv_freedom |
| 153 | pension - annuity - retirement - income - pensions | 19 | 153_pension_annuity_retirement_income |
| 154 | homelessness - homeless - housing - rough - council | 19 | 154_homelessness_homeless_housing_rough |
| 155 | facebook - news - fake - medium - social | 19 | 155_facebook_news_fake_medium |
| 156 | trade - tpp - nafta - us - mexico | 19 | 156_trade_tpp_nafta_us |
| 157 | whisky - distillery - beer - scotch - bottle | 19 | 157_whisky_distillery_beer_scotch |
| 158 | court - trigg - heard - ms - eli | 19 | 158_court_trigg_heard_ms |
| 159 | nba - curry - lebron - warriors - cleveland | 19 | 159_nba_curry_lebron_warriors |
| 160 | ferry - calmac - serco - ferries - contract | 18 | 160_ferry_calmac_serco_ferries |
| 161 | hms - ship - navy - shipbuilding - warship | 18 | 161_hms_ship_navy_shipbuilding |
| 162 | syria - strike - iraq - mps - military | 18 | 162_syria_strike_iraq_mps |
| 163 | childcare - child - parent - inheritance - meal | 18 | 163_childcare_child_parent_inheritance |
| 164 | junior - doctor - bma - contract - doctors | 18 | 164_junior_doctor_bma_contract |
| 165 | 1916 - rising - irish - easter - ireland | 18 | 165_1916_rising_irish_easter |
| 166 | condor - guernsey - ship - poole - port | 17 | 166_condor_guernsey_ship_poole |
| 167 | hussain - terrorism - terrorist - heard - court | 17 | 167_hussain_terrorism_terrorist_heard |
| 168 | unemployment - ons - rate - employment - growth | 17 | 168_unemployment_ons_rate_employment |
| 169 | suu - kyi - nld - aung - thein | 16 | 169_suu_kyi_nld_aung |
| 170 | eurotunnel - calais - french - eurostar - train | 16 | 170_eurotunnel_calais_french_eurostar |
| 171 | bike - cycling - cycle - cyclist - parking | 15 | 171_bike_cycling_cycle_cyclist |
| 172 | breath - driving - drinkdriving - limit - driver | 15 | 172_breath_driving_drinkdriving_limit |
| 173 | everest - avalanche - mountain - sherpa - icefall | 15 | 173_everest_avalanche_mountain_sherpa |
| 174 | reef - coral - vent - seabed - marine | 15 | 174_reef_coral_vent_seabed |
| 175 | army - defence - mod - reserve - recruitment | 15 | 175_army_defence_mod_reserve |
| 176 | explosion - tianjin - bomb - blast - bethnal | 15 | 176_explosion_tianjin_bomb_blast |
| 177 | mayor - devolution - combined - greater - region | 15 | 177_mayor_devolution_combined_greater |
| 178 | tax - company - uk - cayman - profit | 15 | 178_tax_company_uk_cayman |
| 179 | muslims - ban - muslim - us - order | 15 | 179_muslims_ban_muslim_us |
| 180 | growth - output - sector - scotlands - scottish | 15 | 180_growth_output_sector_scotlands |
| 181 | suicide - acne - judith - life - mental | 14 | 181_suicide_acne_judith_life |
| 182 | bp - spill - oil - rig - deepwater | 14 | 182_bp_spill_oil_rig |
| 183 | xinjiang - uighur - uighurs - urumqi - chinese | 14 | 183_xinjiang_uighur_uighurs_urumqi |
| 184 | refugee - syrians - syria - syrian - refugees | 14 | 184_refugee_syrians_syria_syrian |
| 185 | rea - sykes - davies - fish - race | 14 | 185_rea_sykes_davies_fish |
| 186 | mortgage - lending - debt - insolvency - lender | 13 | 186_mortgage_lending_debt_insolvency |
| 187 | barnes - pilot - helicopter - crash - fog | 13 | 187_barnes_pilot_helicopter_crash |
| 188 | rhodes - statue - igbo - college - oriel | 13 | 188_rhodes_statue_igbo_college |
| 189 | edf - hinkley - nuclear - plant - reactor | 13 | 189_edf_hinkley_nuclear_plant |
| 190 | sweeney - church - leonard - alder - megans | 13 | 190_sweeney_church_leonard_alder |
| 191 | duterte - philippines - mindanao - dutertes - martial | 13 | 191_duterte_philippines_mindanao_dutertes |
| 192 | ferry - ship - yoo - sank - sewol | 13 | 192_ferry_ship_yoo_sank |
| 193 | norovirus - diarrhoea - hospital - virus - patient | 13 | 193_norovirus_diarrhoea_hospital_virus |
| 194 | art - arts - culture - theatre - funding | 13 | 194_art_arts_culture_theatre |
| 195 | pipeline - dakota - oil - sioux - project | 13 | 195_pipeline_dakota_oil_sioux |
| 196 | climate - temperature - warming - global - ocean | 13 | 196_climate_temperature_warming_global |
| 197 | leg - solar - piccard - impulse - borschberg | 12 | 197_leg_solar_piccard_impulse |
| 198 | gun - zimmerman - roof - fbi - shooting | 12 | 198_gun_zimmerman_roof_fbi |
| 199 | copyright - infringement - megaupload - pirated - piracy | 12 | 199_copyright_infringement_megaupload_pirated |
| 200 | bee - hive - beekeeper - honey - tunibee | 12 | 200_bee_hive_beekeeper_honey |
| 201 | bombardier - cseries - belfast - bombardiers - learjet | 11 | 201_bombardier_cseries_belfast_bombardiers |
| 202 | trudeau - canada - canadian - harper - prentice | 11 | 202_trudeau_canada_canadian_harper |
| 203 | object - reopened - evacuated - bomb - street | 11 | 203_object_reopened_evacuated_bomb |
| 204 | autism - mental - child - health - autistic | 11 | 204_autism_mental_child_health |
| 205 | regiment - lcpl - helmand - afghanistan - soldier | 11 | 205_regiment_lcpl_helmand_afghanistan |
| 206 | tunisia - attack - sousse - hotel - essid | 11 | 206_tunisia_attack_sousse_hotel |
| 207 | press - leveson - foi - ipso - newspaper | 11 | 207_press_leveson_foi_ipso |
| 208 | raf - aircraft - base - mildenhall - squadron | 11 | 208_raf_aircraft_base_mildenhall |
| 209 | language - welsh - literature - huws - meri | 11 | 209_language_welsh_literature_huws |
| 210 | concert - manchester - grande - ariana - arena | 11 | 210_concert_manchester_grande_ariana |
| 211 | lubitz - cockpit - lufthansa - copilot - germanwings | 11 | 211_lubitz_cockpit_lufthansa_copilot |
| 212 | facebook - tweet - gamergate - content - user | 10 | 212_facebook_tweet_gamergate_content |
| 213 | mine - miner - underground - fyfield - mining | 10 | 213_mine_miner_underground_fyfield |
| 214 | ira - sinn - fin - cahill - ireland | 10 | 214_ira_sinn_fin_cahill |
| 215 | gear - clarkson - hammond - show - clarksons | 10 | 215_gear_clarkson_hammond_show |
| 216 | tree - trees - felling - sheffield - diseased | 10 | 216_tree_trees_felling_sheffield |
| 217 | forbes - richest - billionaire - list - billionaires | 9 | 217_forbes_richest_billionaire_list |
| 218 | pier - structure - bewl - birnbeck - restore | 9 | 218_pier_structure_bewl_birnbeck |
| 219 | bbcscotlandpics - scotlandpicturesbbccouk - picture - selection - instagram | 9 | 219_bbcscotlandpics_scotlandpicturesbbccouk_picture_selection |
| 220 | chemical - tianjin - blast - cyanide - sodium | 9 | 220_chemical_tianjin_blast_cyanide |
| 221 | lever - ranganathan - gray - spinal - mire | 9 | 221_lever_ranganathan_gray_spinal |
| 222 | internet - icann - cac - user - china | 9 | 222_internet_icann_cac_user |
| 223 | chandelier - museum - bute - museums - abmu | 8 | 223_chandelier_museum_bute_museums |
| 224 | poultry - bird - flu - outbreak - avian | 8 | 224_poultry_bird_flu_outbreak |
| 225 | school - parent - thot - dress - circus | 8 | 225_school_parent_thot_dress |
| 226 | gambling - casino - machine - betting - machines | 8 | 226_gambling_casino_machine_betting |
| 227 | ticket - venue - theatre - ticketing - tickets | 8 | 227_ticket_venue_theatre_ticketing |
| 228 | cardiff - solstice - arriva - train - station | 8 | 228_cardiff_solstice_arriva_train |
| 229 | hacking - brooks - editor - sun - news | 7 | 229_hacking_brooks_editor_sun |
| 230 | sats - gnome - 11 - santa - cam | 7 | 230_sats_gnome_11_santa |
| 231 | robot - biomimicry - benyus - robots - robotics | 7 | 231_robot_biomimicry_benyus_robots |
| 232 | parkrun - parking - park - laugharne - charge | 7 | 232_parkrun_parking_park_laugharne |
| 233 | organ - transplant - donor - donation - optout | 7 | 233_organ_transplant_donor_donation |
| 234 | cav - bowers - ramadhan - aerospace - grills | 7 | 234_cav_bowers_ramadhan_aerospace |
| 235 | call - scotland - bilston - police - hmics | 6 | 235_call_scotland_bilston_police |
| 236 | sao - water - munduruku - tapajos - paulo | 6 | 236_sao_water_munduruku_tapajos |
| 237 | eurovision - song - contest - redzepova - entry | 6 | 237_eurovision_song_contest_redzepova |
| 238 | livingstone - antisemitism - labour - mann - comment | 6 | 238_livingstone_antisemitism_labour_mann |
| 239 | book - publishing - ebook - asi - digital | 6 | 239_book_publishing_ebook_asi |
| 240 | befriending - elaine - frsb - older - fundraising | 6 | 240_befriending_elaine_frsb_older |
| 241 | strathaven - tipper - scene - police - humbie | 6 | 241_strathaven_tipper_scene_police |
| 242 | bay - cardiff - swansea - region - investment | 5 | 242_bay_cardiff_swansea_region |
| 243 | cheese - food - outbreak - coli - flicks | 5 | 243_cheese_food_outbreak_coli |
| 244 | witheridge - miller - thai - koh - tao | 5 | 244_witheridge_miller_thai_koh |
| 245 | yorkshire - tour - depart - cycling - verity | 5 | 245_yorkshire_tour_depart_cycling |
| 246 | airline - lufthansa - franceklm - air - flight | 5 | 246_airline_lufthansa_franceklm_air |
| 247 | caffel - honourbased - forensic - warning - gill | 5 | 247_caffel_honourbased_forensic_warning |
| 248 | torreele - quebec - bissonnette - polish - boissoneault | 5 | 248_torreele_quebec_bissonnette_polish |
| 249 | lash - advert - ad - skin - asa | 5 | 249_lash_advert_ad_skin |
| 250 | fgm - girl - practice - subjected - woman | 5 | 250_fgm_girl_practice_subjected |
| 251 | parkland - wepre - heritage - margam - arnold | 5 | 251_parkland_wepre_heritage_margam |
| 252 | coal - aberfan - colliery - gedling - thoresby | 5 | 252_coal_aberfan_colliery_gedling |
| 253 | exoffenders - pupil - gwynne - yemms - school | 5 | 253_exoffenders_pupil_gwynne_yemms |

</details>
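
In the topic table, each Label appears to be built from the topic ID followed by the first four keywords, joined by underscores. Topic -1 is BERTopic's outlier topic. The sketch below parses a few rows copied verbatim from the table, checks that label pattern, and expresses each Topic Frequency as a share of the 50,000 training documents. `parse_row` and `make_label` are illustrative helpers, not part of BERTopic's API.

```python
from typing import List, Tuple

TRAINING_DOCS = 50_000  # "Number of training documents" in the overview

# A few rows copied verbatim from the topic overview table.
ROWS = """\
| -1 | said - mr - police - people - would | 5 | -1_said_mr_police_people |
| 0 | league - goal - win - game - foul | 24458 | 0_league_goal_win_game |
| 1 | labour - eu - party - vote - referendum | 7343 | 1_labour_eu_party_vote |
| 253 | exoffenders - pupil - gwynne - yemms - school | 5 | 253_exoffenders_pupil_gwynne_yemms |
"""


def parse_row(line: str) -> Tuple[int, List[str], int, str]:
    """Split one markdown table row into (topic_id, keywords, frequency, label)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    topic_id, keywords, frequency, label = cells
    return int(topic_id), keywords.split(" - "), int(frequency), label


def make_label(topic_id: int, keywords: List[str], n_words: int = 4) -> str:
    """Rebuild a label as '<id>_<kw1>_..._<kwN>' from the first n_words keywords."""
    return "_".join([str(topic_id), *keywords[:n_words]])


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
for topic_id, keywords, frequency, label in rows:
    assert make_label(topic_id, keywords) == label
    share = frequency / TRAINING_DOCS
    print(f"topic {topic_id:>3}: {frequency:>5} docs ({share:.2%})")
```

Topic 0 alone (24458 documents) covers about 49% of the training set. Topic 1 covers about 15%.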

## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False

## Framework versions

* Numpy: 1.23.5
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.57.1
* Plotly: 5.15.0
* Python: 3.10.12
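
These versions record the environment the model was saved in. If loading fails or results differ, a quick first check is to compare them with the packages you have installed. The minimal sketch below uses the standard library's `importlib.metadata`. The PyPI distribution names in `RECORDED` (for example `umap-learn` for UMAP) are our assumed mapping of the labels above; the model card does not list them.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions from the "Framework versions" list, keyed by (assumed) PyPI name.
RECORDED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}
RECORDED_PYTHON = "3.10.12"


def version_mismatches(recorded: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (recorded, installed).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in recorded.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    current_python = ".".join(map(str, sys.version_info[:3]))
    print("python: recorded", RECORDED_PYTHON, "running", current_python)
    for name, (expected, installed) in version_mismatches(RECORDED).items():
        print(f"{name}: recorded {expected}, installed {installed or 'missing'}")
```

A mismatch does not necessarily break loading. It only narrows down where to look.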

config.json
ADDED
@@ -0,0 +1,15 @@
{
  "calculate_probabilities": true,
  "language": "english",
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    1
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
}
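
`config.json` stores the same values as the README's Training hyperparameters list, plus the name of the embedding model. JSON has no tuple type, so `n_gram_range` is serialized as a list. The minimal sketch below reads such a config with the standard library, converts `n_gram_range` back to the `(1, 1)` tuple form shown in the README, and splits off `embedding_model`. `load_config` is a hypothetical helper. The final comment assumes the remaining keys map directly onto `BERTopic(...)` keyword arguments, which this repository does not state.

```python
import json
from typing import Any, Dict, Tuple

CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": "english",
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 1],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": false,
  "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
}
"""


def load_config(text: str) -> Tuple[Dict[str, Any], str]:
    """Return (hyperparameters, embedding model name) from a config.json string."""
    params = json.loads(text)  # true/false/null become True/False/None
    embedding_model = params.pop("embedding_model")
    # JSON arrays come back as lists; the README shows n_gram_range as a tuple.
    params["n_gram_range"] = tuple(params["n_gram_range"])
    return params, embedding_model


params, embedding_model = load_config(CONFIG_JSON)
print(embedding_model)
print(params)
# Assumption: BERTopic(embedding_model=embedding_model, **params) would rebuild
# an unfitted model with the same settings.
```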

topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89bc1b535456dfe9191191a374d716d17d5107316019730143d221821ec09fbc
size 391768
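
`topic_embeddings.safetensors` is committed as a Git LFS pointer. The pointer has three `key value` lines: the spec version, the SHA-256 object ID, and the size in bytes of the real file. The sketch below parses the pointer and can check a downloaded file against it with `hashlib`.

It also checks that the recorded size is consistent with a 255 × 384 float32 matrix, assuming one row per topic (including -1) in the 384-dimensional embedding space of all-MiniLM-L6-v2. The calculation uses the safetensors layout: an 8-byte little-endian header length, then a JSON header, then the raw tensor bytes. The tensor name `topic_embeddings`, the float32 dtype, and the absence of a metadata entry are assumptions, not values read from the file.

```python
import hashlib
import json
import struct
from typing import Dict

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:89bc1b535456dfe9191191a374d716d17d5107316019730143d221821ec09fbc
size 391768
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: Dict[str, str]) -> bool:
    """True if `data` has the size and SHA-256 digest recorded in the pointer."""
    algo, _, digest = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(data) == int(pointer["size"]) and hashlib.sha256(data).hexdigest() == digest


def safetensors_f32_size(name: str, rows: int, cols: int) -> int:
    """Byte size of a safetensors file holding one float32 matrix, no metadata."""
    n_bytes = rows * cols * 4  # 4 bytes per float32 value
    header = json.dumps(
        {name: {"dtype": "F32", "shape": [rows, cols], "data_offsets": [0, n_bytes]}},
        separators=(",", ":"),
    ).encode("utf-8")
    header += b" " * (-len(header) % 8)  # space-pad the header to 8-byte alignment
    return len(struct.pack("<Q", len(header))) + len(header) + n_bytes


pointer = parse_lfs_pointer(POINTER)
estimate = safetensors_f32_size("topic_embeddings", rows=255, cols=384)
print("recorded:", pointer["size"], "estimated:", estimate)
# To verify a download: matches_pointer(open(path, "rb").read(), pointer)
```

Under these assumptions, the matrix alone takes 255 × 384 × 4 = 391,680 bytes. That leaves 88 bytes for the length prefix and header, which matches the recorded size.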

topics.json
ADDED
The diff for this file is too large to render.