Commit 4b60802 by D0men1c0
Parent: e4fce89

Add BERTopic model

Files changed (4)
  1. README.md +226 -0
  2. config.json +17 -0
  3. topic_embeddings.safetensors +3 -0
  4. topics.json +0 -0
README.md ADDED
@@ -0,0 +1,226 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # ISSR_Dark_Web_Merged_Models_Content_White_Nations
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```bash
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("D0men1c0/ISSR_Dark_Web_Merged_Models_Content_White_Nations")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 157
+ * Number of training documents: 378835
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | vendor - order - nt - market - link | 340 | Outliers |
+ | 0 | cart - weed - strain - thc - bud | 127409 | Product Reviews and Purchases |
+ | 1 | deposit - address - ticket - btc - wallet | 19104 | Empire Deposit & Withdrawal Issues |
+ | 2 | key - pgp - account - pgp key - password | 6011 | PGP Key Security |
+ | 3 | order - shipped - ordered - day - week | 6350 | Order Shipping Status |
+ | 4 | scam - scammer - scam scam - scam scam scam - scammed | 3368 | Vendor Scams and Detection |
+ | 5 | thanks - thank - lol - man - bro | 4836 | Friendly Positive Talk |
+ | 6 | ship - country - eu - shipping - uk | 4584 | Shipping in EU Countries |
+ | 7 | coke - cocaine - quality - product - good | 5583 | High Quality Cocaine |
+ | 8 | card - carding - cc - gift card - gift | 4161 | Carding Strategies |
+ | 9 | pgp - begin pgp - begin - pgp signature - signature | 4192 | PGP Signature End |
+ | 10 | lsd - tab - ug - acid - blotter | 2462 | LSD Tab Marketplace Reviews |
+ | 11 | vendor - good - anyone - know - legit | 3574 | Vendor Recommendation |
+ | 12 | dispute - refund - mod - order - moderator | 5834 | Dispute Resolution |
+ | 13 | wsm - dream - market - exit - vendor | 2863 | WSM Exit Scam Warnings |
+ | 14 | drug - police - get - nt - house | 2455 | Drugs and Police Enforcement |
+ | 15 | monero - xmr - wallet - btc - exchange | 3042 | Monero Wallet Exchange and Bitcoin Use |
+ | 16 | ddos - attack - ddos attack - mirror - market | 3928 | Dread Market DDoS Attack |
+ | 17 | mdma - mda - price - quality - vendor | 2056 | MDMA Vendor Quality Prices |
+ | 18 | darknet - clearnet - dark - link - darknetmarkets | 3244 | Darknet Market |
+ | 19 | sub - post - mod - banned - link | 3277 | Dread Market Forum Rules and Bans |
+ | 20 | bar - alp - hulk - press - pack | 1942 | Alprazolam Pressed Bars Reviews |
+ | 21 | xanax - bar - alp - mg - alprazolam | 2940 | Xanax Bars and Vendors |
+ | 22 | market - new market - market market - good - new | 3269 | Market Upgrade Support |
+ | 23 | feedback - review - vendor - negative - positive | 2670 | ecommerce feedback |
+ | 24 | mirror - working - working mirror - mirror link - link | 1802 | Mirror Link Working |
+ | 25 | mg - pill - tablet - price - xtc | 1447 | Drug Sales |
+ | 26 | box - mail - package - address - po | 4287 | Mail Delivery Issues |
+ | 27 | review - thanks - thanks review - review thanks - nice review | 2829 | Positive Reviews Thank You Notes Nice |
+ | 28 | ticket - support ticket - support - se en - se | 2240 | Support Ticket Confusion |
+ | 29 | cryptonia - market - empire - nightmare - vendor | 1811 | Cryptonia Market |
+ | 30 | escrow - fe - use escrow - vendor - market | 1589 | Market Escrow Usage |
+ | 31 | onion - dot onion - dot - onion link - onion site | 1551 | Onion Links |
+ | 32 | det - er - og - har - jeg | 1471 | Kola;Vendor;Stealth Shipping;Review;Norway |
+ | 33 | tor - browser - network - javascript - tor browser | 1126 | Anonymous Browsing and Tor Networks |
+ | 34 | dread - reddit - post - dread dread - sub | 1331 | community appreciation |
+ | 35 | meth - business day - business - day - good | 1493 | Meth Vendor Quality Review |
+ | 36 | fent - fentanyl - opiate - heroin - nt | 1640 | Fentanyl Opiate Discussion |
+ | 37 | link - link link - point link comment - link point link - link comment post | 2294 | Links and Posts |
+ | 38 | pack - week - day - ordered - land | 1332 | Package Delay and Shipping |
+ | 39 | pm - interested - looking - find - please | 1499 | PM Interested Help Explanation |
+ | 40 | hugbunter - hugbunter hugbunter - link hugbunter - hugbunter link - link hugbunter hugbunter | 1157 | hugbunter links |
+ | 41 | drug - police - court - enforcement - investigation | 923 | Darknet Drug Enforcement |
+ | 42 | stealth - good - good stealth - vendor - shipping | 849 | Good Stealth Vendor Shipping |
+ | 43 | counterfeit - note - euro - bill - pen | 968 | Counterfeit Money Sales |
+ | 44 | empire - nightmare - empire empire - find empire - empire nightmare | 1493 | Empire Name Search |
+ | 45 | day - waiting - week - month - hour | 1040 | Waiting Time |
+ | 46 | id - passport - fake - license - scan | 1681 | Fake IDs & Documents |
+ | 47 | bank - account - drop - bank drop - cash | 1623 | Bank Drop Transaction |
+ | 48 | wickr - use wickr - using wickr - contact - via wickr | 3405 | Wickr Abuse Policy Protect Wickr Community |
+ | 49 | de - und - que - un - da | 751 | German Darknet Market |
+ | 50 | phishing - phishing link - link - phished - phishing site | 1081 | Phishing Detection Techniques |
+ | 51 | dream - nightmare - dream dream - anyone - like | 1294 | Dream Nightmare Experience |
+ | 52 | price - sale - promo - sell - buy | 756 | Sale;Promotional Offers;Good Deals |
+ | 53 | tails - tail - usb - electrum - persistent | 1101 | Tails;Electrum;Persistent File;USB Installation |
+ | 54 | adderall - amphetamine - mg - replacement - speed | 1583 | Adderall Replacement Pills |
+ | 55 | cancel - order - auto - cancel order - day | 1353 | Auto Cancel Orders |
+ | 56 | mushroom - shrooms - cubensis - psilocybin - spore | 1294 | Mushroom Guide & Dosage Information |
+ | 57 | ketamine - gm - gm gm - gm gm gm - vendor | 1134 | Ketamine Vendor Quality Shard |
+ | 58 | exit - exit scam - scam - exit scamming - exit scammed | 1163 | Exit Scam Market |
+ | 59 | phone - burner - sim - card - number | 691 | Burner Phone Usage |
+ | 60 | dream - market - dream market - nightmare - nightmare market | 830 | Dream Market |
+ | 61 | bond - vendor bond - vendor - bond back - market | 2239 | Vendor Bond Waiver Market |
+ | 62 | vpn - tor - use - using - proxy | 546 | VPN and Tor Use |
+ | 63 | jabber - telegram - xmpp - pidgin - otr | 1054 | Jabber/XMPP/OTR Chat Clients |
+ | 64 | dmt - psychedelics - per - psychedelic - changa | 588 | DMT Psychedelics Prices |
+ | 65 | captcha - captchas - page - enter - login | 899 | Captcha Issues in Darknet Market |
+ | 66 | sample - free sample - free - review - sample pack | 713 | Free Sample Order |
+ | 67 | update - issue - problem - working - fixed | 682 | Fixed Bug Issue |
+ | 68 | cgmc - invite - vendor - cgmc cgmc - link cgmc | 1410 | CGMC Invites |
+ | 69 | apollon - apollon market - market - empire - apollomarket | 694 | Apollon Market Update |
+ | 70 | paypal - transfer - account - paypal account - paypal transfer | 750 | PayPal Account Transfer |
+ | 71 | giveaway - win - number - winner - contest | 1008 | Giveaways & Contests |
+ | 72 | pm - working link - link - link please - please | 739 | Working Link Requests |
+ | 73 | darkfail - link - fail - dark - dark fail | 745 | Dark Fail Links Question |
+ | 74 | empire - market - empire market - nightmare - alphabay | 598 | Empire Market Vendor Feedback |
+ | 75 | package - pack - delivery - tracking - day | 2959 | Package Delivery Tracking |
+ | 76 | bag - dog - seal - mylar - vac | 3821 | Smuggling Methods & Detection |
+ | 77 | opsec - opsec opsec - link opsec - opsec link - opsec opsec link | 542 | Opsec Guidance |
+ | 78 | link - working - working link - main link - link working | 618 | Link issues |
+ | 79 | money - pay - money back - dollar - get | 875 | Money Losses |
+ | 80 | tracking - tracking number - number - order - day | 513 | Tracking Number Concerns |
+ | 81 | guide - tutorial - outdated - thanks - method | 1078 | Guide Topic or Tutorial Help or Out |
+ | 82 | bir - bu - kai - ama - var | 357 | bu bir kai |
+ | 83 | rc - mxe - dck - rcs - fdck | 356 | RC Sources |
+ | 84 | cash - btc - bitcoin - coinbase - atm | 549 | Crypto Purchase Methods |
+ | 85 | olympus - market - fe escrow - olympus market - dream | 1271 | Olympus Market |
+ | 86 | log - logged - logging - login - page | 338 | Logging and Session Issues |
+ | 87 | vacation - vacation mode - mode - back - profile | 401 | Vacation Mode |
+ | 88 | message - contact - email - support - send | 331 | Message;Email Support |
+ | 89 | post - mod - comment - delete - thread | 748 | Moderation and Deletion of Posts |
+ | 90 | xmr - wallet - deposit - monero - payment id | 4471 | XMR Wallet Issue |
+ | 91 | image - exif - upload - exif data - data | 946 | Image Exif Data Upload |
+ | 92 | back - hope - welcome back - luck - good | 399 | hope recovery |
+ | 93 | review - template - pic - picture - table | 2240 | Review templates and images |
+ | 94 | cheer - cheer cheer - cheer mate - mate - anyone | 547 | Cheer Positivity Justification |
+ | 95 | bulk - price - kratom - good - kg | 307 | Bulk Kratom Vendors |
+ | 96 | wallstreet - wall st - wall - st - wallstreetmarket | 675 | Wallstreet Market Forum Links |
+ | 97 | product - stealth - shipping - quality - price | 685 | Product Review |
+ | 98 | listing - list - superlist - vendor - search | 2037 | Listing Management and Visibility |
+ | 99 | fuck - cunt - dick - fud fud - fud fud fud | 493 | Mom sex;Insults |
+ | 100 | empire - exit - market - scam - exit scam | 3076 | Empire Market |
+ | 101 | protonmail - protonmailcom - email - proton - secmail | 1138 | Protonmail Alternatives |
+ | 102 | wallet - node - gui - monero - remote node | 296 | Monero Wallet Update |
+ | 103 | multisig - market - transaction - escrow - use multisig | 423 | MultiSig Market Transactions |
+ | 104 | bunk - bunk bar - bar - hulk - sent bunk | 307 | Bunk and Bar |
+ | 105 | mg - benzo - benzos - alprazolam - alp | 355 | Benzodiazepine use and abuse |
+ | 106 | pelican - bird - pelicanvendor - bigbird - pelicanvendor pelicanvendor link | 2750 | Pelican Bird Giveaway |
+ | 107 | heinekenexpress - link heinekenexpress - heinekenexpress link - heinekenexpress heinekenexpress - link heinekenexpress heinekenexpress | 249 | Heineken Express Reviews |
+ | 108 | rdp - sock - vpn - ip - card | 266 | RDP Socks for Carding |
+ | 109 | dnm - dm - dread - forum - link | 401 | DNM Reddit Subs |
+ | 110 | pic - picture - photo - photoshop - post pic | 937 | Pics and posts |
+ | 111 | empire - link - empiremarket - empire link - link empire | 476 | Empire Market Links |
+ | 112 | invite - invite code - code - need invite - get invite | 398 | Darknet Market Invites |
+ | 113 | samsara - market - samsara market - sam - dream | 224 | Samsara Market |
+ | 114 | chemical - test - lab - powder - product | 290 | Chemistry Research and Supply |
+ | 115 | rapture - rapture market - rapturemarket - market - gbp | 1262 | Rapture Market GBP |
+ | 116 | water - acetone - powder - dry - ml | 214 | Acetone Recrystallization Techniques |
+ | 117 | witchman - link witchman - link - witchman link - witchman witchman | 1754 | Link Witchman Discussion |
+ | 118 | tochka - market - tochka market - tochka tochka - use tochka | 218 | Tochka market |
+ | 119 | post - know guy know - guy know guy - know guy - guy know | 210 | Read Post Discussion |
+ | 120 | subdread - sub - post - subdreads - create | 2589 | Subdread creation issues |
+ | 121 | crosspost - giveaway - review crosspost - crosspost vendor - review | 509 | Crosspost Giveaway Review |
+ | 122 | oxycodone - mg - oxy - opiate - opiateconnect | 999 | Opiate Dosages |
+ | 123 | drug - drugsuk - drugs - selling drug - drug dealer | 608 | Drugs and Drug Market |
+ | 124 | hacked - hacker - hacking - job - lfw | 1201 | Hacker Job |
+ | 125 | login - account - password - log - fa | 471 | Login and Registration Issues |
+ | 126 | cc - cvv - vbv - cc vendor - cc cvv | 470 | Credit Card Data |
+ | 127 | withdraw - withdrawal - withdrawl - withdraws - btc | 382 | Bitcoin Withdrawal |
+ | 128 | heard - happened - anyone - anyone heard - thewizzardnl | 397 | Something Happened |
+ | 129 | benzos - benzo - rc - benzo vendor - rc benzos | 281 | Benzos Vendors |
+ | 130 | fraud - fraudsters - fraud vendor - loan fraud - fraudfox | 308 | Fraud and Loan Scams |
+ | 131 | mephedrone - meopcp - mxe - mescaline - mmc | 326 | Chemicals and Drugs |
+ | 132 | socialism - lesson - applied socialism - practical - practical lesson applied | 178 | Applied Socialism |
+ | 133 | trump - democrats - pelosi - biden - election | 10136 | 2020 Election Fraud Impeachment |
+ | 134 | border - illegal - wall - trump - mexico | 2606 | Border Wall Debate |
+ | 135 | israel - iran - syria - us - israeli | 1802 | Middle East Tensions Wars |
+ | 136 | climate - climate change - change - warming - global warming | 1740 | Climate Change Funding |
+ | 137 | sgt - sgt report - report - appeared first - appeared first sgt | 915 | SGT Report Articles |
+ | 138 | mueller - fbi - trump - clinton - obama | 832 | Trump Deep State |
+ | 139 | facebook - google - tech - twitter - social media | 3596 | Big Tech Censorship |
+ | 140 | gold - silver - report - the post - sgt report | 818 | Gold Silver Ratio |
+ | 141 | epstein - jeffrey epstein - jeffrey - sex - maxwell | 750 | Epstein Maxwell Sex Scandal |
+ | 142 | women - men - transgender - gender - feminism | 569 | Transgender Rights and Feminism |
+ | 143 | jews - jewish - jew - holocaust - the jews | 485 | 20th Century Jewish History |
+ | 144 | kavanaugh - ford - christine - brett - brett kavanaugh | 590 | Kavanaugh Accuser |
+ | 145 | white - racist - white people - race - black | 442 | White Racism Follow |
+ | 146 | youtube - music - favorite - what favorite - what favorite music | 571 | Favorite Music Youtube |
+ | 147 | vaccine - vaccines - measles - vaccination - flu | 398 | Vaccine Lawsuit Losses |
+ | 148 | abortion - planned parenthood - parenthood - planned - babies | 400 | Planned Parenthood Abortion |
+ | 149 | christians - christianity - pope - christian - church | 281 | Christianity & Religion |
+ | 150 | media - news - cnn - fake news - fake | 551 | Mainstream Media and Fake News |
+ | 151 | antifa - portland - police - violence - protesters | 662 | Antifa Portland Attacks Journalist |
+ | 152 | college - school - students - schools - education | 337 | Education Politics |
+ | 153 | stormfront - stormfront sucks - re stormfront sucks - re stormfront - sucks | 374 | Stormfront Criticism |
+ | 154 | assange - julian - julian assange - wikileaks - us | 197 | Julian Assange Expulsion |
+ | 155 | coronavirus - virus - pandemic - outbreak - wuhan | 192 | Coronavirus Pandemic |
+
+ </details>
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: True
+ * min_topic_size: 10
+ * n_gram_range: (1, 3)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: True
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.36
+ * UMAP: 0.5.6
+ * Pandas: 2.2.1
+ * Scikit-Learn: 1.4.1.post1
+ * Sentence-transformers: 3.0.1
+ * Transformers: 4.39.3
+ * Numba: 0.60.0
+ * Plotly: 5.22.0
+ * Python: 3.12.2
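The hyperparameter `n_gram_range: (1, 3)` lets topic keywords be single words, two-word phrases, or three-word phrases, which is why topic 9 lists `begin pgp` and `pgp signature` alongside `pgp`. Assuming the model used BERTopic's default `CountVectorizer(ngram_range=n_gram_range)` (the commit does not say whether a custom vectorizer was supplied), the following stdlib sketch enumerates the candidate n-grams for one tokenized document. The helper `ngrams` and its whitespace tokenization are illustrative only.

```python
def ngrams(tokens: list[str], n_gram_range: tuple[int, int] = (1, 3)) -> list[str]:
    """Return every contiguous n-gram with low <= n <= high, joined by spaces.

    Mirrors the meaning of ngram_range in scikit-learn's CountVectorizer;
    the naive whitespace tokenization used below is an assumption and does
    not reproduce CountVectorizer's token pattern or lowercasing.
    """
    low, high = n_gram_range
    grams = []
    for n in range(low, high + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


candidates = ngrams("begin pgp signature".split())
print(candidates)
```

Widening the range to `(1, 3)` adds phrase-level keywords at the cost of a much larger vocabulary, which is one reason near-duplicate keywords such as `scam scam` and `scam scam scam` can surface in the table.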
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+ "calculate_probabilities": true,
+ "language": null,
+ "low_memory": true,
+ "min_topic_size": 10,
+ "n_gram_range": [
+ 1,
+ 3
+ ],
+ "nr_topics": null,
+ "seed_topic_list": null,
+ "top_n_words": 10,
+ "verbose": true,
+ "zeroshot_min_similarity": 0.7,
+ "zeroshot_topic_list": null,
+ "embedding_model": "all-MiniLM-L6-v2"
+ }
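`config.json` repeats the README's training hyperparameters and adds the embedding model name, `all-MiniLM-L6-v2`. A minimal sketch, assuming you want to reuse these settings for a fresh, unfitted model: the JSON is inlined so the example is self-contained, `n_gram_range` is converted back from a JSON list to the tuple shown in the README, and `embedding_model` is split out. The helper name `bertopic_kwargs` is hypothetical, and the final `BERTopic(...)` call is left commented out because it needs the `bertopic` package.

```python
import json

# config.json from this commit, inlined for a self-contained example.
CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": true,
  "min_topic_size": 10,
  "n_gram_range": [1, 3],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "all-MiniLM-L6-v2"
}
"""


def bertopic_kwargs(config: dict) -> tuple[dict, str]:
    """Split the embedding model name from the remaining constructor settings."""
    kwargs = dict(config)  # copy so the parsed config is left untouched
    embedding_model = kwargs.pop("embedding_model")
    # JSON has no tuple type, so n_gram_range round-trips as a list.
    kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs, embedding_model


config = json.loads(CONFIG_JSON)
kwargs, embedding_model = bertopic_kwargs(config)
print(embedding_model, kwargs["n_gram_range"])

# With bertopic installed (not run here), the same settings could seed a new model:
# from bertopic import BERTopic
# fresh_model = BERTopic(embedding_model=embedding_model, **kwargs)
```

This only recreates the configuration; the fitted topics, labels and embeddings live in `topics.json` and `topic_embeddings.safetensors`, which `BERTopic.load` reads.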
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de9d5a54e9c2b1e8e45c50a0af8932e95fa0c873cab8a069c33a15658d58812d
+ size 241240
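The safetensors file is stored through Git LFS, so the diff shows only a pointer holding a SHA-256 object ID and a byte size. As a rough consistency check, assume the file holds a single float32 tensor named `topic_embeddings` with one row per topic (157, counting the -1 outlier topic) and 384 columns (the output width of `all-MiniLM-L6-v2`). A safetensors file is an 8-byte little-endian header length, a JSON header, then the raw tensor bytes, so its size can be predicted. The tensor name, dtype, absence of a `__metadata__` entry, compact JSON, and 8-byte header padding are assumptions, not facts read from the file.

```python
import json

# The LFS pointer exactly as committed.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:de9d5a54e9c2b1e8e45c50a0af8932e95fa0c873cab8a069c33a15658d58812d
size 241240
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def safetensors_size(name: str, shape: list[int], dtype: str = "F32", itemsize: int = 4) -> int:
    """Predicted byte size of a safetensors file holding a single tensor.

    Assumes compact JSON with no __metadata__ entry, and that the header is
    space-padded to a multiple of 8 bytes (assumed writer behaviour).
    """
    n_bytes = itemsize
    for dim in shape:
        n_bytes *= dim
    header = json.dumps(
        {name: {"dtype": dtype, "shape": shape, "data_offsets": [0, n_bytes]}},
        separators=(",", ":"),
    ).encode("utf-8")
    header += b" " * (-len(header) % 8)  # pad to 8-byte alignment
    return 8 + len(header) + n_bytes  # length prefix + header + tensor data


pointer = parse_lfs_pointer(POINTER)
algo, _, digest = pointer["oid"].partition(":")
# Hypothesis: 157 topic rows x 384 embedding dimensions, float32.
expected = safetensors_size("topic_embeddings", [157, 384])
print(algo, int(pointer["size"]), expected)
```

Under these assumptions the prediction equals the pointer's 241,240 bytes: 157 × 384 × 4 = 241,152 bytes of tensor data plus the 8-byte length prefix and an 80-byte JSON header. That agreement is consistent with, but does not prove, the assumed layout.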
topics.json ADDED
The diff for this file is too large to render.