D0men1c0 committed on
Commit
26b7a91
1 Parent(s): 97ceb7e

Add BERTopic model

README.md ADDED
@@ -0,0 +1,138 @@
+
+ ---
+ tags:
+ - bertopic
+ library_name: bertopic
+ pipeline_tag: text-classification
+ ---
+
+ # ISSR_Dark_Web_68Topics
+
+ This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+ BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+
+ ## Usage
+
+ To use this model, please install BERTopic:
+
+ ```
+ pip install -U bertopic
+ ```
+
+ You can use the model as follows:
+
+ ```python
+ from bertopic import BERTopic
+ topic_model = BERTopic.load("D0men1c0/ISSR_Dark_Web_68Topics")
+
+ topic_model.get_topic_info()
+ ```
+
+ ## Topic overview
+
+ * Number of topics: 69
+ * Number of training documents: 65529
+
+ <details>
+ <summary>Click here for an overview of all topics.</summary>
+
+ | Topic ID | Topic Keywords | Topic Frequency | Label |
+ |----------|----------------|-----------------|-------|
+ | -1 | anyone - get - update - review - new | 178 | outliers |
+ | 0 | weed - cannabis - cart - thc - review | 18900 | Cannabis Weed Vape Cart Reviews |
+ | 1 | help - guy - sub - need - back | 5021 | Subreddit Help Needed |
+ | 2 | order - shipping - package - pack - delivery | 2035 | USPS Package Delivery |
+ | 3 | empire - empire market - empire empire - market - deposit | 2005 | Empire Market Deposit Issues |
+ | 4 | vendor - vendor vendor - vendor inquiry - inquiry - new vendor | 1815 | Trusted Vendor Inquiries |
+ | 5 | scammer - scam - exit - scamming - scammed | 3142 | Exit Scammer Warning |
+ | 6 | darknet - dark - web - dark web - darkfail | 1649 | Dark Web Drug Trafficking Arrests |
+ | 7 | mdma - mda - mdma vendor - domestic - usa | 1390 | MDMA Vendor USA Sale |
+ | 8 | xanax - mg - diazepam - xanax vendor - valium | 1319 | Xanax Vendor Xanax Bars |
+ | 9 | lsd - ug - tab - lsd vendor - acid | 1233 | LSD Vendor Tab List |
+ | 10 | crosspost - giveaway - review crosspost - crosspost vendor - review | 1012 | Giveaway Crossposts |
+ | 11 | monero - btc - bitcoin - coin - wallet | 999 | Monero Exchange |
+ | 12 | carding - card - credit - credit card - debit | 933 | Credit Card Service |
+ | 13 | dream - dream market - nightmare - market - dream dream | 909 | Dream Market |
+ | 14 | dispute - mod - moderator - dispute dispute - please | 882 | Dispute resolution |
+ | 15 | cocaine - cocaine vendor - fishscale - peruvian - colombian | 763 | Cocaine Vendor Fish |
+ | 16 | review - vendor review - review vendor - vendor - review review | 771 | Vendor Reviews Product |
+ | 17 | market - market market - new market - markets - marketplace | 998 | New Market Core |
+ | 18 | pgp - key - pgp key - public - public pgp | 1020 | Public PGP Key |
+ | 19 | deposit - deposited - ticket - address - double | 654 | Double Deposit Support Ticket |
+ | 20 | bar - bunk - bars - selaminy - hulk | 646 | Bar Reviews |
+ | 21 | oxycodone - mg - oxy - opiate - opiateconnect | 590 | Oxycodone and Dilaudid Purchase |
+ | 22 | id - passport - fake - fake id - license | 608 | Fake IDs and Licenses |
+ | 23 | drug - drugsuk - drugs - selling drug - drug dealer | 544 | Drug Misadvertizing Risks |
+ | 24 | coke - coke vendor - best coke - uk coke - uk | 556 | Coke Topic |
+ | 25 | pill - xtc - xtc pill - ecstasy - pills | 544 | xtc pills for sale |
+ | 26 | counterfeit - note - euro - money - counterfeit money | 509 | Counterfeit Notes |
+ | 27 | ketamine - ketamine vendor - ketamine review - mdma ketamine - review ketamine | 483 | Ketamine Vendor |
+ | 28 | wsm - wsm wsm - wsm vendor - vendor wsm - wsm order | 467 | WSM Vendor |
+ | 29 | meth - crystal meth - crystal - meth vendor - methamphetamine | 485 | Crystal Meth Vendor Review |
+ | 30 | ticket - support ticket - support - please - ticket support | 479 | Ticket Support |
+ | 31 | hacked - hacker - hacking - job - lfw | 480 | Hacker Developer Job Exploits |
+ | 32 | login - account - password - log - fa | 471 | Login Issues |
+ | 33 | adderall - mg - ir - ritalin - vyvanse | 470 | Adderall Pharmacy Brand Name |
+ | 34 | xmr - btc xmr - btc - xmrto - xmr btc | 474 | xmr xmr |
+ | 35 | tails - tail - electrum - wallet - whonix | 433 | Tails Electrum Monero Wallet Issue |
+ | 36 | mushroom - mushrooms - shrooms - magic - cubensis | 393 | Mushrooms Magic Penis |
+ | 37 | dread - dread dread - cafe dread - cafe - dread word | 371 | Cafe Dread Topics |
+ | 38 | cc - cvv - vbv - cc vendor - cc cvv | 419 | cc vending |
+ | 39 | cryptonia - cryptonia market - cryptonia cryptonia - dcdutchconnectionuk - empire cryptonia | 382 | Cryptonia Vendor |
+ | 40 | withdraw - withdrawal - withdrawl - withdraws - btc | 375 | Withdrawal Issues BTC |
+ | 41 | escrow - multisig - escrow escrow - full escrow - escrow order | 397 | Escrow Services and Multisig |
+ | 42 | heroin - heroin vendor - afghan - afghan heroin - synthetic heroin | 335 | Afghan Heroin Sale |
+ | 43 | de - har - noen - som - fra | 378 | Discussion Topics |
+ | 44 | dnm - dnms - dn - bible - dnstars | 341 | DNMS Bible |
+ | 45 | wallstreet - wall - wall street - street - wall st | 339 | Wall Street Market |
+ | 46 | ddos - ddos attack - attack - ddos ddos - ddos attacks | 306 | DDOS Attacks |
+ | 47 | paypal - transfer - paypal transfer - paypal account - western union | 283 | PayPal Transfer Scams |
+ | 48 | heard - happened - anyone - anyone heard - thewizzardnl | 327 | Document Mentions |
+ | 49 | benzos - benzo - rc - benzo vendor - rc benzos | 281 | Benzos Vendors |
+ | 50 | fraud - fraudsters - fraud vendor - loan fraud - fraudfox | 308 | Fraud Vendor Loan |
+ | 51 | dream - dream vendor - dream market - vendor dream - vendor | 326 | Dream Market Vendor Inquiry |
+ | 52 | order - cancel - cancelled - refund - cancel order | 497 | Order Cancelled |
+ | 53 | bank - bank log - bank drop - log - bank account | 331 | Bank Fraud Cards |
+ | 54 | onion - onion site - site - onion link - onion list | 328 | Onion links |
+ | 55 | phishing - phishing link - phished - link - warning | 245 | Phishing Warning |
+ | 56 | apollon - apollon market - market - apollon apollon - mysteryland | 253 | Apollon Market |
+ | 57 | opsec - opsec opsec - opsec question - bad opsec - question | 242 | Opsec and Guides |
+ | 58 | link - working link - working - pm - link please | 226 | PM Working Share Links |
+ | 59 | mirror - working mirror - working - mirror link - empire mirror | 229 | mirror link |
+ | 60 | fentanyl - fent - carfentanil - selling fentanyl - analogue | 219 | Fentanyl |
+ | 61 | cgmc - invite - invite code - code - cgmc invite | 221 | Invite Code CGMC |
+ | 62 | alprazolam - powder - alprazolam powder - flualprazolam - etizolam | 211 | Alprazolam Powder |
+ | 63 | dmt - dmt vendor - dmt vape - odsmt - dmt dmt | 290 | DMT Vendors |
+ | 64 | captcha - rapture - rapture market - captcha captcha - incorrect | 202 | Rapture Market Captcha |
+ | 65 | chemical - research - research chemical - chems - research chemicals | 187 | Research Chemicals |
+ | 66 | tor - tor browser - browser - tor network - network | 198 | Tor Browser Research |
+ | 67 | mephedrone - meopcp - mxe - mescaline - mmc | 222 | Mephedrone |
+
+ </details>
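
The topic table above is plain Markdown, so it can be turned into records for quick checks without loading the model. The following is a minimal sketch under that assumption: `parse_topic_table` is a hypothetical helper (not part of BERTopic), and only three rows copied from the table are used as sample input.

```python
def parse_topic_table(markdown: str) -> list[dict]:
    """Parse rows of the README topic table into dictionaries."""
    records = []
    for line in markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header row and the |---| separator row.
        if len(cells) != 4 or not cells[0].lstrip("-").isdigit():
            continue
        topic_id, keywords, frequency, label = cells
        records.append({
            "topic_id": int(topic_id),
            "keywords": keywords.split(" - "),
            "frequency": int(frequency),
            "label": label,
        })
    return records


# Three rows copied from the table; paste the full table here for a real check.
SAMPLE = """
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | anyone - get - update - review - new | 178 | outliers |
| 0 | weed - cannabis - cart - thc - review | 18900 | Cannabis Weed Vape Cart Reviews |
| 11 | monero - btc - bitcoin - coin - wallet | 999 | Monero Exchange |
"""

records = parse_topic_table(SAMPLE)
# Topic -1 is the outlier bucket, so leave it out when ranking topics.
topics = [r for r in records if r["topic_id"] != -1]
largest = max(topics, key=lambda r: r["frequency"])
print(largest["label"], largest["frequency"])
```

With the full table pasted in, the same records could be used to compare the summed frequencies against the reported number of training documents.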
+
+ ## Training hyperparameters
+
+ * calculate_probabilities: True
+ * language: None
+ * low_memory: False
+ * min_topic_size: 10
+ * n_gram_range: (1, 2)
+ * nr_topics: None
+ * seed_topic_list: None
+ * top_n_words: 10
+ * verbose: True
+ * zeroshot_min_similarity: 0.7
+ * zeroshot_topic_list: None
+
+ ## Framework versions
+
+ * Numpy: 1.26.4
+ * HDBSCAN: 0.8.36
+ * UMAP: 0.5.6
+ * Pandas: 2.2.1
+ * Scikit-Learn: 1.4.1.post1
+ * Sentence-transformers: 3.0.1
+ * Transformers: 4.39.3
+ * Numba: 0.60.0
+ * Plotly: 5.22.0
+ * Python: 3.12.2
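
The framework versions listed in the README can be compared against a local environment before loading the model. This is a rough sketch, not part of the commit: the PyPI distribution names in `PINNED` (for example `umap-learn` for UMAP) and the helper `find_mismatches` are assumptions made for illustration, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib import metadata

# Versions copied from the README; the PyPI distribution names are assumed.
PINNED = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.36",
    "umap-learn": "0.5.6",
    "pandas": "2.2.1",
    "scikit-learn": "1.4.1.post1",
    "sentence-transformers": "3.0.1",
    "transformers": "4.39.3",
    "numba": "0.60.0",
    "plotly": "5.22.0",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {name: (expected, installed)} for packages that differ."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in find_mismatches(PINNED).items():
    print(f"{name}: README lists {expected}, found {installed or 'not installed'}")
```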
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "calculate_probabilities": true,
+   "language": null,
+   "low_memory": false,
+   "min_topic_size": 10,
+   "n_gram_range": [
+     1,
+     2
+   ],
+   "nr_topics": null,
+   "seed_topic_list": null,
+   "top_n_words": 10,
+   "verbose": true,
+   "zeroshot_min_similarity": 0.7,
+   "zeroshot_topic_list": null,
+   "embedding_model": "all-MiniLM-L6-v2"
+ }
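
The `config.json` above stores the same hyperparameters as the README, plus the embedding model name. JSON has no tuple type, so `n_gram_range` is saved as a list while the README shows `(1, 2)`. A minimal sketch of reading the file back into Python values follows; `config_to_kwargs` is a hypothetical helper, and whether every key maps one-to-one onto a `BERTopic` constructor argument in a given BERTopic version is an assumption to verify before reusing the result.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 2],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "all-MiniLM-L6-v2"
}
"""


def config_to_kwargs(raw: str) -> dict:
    """Load the config and restore the tuple form of n_gram_range."""
    config = json.loads(raw)
    config["n_gram_range"] = tuple(config["n_gram_range"])
    return config


kwargs = config_to_kwargs(CONFIG_JSON)
print(kwargs["n_gram_range"], kwargs["embedding_model"])  # (1, 2) all-MiniLM-L6-v2
```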
ctfidf.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1ad3ce7fe13070f89799cec9cc3b3127d5638ed1017b995f2c3201bc6e93943
+ size 5661292
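
The three lines above are a Git LFS pointer, not the tensor data itself: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. As a sketch of how a downloaded copy could be checked against such a pointer, the hypothetical helpers `parse_lfs_pointer` and `matches_pointer` below are illustrations only and are not part of Git LFS or the Hub tooling.

```python
import hashlib

# Pointer contents for ctfidf.safetensors as shown in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:f1ad3ce7fe13070f89799cec9cc3b3127d5638ed1017b995f2c3201bc6e93943
size 5661292
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algorithm"] != "sha256":
        raise ValueError(f"unsupported hash: {pointer['algorithm']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER)
print(pointer["algorithm"], pointer["size"])  # sha256 5661292
```

To verify a real download, read the file in binary mode and pass its bytes to `matches_pointer` together with the parsed pointer.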
ctfidf_config.json ADDED
The diff for this file is too large to render. See raw diff
 
topic_embeddings.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8fe5935815052732831c8f48547a9a73bdab2727e3ee2e6159734be5d176e196
+ size 106072
topics.json ADDED
The diff for this file is too large to render. See raw diff