Add BERTopic model
- README.md +138 -0
- config.json +17 -0
- ctfidf.safetensors +3 -0
- ctfidf_config.json +0 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
README.md
ADDED
@@ -0,0 +1,138 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# ISSR_Dark_Web_68Topics

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("D0men1c0/ISSR_Dark_Web_68Topics")

topic_model.get_topic_info()
```
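
Beyond listing topics, a loaded model can presumably be used to assign topics to unseen text. The following is a minimal sketch, assuming `BERTopic.transform` returns a `(topics, probabilities)` pair as described in the BERTopic documentation. `label_topics`, `predict_labels`, and the `TOPIC_LABELS` subset (copied from the topic table in this card) are hypothetical helpers for illustration, not part of the model or the library.

```python
from typing import Dict, Iterable, List

# A small subset of labels copied from the topic table in this card.
# Extend it as needed; it is only here to make predictions readable.
TOPIC_LABELS: Dict[int, str] = {
    -1: "outliers",
    0: "Cannabis Weed Vape Cart Reviews",
    11: "Monero Exchange",
    18: "Public PGP Key",
}


def label_topics(topic_ids: Iterable[int],
                 labels: Dict[int, str] = TOPIC_LABELS) -> List[str]:
    """Map predicted topic ids to labels, falling back to 'topic <id>'."""
    return [labels.get(int(t), f"topic {int(t)}") for t in topic_ids]


def predict_labels(docs: List[str]) -> List[str]:
    """Load the model from the Hub and label new documents.

    Imported lazily so the helper above works without bertopic installed.
    """
    from bertopic import BERTopic

    topic_model = BERTopic.load("D0men1c0/ISSR_Dark_Web_68Topics")
    # Assumption: transform returns (topics, probabilities).
    topics, _probabilities = topic_model.transform(docs)
    return label_topics(topics)
```

Calling `predict_labels(["..."])` needs the `bertopic` package and network access on first use, since both the topic model and its embedding model have to be downloaded.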

## Topic overview

* Number of topics: 69
* Number of training documents: 65529

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | anyone - get - update - review - new | 178 | outliers |
| 0 | weed - cannabis - cart - thc - review | 18900 | Cannabis Weed Vape Cart Reviews |
| 1 | help - guy - sub - need - back | 5021 | Subreddit Help Needed |
| 2 | order - shipping - package - pack - delivery | 2035 | USPS Package Delivery |
| 3 | empire - empire market - empire empire - market - deposit | 2005 | Empire Market Deposit Issues |
| 4 | vendor - vendor vendor - vendor inquiry - inquiry - new vendor | 1815 | Trusted Vendor Inquiries |
| 5 | scammer - scam - exit - scamming - scammed | 3142 | Exit Scammer Warning |
| 6 | darknet - dark - web - dark web - darkfail | 1649 | Dark Web Drug Trafficking Arrests |
| 7 | mdma - mda - mdma vendor - domestic - usa | 1390 | MDMA Vendor USA Sale |
| 8 | xanax - mg - diazepam - xanax vendor - valium | 1319 | Xanax Vendor Xanax Bars |
| 9 | lsd - ug - tab - lsd vendor - acid | 1233 | LSD Vendor Tab List |
| 10 | crosspost - giveaway - review crosspost - crosspost vendor - review | 1012 | Giveaway Crossposts |
| 11 | monero - btc - bitcoin - coin - wallet | 999 | Monero Exchange |
| 12 | carding - card - credit - credit card - debit | 933 | Credit Card Service |
| 13 | dream - dream market - nightmare - market - dream dream | 909 | Dream Market |
| 14 | dispute - mod - moderator - dispute dispute - please | 882 | Dispute resolution |
| 15 | cocaine - cocaine vendor - fishscale - peruvian - colombian | 763 | Cocaine Vendor Fish |
| 16 | review - vendor review - review vendor - vendor - review review | 771 | Vendor Reviews Product |
| 17 | market - market market - new market - markets - marketplace | 998 | New Market Core |
| 18 | pgp - key - pgp key - public - public pgp | 1020 | Public PGP Key |
| 19 | deposit - deposited - ticket - address - double | 654 | Double Deposit Support Ticket |
| 20 | bar - bunk - bars - selaminy - hulk | 646 | Bar Reviews |
| 21 | oxycodone - mg - oxy - opiate - opiateconnect | 590 | Oxycodone and Dilaudid Purchase |
| 22 | id - passport - fake - fake id - license | 608 | Fake IDs and Licenses |
| 23 | drug - drugsuk - drugs - selling drug - drug dealer | 544 | Drug Misadvertizing Risks |
| 24 | coke - coke vendor - best coke - uk coke - uk | 556 | Coke Topic |
| 25 | pill - xtc - xtc pill - ecstasy - pills | 544 | xtc pills for sale |
| 26 | counterfeit - note - euro - money - counterfeit money | 509 | Counterfeit Notes |
| 27 | ketamine - ketamine vendor - ketamine review - mdma ketamine - review ketamine | 483 | Ketamine Vendor |
| 28 | wsm - wsm wsm - wsm vendor - vendor wsm - wsm order | 467 | WSM Vendor |
| 29 | meth - crystal meth - crystal - meth vendor - methamphetamine | 485 | Crystal Meth Vendor Review |
| 30 | ticket - support ticket - support - please - ticket support | 479 | Ticket Support |
| 31 | hacked - hacker - hacking - job - lfw | 480 | Hacker Developer Job Exploits |
| 32 | login - account - password - log - fa | 471 | Login Issues |
| 33 | adderall - mg - ir - ritalin - vyvanse | 470 | Adderall Pharmacy Brand Name |
| 34 | xmr - btc xmr - btc - xmrto - xmr btc | 474 | xmr xmr |
| 35 | tails - tail - electrum - wallet - whonix | 433 | Tails Electrum Monero Wallet Issue |
| 36 | mushroom - mushrooms - shrooms - magic - cubensis | 393 | Mushrooms Magic Penis |
| 37 | dread - dread dread - cafe dread - cafe - dread word | 371 | Cafe Dread Topics |
| 38 | cc - cvv - vbv - cc vendor - cc cvv | 419 | cc vending |
| 39 | cryptonia - cryptonia market - cryptonia cryptonia - dcdutchconnectionuk - empire cryptonia | 382 | Cryptonia Vendor |
| 40 | withdraw - withdrawal - withdrawl - withdraws - btc | 375 | Withdrawal Issues BTC |
| 41 | escrow - multisig - escrow escrow - full escrow - escrow order | 397 | Escrow Services and Multisig |
| 42 | heroin - heroin vendor - afghan - afghan heroin - synthetic heroin | 335 | Afghan Heroin Sale |
| 43 | de - har - noen - som - fra | 378 | Discussion Topics |
| 44 | dnm - dnms - dn - bible - dnstars | 341 | DNMS Bible |
| 45 | wallstreet - wall - wall street - street - wall st | 339 | Wall Street Market |
| 46 | ddos - ddos attack - attack - ddos ddos - ddos attacks | 306 | DDOS Attacks |
| 47 | paypal - transfer - paypal transfer - paypal account - western union | 283 | PayPal Transfer Scams |
| 48 | heard - happened - anyone - anyone heard - thewizzardnl | 327 | Document Mentions |
| 49 | benzos - benzo - rc - benzo vendor - rc benzos | 281 | Benzos Vendors |
| 50 | fraud - fraudsters - fraud vendor - loan fraud - fraudfox | 308 | Fraud Vendor Loan |
| 51 | dream - dream vendor - dream market - vendor dream - vendor | 326 | Dream Market Vendor Inquiry |
| 52 | order - cancel - cancelled - refund - cancel order | 497 | Order Cancelled |
| 53 | bank - bank log - bank drop - log - bank account | 331 | Bank Fraud Cards |
| 54 | onion - onion site - site - onion link - onion list | 328 | Onion links |
| 55 | phishing - phishing link - phished - link - warning | 245 | Phishing Warning |
| 56 | apollon - apollon market - market - apollon apollon - mysteryland | 253 | Apollon Market |
| 57 | opsec - opsec opsec - opsec question - bad opsec - question | 242 | Opsec and Guides |
| 58 | link - working link - working - pm - link please | 226 | PM Working Share Links |
| 59 | mirror - working mirror - working - mirror link - empire mirror | 229 | mirror link |
| 60 | fentanyl - fent - carfentanil - selling fentanyl - analogue | 219 | Fentanyl |
| 61 | cgmc - invite - invite code - code - cgmc invite | 221 | Invite Code CGMC |
| 62 | alprazolam - powder - alprazolam powder - flualprazolam - etizolam | 211 | Alprazolam Powder |
| 63 | dmt - dmt vendor - dmt vape - odsmt - dmt dmt | 290 | DMT Vendors |
| 64 | captcha - rapture - rapture market - captcha captcha - incorrect | 202 | Rapture Market Captcha |
| 65 | chemical - research - research chemical - chems - research chemicals | 187 | Research Chemicals |
| 66 | tor - tor browser - browser - tor network - network | 198 | Tor Browser Research |
| 67 | mephedrone - meopcp - mxe - mescaline - mmc | 222 | Mephedrone |

</details>
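
The topic table can be read programmatically if needed. The sketch below is a rough illustration that splits a few rows copied from the table into typed fields; `parse_topic_row` is a hypothetical helper. It also spells out why the card reports 69 topics while the model name says 68: the IDs run from -1 to 67, so the count includes the `-1` outlier topic.

```python
from typing import Dict, List, Union

Field = Union[int, str, List[str]]


def parse_topic_row(row: str) -> Dict[str, Field]:
    """Split one markdown row of the topic table into typed fields."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    topic_id, keywords, frequency, label = cells
    return {
        "topic_id": int(topic_id),
        # Keywords are separated by " - "; a keyword may contain spaces.
        "keywords": keywords.split(" - "),
        "frequency": int(frequency),
        "label": label,
    }


# Three rows copied verbatim from the table above.
rows = [
    "| -1 | anyone - get - update - review - new | 178 | outliers |",
    "| 6 | darknet - dark - web - dark web - darkfail | 1649 | Dark Web Drug Trafficking Arrests |",
    "| 11 | monero - btc - bitcoin - coin - wallet | 999 | Monero Exchange |",
]
parsed = [parse_topic_row(row) for row in rows]

# IDs -1..67 inclusive: 68 regular topics plus the outlier topic.
n_topic_ids = len(range(-1, 68))
n_regular_topics = n_topic_ids - 1
```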

## Training hyperparameters

* calculate_probabilities: True
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 2)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.36
* UMAP: 0.5.6
* Pandas: 2.2.1
* Scikit-Learn: 1.4.1.post1
* Sentence-transformers: 3.0.1
* Transformers: 4.39.3
* Numba: 0.60.0
* Plotly: 5.22.0
* Python: 3.12.2
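
When results differ from what the card describes, a mismatch in library versions is one plausible cause. The sketch below compares the versions listed above with whatever is installed locally, using the standard-library `importlib.metadata`. The mapping from the card's display names to PyPI distribution names (for example `UMAP` to `umap-learn`) is an assumption of this example, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions from the card, keyed by PyPI distribution name.
# The distribution names are this example's mapping, not part of the card.
PINNED: Dict[str, str] = {
    "numpy": "1.26.4",
    "hdbscan": "0.8.36",
    "umap-learn": "0.5.6",
    "pandas": "2.2.1",
    "scikit-learn": "1.4.1.post1",
    "sentence-transformers": "3.0.1",
    "transformers": "4.39.3",
    "numba": "0.60.0",
    "plotly": "5.22.0",
}


def compare_versions(
    pinned: Dict[str, str] = PINNED,
) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Return {name: (wanted, installed or None, exact match?)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in compare_versions().items():
    status = "ok" if ok else f"differs (installed: {installed})"
    print(f"{name}=={wanted}: {status}")
```

An exact match is stricter than most setups need; the report is meant as a starting point for debugging rather than a hard requirement.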

config.json
ADDED
@@ -0,0 +1,17 @@
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    2
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "all-MiniLM-L6-v2"
}
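
The keys in `config.json` appear to correspond to keyword arguments of the `BERTopic` constructor. As a hedged sketch, the snippet below parses the JSON shown above and converts `n_gram_range` from a JSON list back to the tuple form used in the README. `bertopic_kwargs` and `new_untrained_model` are hypothetical helpers, and passing every key straight to `BERTopic(...)` is an assumption that may not hold for all library versions.

```python
import json
from typing import Any, Dict

# The config.json content from this commit, embedded for a self-contained example.
CONFIG_TEXT = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": false,
  "min_topic_size": 10,
  "n_gram_range": [1, 2],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "all-MiniLM-L6-v2"
}
"""


def bertopic_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the config and restore n_gram_range to a tuple."""
    kwargs = dict(config)
    kwargs["n_gram_range"] = tuple(kwargs["n_gram_range"])
    return kwargs


def new_untrained_model(kwargs: Dict[str, Any]):
    """Build a fresh, unfitted BERTopic model with the same settings.

    Assumption: every key is accepted by the BERTopic constructor.
    """
    from bertopic import BERTopic  # imported lazily; optional dependency

    return BERTopic(**kwargs)


config = json.loads(CONFIG_TEXT)
kwargs = bertopic_kwargs(config)
print(kwargs["n_gram_range"], kwargs["embedding_model"])
```

Note that `new_untrained_model` would only reproduce the settings, not the trained topics; the fitted model still comes from `BERTopic.load`.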

ctfidf.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f1ad3ce7fe13070f89799cec9cc3b3127d5638ed1017b995f2c3201bc6e93943
size 5661292
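
The three lines above are a Git LFS pointer rather than the tensor data itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of how a downloaded file might be checked against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the local file path is left to the caller.

```python
import hashlib
from typing import Dict, Union

# Pointer content for ctfidf.safetensors, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:f1ad3ce7fe13070f89799cec9cc3b3127d5638ed1017b995f2c3201bc6e93943
size 5661292
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: Dict[str, Union[str, int]],
                    chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and hash against a parsed pointer."""
    hasher = hashlib.new(str(pointer["algo"]))
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

The same helpers apply unchanged to the `topic_embeddings.safetensors` pointer, since it uses the same three-line layout.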

ctfidf_config.json
ADDED
The diff for this file is too large to render.

topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8fe5935815052732831c8f48547a9a73bdab2727e3ee2e6159734be5d176e196
size 106072

topics.json
ADDED
The diff for this file is too large to render.