Add BERTopic model
- .gitattributes +1 -0
- README.md +191 -0
- config.json +17 -0
- ctfidf.safetensors +3 -0
- ctfidf_config.json +3 -0
- topic_embeddings.safetensors +3 -0
- topics.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ctfidf_config.json filter=lfs diff=lfs merge=lfs -text
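The added rule explains why `ctfidf_config.json` appears in this commit as a Git LFS pointer rather than as readable JSON, while `config.json` does not. A rough sketch of that matching follows. It assumes only the four rules visible in this hunk and approximates gitattributes matching with Python's `fnmatch` on the file's base name; real gitattributes semantics are richer, and rules outside this hunk are ignored. `is_lfs_tracked` is a hypothetical helper, not part of Git.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the rules visible in this hunk; the full .gitattributes has more.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "ctfidf_config.json"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the base name match one of the LFS rules above?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


for candidate in ["ctfidf_config.json", "config.json", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under these four rules, `ctfidf_config.json` is routed to LFS, while `config.json` and `README.md` are stored as ordinary text.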
README.md
ADDED
@@ -0,0 +1,191 @@
---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# ISSR_Dark_Web_121Topics

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("D0men1c0/ISSR_Dark_Web_121Topics")

topic_model.get_topic_info()
```

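Beyond inspecting the topic table, a loaded model can assign topics to new text. The sketch below is a minimal illustration, not part of the original README: `assign_topics` is a hypothetical helper that relies only on `transform`, which in BERTopic returns a pair of topic IDs and probabilities. The example sentence in the comment is invented, and running it against the published model assumes `bertopic` is installed and the model can be downloaded.

```python
from typing import Any, List, Sequence, Tuple


def assign_topics(topic_model: Any, docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by a fitted BERTopic-like model."""
    # transform() returns (topics, probabilities); only the topic IDs are used here.
    topics, _probabilities = topic_model.transform(list(docs))
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


# With the published model (needs `bertopic` and network access):
#   from bertopic import BERTopic
#   topic_model = BERTopic.load("D0men1c0/ISSR_Dark_Web_121Topics")
#   assign_topics(topic_model, ["my order shipped a week ago and has not arrived"])
```

A returned ID of -1 corresponds to the outlier topic listed in the overview table.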
## Topic overview

* Number of topics: 122
* Number of training documents: 260996

<details>
  <summary>Click here for an overview of all topics.</summary>

  | Topic ID | Topic Keywords | Topic Frequency | Label |
  |----------|----------------|-----------------|-------|
  | -1 | vendor - order - nt - market - link | 509 | Outliers |
  | 0 | cart - weed - strain - thc - bud | 91708 | Product Reviews and Purchases |
  | 1 | deposit - address - ticket - btc - wallet | 14083 | Empire Deposit & Withdrawal Issues |
  | 2 | key - pgp - account - pgp key - password | 5365 | PGP Key Security |
  | 3 | order - shipped - ordered - day - week | 5696 | Order Shipping Status |
  | 4 | scam - scammer - scam scam - scam scam scam - scammed | 3368 | Vendor Scams and Detection |
  | 5 | thanks - thank - lol - man - bro | 3187 | Friendly Positive Talk |
  | 6 | ship - country - eu - shipping - uk | 4584 | Shipping in EU Countries |
  | 7 | coke - cocaine - quality - product - good | 3425 | High Quality Cocaine |
  | 8 | card - carding - cc - gift card - gift | 2846 | Carding Strategies |
  | 9 | pgp - begin pgp - begin - pgp signature - signature | 3283 | PGP Signature End |
  | 10 | lsd - tab - ug - acid - blotter | 2462 | LSD Tab Marketplace Reviews |
  | 11 | vendor - good - anyone - know - legit | 2562 | Vendor Recommendation |
  | 12 | dispute - refund - mod - order - moderator | 2692 | Dispute Resolution |
  | 13 | wsm - dream - market - exit - vendor | 2100 | WSM Exit Scam Warnings |
  | 14 | drug - police - get - nt - house | 1970 | Drugs and Police Enforcement |
  | 15 | monero - xmr - wallet - btc - exchange | 2214 | Monero Wallet Exchange and Bitcoin Use |
  | 16 | ddos - attack - ddos attack - mirror - market | 2995 | Dread Market DDoS Attack |
  | 17 | mdma - mda - price - quality - vendor | 1773 | MDMA Vendor Quality Prices |
  | 18 | darknet - clearnet - dark - link - darknetmarkets | 1925 | Darknet Market |
  | 19 | sub - post - mod - banned - link | 1887 | Dread Market Forum Rules and Bans |
  | 20 | bar - alp - hulk - press - pack | 1942 | Alprazolam Pressed Bars Reviews |
  | 21 | xanax - bar - alp - mg - alprazolam | 2350 | Xanax Bars and Vendors |
  | 22 | market - new market - market market - good - new | 2036 | Market Upgrade Support |
  | 23 | feedback - review - vendor - negative - positive | 1650 | ecommerce feedback |
  | 24 | mirror - working - working mirror - mirror link - link | 1802 | Mirror Link Working |
  | 25 | mg - pill - tablet - price - xtc | 1228 | Drug Sales |
  | 26 | box - mail - package - address - po | 3778 | Mail Delivery Issues |
  | 27 | review - thanks - thanks review - review thanks - nice review | 2829 | Positive Reviews Thank You Notes Nice |
  | 28 | ticket - support ticket - support - se en - se | 1242 | Support Ticket Confusion |
  | 29 | cryptonia - market - empire - nightmare - vendor | 1331 | Cryptonia Market |
  | 30 | escrow - fe - use escrow - vendor - market | 1214 | Market Escrow Usage |
  | 31 | onion - dot onion - dot - onion link - onion site | 1216 | Onion Links |
  | 32 | det - er - og - har - jeg | 1226 | Kola;Vendor;Stealth Shipping;Review;Norway |
  | 33 | tor - browser - network - javascript - tor browser | 1126 | Anonymous Browsing and Tor Networks |
  | 34 | dread - reddit - post - dread dread - sub | 1109 | community appreciation |
  | 35 | meth - business day - business - day - good | 1074 | Meth Vendor Quality Review |
  | 36 | fent - fentanyl - opiate - heroin - nt | 1161 | Fentanyl Opiate Discussion |
  | 37 | link - link link - point link comment - link point link - link comment post | 1695 | Links and Posts |
  | 38 | pack - week - day - ordered - land | 1332 | Package Delay and Shipping |
  | 39 | pm - interested - looking - find - please | 1499 | PM Interested Help Explanation |
  | 40 | hugbunter - hugbunter hugbunter - link hugbunter - hugbunter link - link hugbunter hugbunter | 1157 | hugbunter links |
  | 41 | drug - police - court - enforcement - investigation | 923 | Darknet Drug Enforcement |
  | 42 | stealth - good - good stealth - vendor - shipping | 849 | Good Stealth Vendor Shipping |
  | 43 | counterfeit - note - euro - bill - pen | 968 | Counterfeit Money Sales |
  | 44 | empire - nightmare - empire empire - find empire - empire nightmare | 1010 | Empire Name Search |
  | 45 | day - waiting - week - month - hour | 1040 | Waiting Time |
  | 46 | id - passport - fake - license - scan | 1681 | Fake IDs & Documents |
  | 47 | bank - account - drop - bank drop - cash | 1079 | Bank Drop Transaction |
  | 48 | wickr - use wickr - using wickr - contact - via wickr | 1850 | Wickr Abuse Policy Protect Wickr Community |
  | 49 | de - und - que - un - da | 751 | German Darknet Market |
  | 50 | phishing - phishing link - link - phished - phishing site | 740 | Phishing Detection Techniques |
  | 51 | dream - nightmare - dream dream - anyone - like | 1041 | Dream Nightmare Experience |
  | 52 | price - sale - promo - sell - buy | 756 | Sale;Promotional Offers;Good Deals |
  | 53 | tails - tail - usb - electrum - persistent | 1101 | Tails;Electrum;Persistent File;USB Installation |
  | 54 | adderall - amphetamine - mg - replacement - speed | 1190 | Adderall Replacement Pills |
  | 55 | cancel - order - auto - cancel order - day | 879 | Auto Cancel Orders |
  | 56 | mushroom - shrooms - cubensis - psilocybin - spore | 963 | Mushroom Guide & Dosage Information |
  | 57 | ketamine - gm - gm gm - gm gm gm - vendor | 763 | Ketamine Vendor Quality Shard |
  | 58 | exit - exit scam - scam - exit scamming - exit scammed | 696 | Exit Scam Market |
  | 59 | phone - burner - sim - card - number | 691 | Burner Phone Usage |
  | 60 | dream - market - dream market - nightmare - nightmare market | 830 | Dream Market |
  | 61 | bond - vendor bond - vendor - bond back - market | 860 | Vendor Bond Waiver Market |
  | 62 | vpn - tor - use - using - proxy | 546 | VPN and Tor Use |
  | 63 | jabber - telegram - xmpp - pidgin - otr | 1054 | Jabber/XMPP/OTR Chat Clients |
  | 64 | dmt - psychedelics - per - psychedelic - changa | 588 | DMT Psychedelics Prices |
  | 65 | captcha - captchas - page - enter - login | 697 | Captcha Issues in Darknet Market |
  | 66 | sample - free sample - free - review - sample pack | 526 | Free Sample Order |
  | 67 | update - issue - problem - working - fixed | 682 | Fixed Bug Issue |
  | 68 | cgmc - invite - vendor - cgmc cgmc - link cgmc | 1410 | CGMC Invites |
  | 69 | apollon - apollon market - market - empire - apollomarket | 483 | Apollon Market Update |
  | 70 | paypal - transfer - account - paypal account - paypal transfer | 508 | PayPal Account Transfer |
  | 71 | giveaway - win - number - winner - contest | 681 | Giveaways & Contests |
  | 72 | pm - working link - link - link please - please | 739 | Working Link Requests |
  | 73 | darkfail - link - fail - dark - dark fail | 516 | Dark Fail Links Question |
  | 74 | empire - market - empire market - nightmare - alphabay | 598 | Empire Market Vendor Feedback |
  | 75 | package - pack - delivery - tracking - day | 1144 | Package Delivery Tracking |
  | 76 | bag - dog - seal - mylar - vac | 1816 | Smuggling Methods & Detection |
  | 77 | opsec - opsec opsec - link opsec - opsec link - opsec opsec link | 542 | Opsec Guidance |
  | 78 | link - working - working link - main link - link working | 392 | Link issues |
  | 79 | money - pay - money back - dollar - get | 875 | Money Losses |
  | 80 | tracking - tracking number - number - order - day | 513 | Tracking Number Concerns |
  | 81 | guide - tutorial - outdated - thanks - method | 1078 | Guide Topic or Tutorial Help or Out |
  | 82 | bir - bu - kai - ama - var | 357 | bu bir kai |
  | 83 | rc - mxe - dck - rcs - fdck | 356 | RC Sources |
  | 84 | cash - btc - bitcoin - coinbase - atm | 549 | Crypto Purchase Methods |
  | 85 | olympus - market - fe escrow - olympus market - dream | 1271 | Olympus Market |
  | 86 | log - logged - logging - login - page | 338 | Logging and Session Issues |
  | 87 | vacation - vacation mode - mode - back - profile | 401 | Vacation Mode |
  | 88 | message - contact - email - support - send | 331 | Message;Email Support |
  | 89 | post - mod - comment - delete - thread | 748 | Moderation and Deletion of Posts |
  | 90 | xmr - wallet - deposit - monero - payment id | 1369 | XMR Wallet Issue |
  | 91 | image - exif - upload - exif data - data | 513 | Image Exif Data Upload |
  | 92 | back - hope - welcome back - luck - good | 399 | hope recovery |
  | 93 | review - template - pic - picture - table | 2240 | Review templates and images |
  | 94 | cheer - cheer cheer - cheer mate - mate - anyone | 547 | Cheer Positivity Justification |
  | 95 | bulk - price - kratom - good - kg | 307 | Bulk Kratom Vendors |
  | 96 | wallstreet - wall st - wall - st - wallstreetmarket | 675 | Wallstreet Market Forum Links |
  | 97 | product - stealth - shipping - quality - price | 379 | Product Review |
  | 98 | listing - list - superlist - vendor - search | 2037 | Listing Management and Visibility |
  | 99 | fuck - cunt - dick - fud fud - fud fud fud | 493 | Mom sex;Insults |
  | 100 | empire - exit - market - scam - exit scam | 3076 | Empire Market |
  | 101 | protonmail - protonmailcom - email - proton - secmail | 1138 | Protonmail Alternatives |
  | 102 | wallet - node - gui - monero - remote node | 296 | Monero Wallet Update |
  | 103 | multisig - market - transaction - escrow - use multisig | 423 | MultiSig Market Transactions |
  | 104 | bunk - bunk bar - bar - hulk - sent bunk | 307 | Bunk and Bar |
  | 105 | mg - benzo - benzos - alprazolam - alp | 355 | Benzodiazepine use and abuse |
  | 106 | pelican - bird - pelicanvendor - bigbird - pelicanvendor pelicanvendor link | 2460 | Pelican Bird Giveaway |
  | 107 | heinekenexpress - link heinekenexpress - heinekenexpress link - heinekenexpress heinekenexpress - link heinekenexpress heinekenexpress | 249 | Heineken Express Reviews |
  | 108 | rdp - sock - vpn - ip - card | 266 | RDP Socks for Carding |
  | 109 | dnm - dm - dread - forum - link | 401 | DNM Reddit Subs |
  | 110 | pic - picture - photo - photoshop - post pic | 598 | Pics and posts |
  | 111 | empire - link - empiremarket - empire link - link empire | 476 | Empire Market Links |
  | 112 | invite - invite code - code - need invite - get invite | 398 | Darknet Market Invites |
  | 113 | samsara - market - samsara market - sam - dream | 224 | Samsara Market |
  | 114 | chemical - test - lab - powder - product | 290 | Chemistry Research and Supply |
  | 115 | rapture - rapture market - rapturemarket - market - gbp | 1064 | Rapture Market GBP |
  | 116 | water - acetone - powder - dry - ml | 214 | Acetone Recrystallization Techniques |
  | 117 | witchman - link witchman - link - witchman link - witchman witchman | 1754 | Link Witchman Discussion |
  | 118 | tochka - market - tochka market - tochka tochka - use tochka | 218 | Tochka market |
  | 119 | post - know guy know - guy know guy - know guy - guy know | 210 | Read Post Discussion |
  | 120 | subdread - sub - post - subdreads - create | 2589 | Subdread creation issues |

</details>

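The table's IDs run from -1 to 120, which is where the count of 122 comes from: 121 topics plus the outlier topic -1. For readers who want to work with the overview without loading the model, here is a minimal sketch that parses rows of this table into records. It is an illustration only: `TopicRow` and `parse_topic_rows` are hypothetical names, and the sample holds just the first four rows.

```python
from typing import List, NamedTuple


class TopicRow(NamedTuple):
    topic_id: int
    keywords: List[str]
    frequency: int
    label: str


def parse_topic_rows(markdown: str) -> List[TopicRow]:
    """Parse data rows of the topic overview table, skipping header and separator."""
    rows = []
    for line in markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows have four cells and an integer ID (possibly negative).
        if len(cells) != 4 or not cells[0].lstrip("-").isdigit():
            continue
        topic_id, keywords, frequency, label = cells
        rows.append(TopicRow(int(topic_id), keywords.split(" - "), int(frequency), label))
    return rows


SAMPLE = """
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | vendor - order - nt - market - link | 509 | Outliers |
| 0 | cart - weed - strain - thc - bud | 91708 | Product Reviews and Purchases |
| 1 | deposit - address - ticket - btc - wallet | 14083 | Empire Deposit & Withdrawal Issues |
| 2 | key - pgp - account - pgp key - password | 5365 | PGP Key Security |
"""

rows = parse_topic_rows(SAMPLE)
largest = max(rows, key=lambda row: row.frequency)
print(largest.topic_id, largest.label)
```

Splitting keywords on `" - "` keeps multi-word terms such as `pgp key` intact.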
## Training hyperparameters

* calculate_probabilities: True
* language: None
* low_memory: True
* min_topic_size: 10
* n_gram_range: (1, 3)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None

## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.36
* UMAP: 0.5.6
* Pandas: 2.2.1
* Scikit-Learn: 1.4.1.post1
* Sentence-transformers: 3.0.1
* Transformers: 4.39.3
* Numba: 0.60.0
* Plotly: 5.22.0
* Python: 3.12.2
config.json
ADDED
@@ -0,0 +1,17 @@
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": true,
  "min_topic_size": 10,
  "n_gram_range": [
    1,
    3
  ],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "all-MiniLM-L6-v2"
}
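The keys in `config.json` mirror the hyperparameters listed in the README, plus the embedding model name. As a hedged sketch, the snippet below turns that JSON into keyword arguments that could be passed to the `BERTopic` constructor. It assumes the constructor accepts each of these names, which appears to hold for recent BERTopic releases, and `bertopic_kwargs` is a hypothetical helper. Constructing a model this way would give an untrained model with matching settings, not the fitted topics in this repository.

```python
import json
from typing import Any, Dict

CONFIG_JSON = """
{
  "calculate_probabilities": true,
  "language": null,
  "low_memory": true,
  "min_topic_size": 10,
  "n_gram_range": [1, 3],
  "nr_topics": null,
  "seed_topic_list": null,
  "top_n_words": 10,
  "verbose": true,
  "zeroshot_min_similarity": 0.7,
  "zeroshot_topic_list": null,
  "embedding_model": "all-MiniLM-L6-v2"
}
"""


def bertopic_kwargs(config_text: str) -> Dict[str, Any]:
    """Convert the saved JSON config into Python keyword arguments."""
    params = json.loads(config_text)
    # JSON has no tuple type, so the (min, max) n-gram range is stored as a list.
    params["n_gram_range"] = tuple(params["n_gram_range"])
    return params


kwargs = bertopic_kwargs(CONFIG_JSON)
print(kwargs["n_gram_range"], kwargs["embedding_model"])

# Assumed usage (requires `bertopic`; yields an unfitted model):
#   from bertopic import BERTopic
#   topic_model = BERTopic(**kwargs)
```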
ctfidf.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d18e02fc04cddb360f2a3d21ee0332ad68496672ad3f31ba2227e8aaff1218b8
size 233761760
ctfidf_config.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a152061128403e7f7531bd3d07610aa71e55320742a7b1659e5bace1c04a7bd4
size 371499932
topic_embeddings.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e6bee9de4fd6336f84b3d8751801c5278fde32cb185da4168479e52a0b7b5fca
size 187480
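The three files above are stored as Git LFS pointers: each records the spec version, a SHA-256 object ID and the size in bytes of the real file, which lives in LFS storage. A minimal sketch of reading such a pointer follows, using the `topic_embeddings.safetensors` pointer from this commit. `parse_lfs_pointer` is a hypothetical helper and handles only the three keys shown here.

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    # Each pointer line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e6bee9de4fd6336f84b3d8751801c5278fde32cb185da4168479e52a0b7b5fca
size 187480
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

Read this way, the pointers show that `ctfidf_config.json` (371499932 bytes) is the largest file in the commit, ahead of `ctfidf.safetensors` (233761760 bytes).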
topics.json
ADDED
The diff for this file is too large to render.