Create readme.txt
readme.txt ADDED +29 -0
@@ -0,0 +1,29 @@
# Epstein Semantic Explorer - HuggingFace Edition

A lightweight, dependency-free semantic analysis tool for the
recently released House Oversight Epstein documents.

### Features
- Cluster browser
- BM25 semantic search
- Topic terms
- Entity frequency explorer
- Cross-cluster entity mapping
- Works entirely on CPU
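
To illustrate the BM25 search feature, here is a minimal sketch of Okapi BM25 scoring using only the Python standard library. It is not taken from this Space's code: the function names `tokenize` and `bm25_scores`, the tokenizer, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document in `docs` (hypothetical helper)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Illustrative documents, not taken from the dataset.
sample_docs = [
    "committee hearing transcript",
    "email about a meeting",
    "travel schedule and travel receipts",
]
ranked = sorted(
    zip(bm25_scores("travel", sample_docs), sample_docs), reverse=True
)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

Documents that contain none of the query terms score zero, so they sort to the bottom of the ranking. This scoring needs no model or GPU, which fits the CPU-only claim above.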

### How to Use
1. Upload your dataset: `epstein_semantic.jsonl`
2. Select a cluster
3. Search keywords
4. Explore entities / topics
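
The steps above can also be approximated offline. The sketch below loads a JSONL file, groups records by cluster, and counts entities. The schema of `epstein_semantic.jsonl` is not documented here, so the field names `cluster` and `entities` are assumptions, and all function names are hypothetical. Adjust the keys to match the actual file.

```python
import json
from collections import Counter, defaultdict


def load_records(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def group_by_cluster(records, cluster_key="cluster"):
    """Group records by cluster id (the key name is an assumption)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec.get(cluster_key)].append(rec)
    return dict(clusters)


def entity_counts(records, entities_key="entities"):
    """Count how often each entity appears (assumes a list of strings per record)."""
    counts = Counter()
    for rec in records:
        counts.update(rec.get(entities_key) or [])
    return counts


def entity_cluster_map(clusters, entities_key="entities"):
    """Map each entity to the set of cluster ids in which it appears."""
    mapping = defaultdict(set)
    for cluster_id, recs in clusters.items():
        for rec in recs:
            for entity in rec.get(entities_key) or []:
                mapping[entity].add(cluster_id)
    return dict(mapping)


# Example usage (uncomment once the file is available locally):
# records = load_records("epstein_semantic.jsonl")
# clusters = group_by_cluster(records)
# print(entity_counts(records).most_common(10))
# print(entity_cluster_map(clusters))
```

Records that lack the assumed keys are grouped under `None` or contribute no entities, so the helpers do not fail on a partially populated file.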

### Dataset Source
The full processed dataset is available on Kaggle:
https://www.kaggle.com/datasets/cjc0013/epstein-bge-large-hdbscan-bm25/data

### Notes
- No external API calls
- No model downloads
- 100% local execution inside the Space

MIT License © cjc0013