Upload folder using huggingface_hub
- .DS_Store +0 -0
- .gitattributes +1 -0
- README.md +3 -10
- labeled_data.csv +3 -0
- model.safetensors +1 -1
.DS_Store
CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+labeled_data.csv filter=lfs diff=lfs merge=lfs -text
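The added rule routes `labeled_data.csv` through Git LFS alongside the existing glob rules. As a rough illustration of which paths the four rules visible in this hunk would catch, here is a minimal Python sketch. It approximates matching with `fnmatch` on the file's basename, which is an assumption: Git's real attribute matching follows gitignore-style rules and covers more cases. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns visible in this hunk of .gitattributes (the file has more rules above it).
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "labeled_data.csv"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file name match any slash-free LFS pattern?"""
    name = PurePosixPath(path).name  # slash-free patterns match at any directory level
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

For example, `is_lfs_tracked("labeled_data.csv")` is true under this approximation, while `is_lfs_tracked("README.md")` is not.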
README.md
CHANGED
@@ -8,7 +8,6 @@ tags:
 - political-text
 - tweets
 - distilbert
-- active-learning
 datasets:
 - thomasrenault/us_tweet_speech_congress
 metrics:
@@ -19,8 +18,7 @@ pipeline_tag: text-classification
 
 # thomasrenault/topic
 
-A multi-label political topic classifier fine-tuned on US
-Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
+A multi-label political topic classifier fine-tuned on US tweets, campaign speeches and congressional speeches. Built on `distilbert-base-uncased` with GPT-4o-mini annotation via the OpenAI Batch API.
 
 ## Labels
 
@@ -48,16 +46,12 @@ Documents that match none of the above are implicitly classified as `other topic
 | Problem type | `multi_label_classification` |
 | Training data | ~200,000 labeled documents |
 | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
-
-| Seed size | 1,000 documents (random) |
-| AL query size | 25,000 documents / round |
-| Epochs (seed) | 4 |
-| Epochs (AL) | 2 (warm-start) |
+| Epochs | 4 |
 | Learning rate | 2e-5 |
 | Batch size | 16 |
 | Max length | 512 tokens |
 | Classification threshold | 0.5 |
-| Domain | US
+| Domain | US tweets about policy, campaign speeches and congressional floor speeches |
 
 ## Usage
 
@@ -98,7 +92,6 @@ print(predict("Tax cuts for the wealthy only increase inequality in America."))
 
 - Trained on **US English political text** — may not generalise to other political systems or languages
 - Annotation by GPT-4o-mini introduces model-specific biases in topic boundaries
-- Early training checkpoint (round 0, ~1,600 documents) — performance will improve as active learning progresses
 - Topics reflect the specific research agenda of the parent project; other salient topics (healthcare, climate, etc.) are out of scope
 
 ## Citation
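The README table in this diff lists `multi_label_classification` with a classification threshold of 0.5. As a minimal sketch of what that decoding step usually means (each label gets its own sigmoid probability and is kept if it reaches the threshold), the following uses only the standard library. The function names and the placeholder label names are hypothetical; the model's real label list is not shown in this diff.

```python
import math

THRESHOLD = 0.5  # "Classification threshold" row in the README table


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, labels, threshold=THRESHOLD):
    """Keep every label whose independent sigmoid probability reaches the threshold."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    return [label for logit, label in zip(logits, labels) if sigmoid(logit) >= threshold]


# Placeholder label names (assumption), one logit per label.
print(decode_multilabel([2.0, -1.0, 0.3], ["label_a", "label_b", "label_c"]))
```

An empty result corresponds to the case the README describes as matching none of the listed topics.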
labeled_data.csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:164a5271e1c198e2bd1925ef6b771d55caf4372e4eaac4a9b43be66d4cf567b1
+size 61838801
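What the commit stores for `labeled_data.csv` is a Git LFS pointer (three `key value` lines), not the CSV itself. A small sketch of reading such a pointer, assuming exactly the three fields shown in this diff; the function name `parse_lfs_pointer` is hypothetical and this is not part of any Git LFS tooling:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:164a5271e1c198e2bd1925ef6b771d55caf4372e4eaac4a9b43be66d4cf567b1
size 61838801
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field gives the length of the actual CSV in bytes, here 61838801.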
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:08965f0c6c2ebe5ed5d283d165761331fc83e9e30b0471b6489eb3071d760499
 size 267847948