Upload folder using huggingface_hub


Files changed (5)

.DS_Store +0 -0
.gitattributes +1 -0
README.md +3 -10
labeled_data.csv +3 -0
model.safetensors +1 -1

.DS_Store CHANGED

Binary files a/.DS_Store and b/.DS_Store differ

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+labeled_data.csv filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -8,7 +8,6 @@ tags:
   - political-text
   - tweets
   - distilbert
-  - active-learning
 datasets:
   - thomasrenault/us_tweet_speech_congress
 metrics:
@@ -19,8 +18,7 @@ pipeline_tag: text-classification
 # thomasrenault/topic
-A multi-label political topic classifier fine-tuned on US political tweets and congressional speeches.
-Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
+A multi-label political topic classifier fine-tuned on US tweets, campaign speeches and congressional speeches.  Built on `distilbert-base-uncased` with GPT-4o-mini annotation via the OpenAI Batch API.
 ## Labels
@@ -48,16 +46,12 @@ Documents that match none of the above are implicitly classified as `other topic
 | Problem type | `multi_label_classification` |
 | Training data | ~200,000 labeled documents |
 | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
-| Strategy | Active learning (uncertainty sampling) |
-| Seed size | 1,000 documents (random) |
-| AL query size | 25,000 documents / round |
-| Epochs (seed) | 4 |
-| Epochs (AL) | 2 (warm-start) |
+| Epochs | 4 |
 | Learning rate | 2e-5 |
 | Batch size | 16 |
 | Max length | 512 tokens |
 | Classification threshold | 0.5 |
-| Domain | US political tweets and congressional floor speeches |
+| Domain | US tweets about policy, campaign speeches and congressional floor speeches |
 ## Usage
@@ -98,7 +92,6 @@ print(predict("Tax cuts for the wealthy only increase inequality in America."))
 - Trained on **US English political text** — may not generalise to other political systems or languages
 - Annotation by GPT-4o-mini introduces model-specific biases in topic boundaries
-- Early training checkpoint (round 0, ~1,600 documents) — performance will improve as active learning progresses
 - Topics reflect the specific research agenda of the parent project; other salient topics (healthcare, climate, etc.) are out of scope
 ## Citation
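
The model card rows touched by this diff describe a `multi_label_classification` head with a classification threshold of 0.5. As a rough illustration of what that setting usually means, the sketch below applies an independent sigmoid to each logit and keeps every label whose probability reaches the threshold. The label names, the `decode` helper, and the choice of `>=` over `>` are assumptions made for illustration; the README's own `predict` function is not shown in this diff and may differ.

```python
import math

THRESHOLD = 0.5  # value taken from the model card table


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits, labels, threshold=THRESHOLD):
    """Return every label whose sigmoid probability is at least `threshold`.

    Each label is scored independently, so zero, one, or several labels
    can be returned for the same document.
    """
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical label names; the real label set is defined in the model config.
labels = ["topic_a", "topic_b", "topic_c"]
print(decode([2.0, -1.0, 0.3], labels))  # ['topic_a', 'topic_c']
```

An empty result corresponds to a document that matches none of the listed topics.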

labeled_data.csv ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:164a5271e1c198e2bd1925ef6b771d55caf4372e4eaac4a9b43be66d4cf567b1
+size 61838801

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:86130f7904d62adcaf21d36a9bb41ff682d99a2a90af3950604ad144701b78cd
+oid sha256:08965f0c6c2ebe5ed5d283d165761331fc83e9e30b0471b6489eb3071d760499
 size 267847948
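
Both `labeled_data.csv` and `model.safetensors` appear in this diff only as Git LFS pointer files: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. A minimal sketch of how a downloaded file could be checked against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sketch assumes a pointer with exactly the three fields shown above.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file has the size and SHA-256 digest named in the pointer."""
    p = Path(path)
    if p.stat().st_size != pointer["size"]:
        return False  # cheap check first; skips hashing on a size mismatch
    h = hashlib.sha256()
    with p.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Pointer text copied from the labeled_data.csv hunk in this diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:164a5271e1c198e2bd1925ef6b771d55caf4372e4eaac4a9b43be66d4cf567b1
size 61838801
"""
pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("labeled_data.csv", pointer)` on a local copy would then confirm that the download matches what this commit recorded. In the `model.safetensors` hunk the size is unchanged and only the `oid` differs, which is consistent with retrained weights for the same architecture.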