Upload folder using huggingface_hub

Files changed (6)

.DS_Store +0 -0
README.md +23 -17
checkpoint/batches/batch_69ce563236708190b9830d55e5ed5e1b.json +0 -0
checkpoint/labeled_data.csv +0 -0
checkpoint/state.json +1 -1
model.safetensors +1 -1

.DS_Store ADDED

Binary file (6.15 kB).

README.md CHANGED

@@ -1,20 +1,23 @@
 ---
 license: mit
-datasets:
-- thomasrenault/us_tweet_speech_congress
-language:
-- en
 tags:
-- text-classification
-- multi-label-classification
-- topic-classification
-- political-text
-- tweets
-- distilbert
-- active-learning
 pipeline_tag: text-classification
 ---
 A multi-label political topic classifier fine-tuned on US political tweets and congressional speeches.
 Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
@@ -34,6 +37,7 @@ A document can belong to **zero or multiple topics simultaneously**.
 | `tax and inequality` | Tax policy, economic inequality, redistribution |
 | `trade` | Trade policy, tariffs, international commerce |
 ## Training
@@ -42,7 +46,7 @@ A document can belong to **zero or multiple topics simultaneously**.
 | Base model | `distilbert-base-uncased` |
 | Architecture | `DistilBertForSequenceClassification` (multi-label) |
 | Problem type | `multi_label_classification` |
-| Training data | ~100,000 labeled documents (early checkpoint) |
 | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
 | Strategy | Active learning (uncertainty sampling) |
 | Seed size | 1,000 documents (random) |
@@ -102,9 +106,11 @@ print(predict("Tax cuts for the wealthy only increase inequality in America."))
 If you use this model, please cite:
 ```
-@article{algan2026emotions,
-  title={Emotions and policy views},
-  author={Algan, Y, Davoine, E., Renault, T., and Stantcheva, S},
-  year={2026}
 }
-```

 ---
+language: en
 license: mit
 tags:
+  - text-classification
+  - multi-label-classification
+  - topic-classification
+  - political-text
+  - tweets
+  - distilbert
+  - active-learning
+datasets:
+  - thomasrenault/us_tweet_speech_congress
+metrics:
+  - f1
+base_model: distilbert-base-uncased
 pipeline_tag: text-classification
 ---
+# thomasrenault/topic
 A multi-label political topic classifier fine-tuned on US political tweets and congressional speeches.
 Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
 | `tax and inequality` | Tax policy, economic inequality, redistribution |
 | `trade` | Trade policy, tariffs, international commerce |
+Documents that match none of the above are implicitly classified as `other topic`.
 ## Training
 | Base model | `distilbert-base-uncased` |
 | Architecture | `DistilBertForSequenceClassification` (multi-label) |
 | Problem type | `multi_label_classification` |
+| Training data | ~200,000 labeled documents |
 | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
 | Strategy | Active learning (uncertainty sampling) |
 | Seed size | 1,000 documents (random) |
 If you use this model, please cite:
 ```
+@misc{renault2025topic,
+  author    = {Renault, Thomas},
+  title     = {thomasrenault/topic: Multi-label political topic classifier for US political text},
+  year      = {2025},
+  publisher = {HuggingFace},
+  url       = {https://huggingface.co/thomasrenault/topic}
 }
+```
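The line added to the README about `other topic` implies a decoding rule on top of the per-label outputs. A minimal Python sketch of that rule follows, assuming one logit per topic and a sigmoid threshold of 0.5. The threshold value and the helper name `decode_topics` are assumptions, not taken from the model card, and only the two topic labels visible in this diff are used.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_topics(logits, labels, threshold=0.5):
    """Map per-label logits to topic names.

    Each label is decided independently (multi-label), so zero, one,
    or several topics can be returned. If no label clears the
    threshold, fall back to 'other topic'.
    """
    probs = [sigmoid(z) for z in logits]
    chosen = [name for name, p in zip(labels, probs) if p >= threshold]
    return chosen if chosen else ["other topic"]


# Illustrative logits only; these are not real model outputs.
labels = ["tax and inequality", "trade"]
print(decode_topics([2.1, -3.0], labels))   # ['tax and inequality']
print(decode_topics([-1.5, -2.2], labels))  # ['other topic']
```

Because each label is thresholded on its own sigmoid rather than through a softmax, the probabilities do not need to sum to one, which is what allows the empty result that triggers the fallback.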

checkpoint/batches/batch_69ce563236708190b9830d55e5ed5e1b.json ADDED

The diff for this file is too large to render.

checkpoint/labeled_data.csv CHANGED

The diff for this file is too large to render.

checkpoint/state.json CHANGED

@@ -1 +1 @@
-{"completed_round": 0, "labeled_count": 1000}
+{"completed_round": 0, "labeled_count": 10000}
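The `completed_round` and `labeled_count` fields track progress through the active-learning loop that the README describes as uncertainty sampling. The commit does not show how uncertainty is scored, so the sketch below is only one common choice: rank each unlabeled document by how close its most ambiguous label probability is to 0.5. The function names and the toy pool are hypothetical.

```python
def uncertainty(probs):
    """Score a document by its most ambiguous label.

    A probability of exactly 0.5 scores 1.0 (maximally uncertain);
    probabilities of 0.0 or 1.0 score 0.0.
    """
    return max(1.0 - 2.0 * abs(p - 0.5) for p in probs)


def select_for_annotation(pool, k):
    """Return the ids of the k most uncertain documents in the pool.

    `pool` maps a document id to its list of per-label probabilities.
    """
    ranked = sorted(pool, key=lambda doc_id: uncertainty(pool[doc_id]), reverse=True)
    return ranked[:k]


# Toy pool with made-up probabilities for two labels.
pool = {
    "doc_a": [0.98, 0.01],  # confident on both labels
    "doc_b": [0.55, 0.10],  # first label is nearly a coin flip
    "doc_c": [0.30, 0.95],  # first label is somewhat ambiguous
}
print(select_for_annotation(pool, 2))  # ['doc_b', 'doc_c']
```

In a pipeline like the one described, the selected documents would be sent for annotation and appended to the labeled set before the next round.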

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cec13faef4fb96adae4fae15034ee2e1d4e3fab52c8dc87b4292a0df816e30df
 size 267847948

 version https://git-lfs.github.com/spec/v1
+oid sha256:86130f7904d62adcaf21d36a9bb41ff682d99a2a90af3950604ad144701b78cd
 size 267847948
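The `model.safetensors` entry is a Git LFS pointer, so only the `oid` changes here while the size stays the same. A downloaded copy of the weights can be checked against the new pointer with a short Python sketch like the one below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical; the pointer text is copied from the added side of this diff.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's byte size and SHA-256 against a parsed pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:86130f7904d62adcaf21d36a9bb41ff682d99a2a90af3950604ad144701b78cd
size 267847948
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])  # sha256 267847948
```

Calling `matches_pointer("model.safetensors", pointer)` on a local download would then return `True` only if both the size and the hash agree with this commit.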