thomasrenault committed on
Commit 66c02c5 · verified · 1 parent: 8223df6

Upload folder using huggingface_hub

.DS_Store ADDED
Binary file (6.15 kB).
 
README.md CHANGED
@@ -1,20 +1,23 @@
  ---
+ language: en
  license: mit
- datasets:
- - thomasrenault/us_tweet_speech_congress
- language:
- - en
  tags:
- - text-classification
- - multi-label-classification
- - topic-classification
- - political-text
- - tweets
- - distilbert
- - active-learning
+ - text-classification
+ - multi-label-classification
+ - topic-classification
+ - political-text
+ - tweets
+ - distilbert
+ - active-learning
+ datasets:
+ - thomasrenault/us_tweet_speech_congress
+ metrics:
+ - f1
+ base_model: distilbert-base-uncased
  pipeline_tag: text-classification
  ---

+ # thomasrenault/topic

  A multi-label political topic classifier fine-tuned on US political tweets and congressional speeches.
  Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
@@ -34,6 +37,7 @@ A document can belong to **zero or multiple topics simultaneously**.
  | `tax and inequality` | Tax policy, economic inequality, redistribution |
  | `trade` | Trade policy, tariffs, international commerce |

+ Documents that match none of the above are implicitly classified as `other topic`.

  ## Training

@@ -42,7 +46,7 @@ A document can belong to **zero or multiple topics simultaneously**.
  | Base model | `distilbert-base-uncased` |
  | Architecture | `DistilBertForSequenceClassification` (multi-label) |
  | Problem type | `multi_label_classification` |
- | Training data | ~100,000 labeled documents (early checkpoint) |
+ | Training data | ~200,000 labeled documents |
  | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
  | Strategy | Active learning (uncertainty sampling) |
  | Seed size | 1,000 documents (random) |
@@ -102,9 +106,11 @@ print(predict("Tax cuts for the wealthy only increase inequality in America."))
  If you use this model, please cite:

  ```
- @article{algan2026emotions,
- title={Emotions and policy views},
- author={Algan, Y, Davoine, E., Renault, T., and Stantcheva, S},
- year={2026}
+ @misc{renault2025topic,
+ author = {Renault, Thomas},
+ title = {thomasrenault/topic: Multi-label political topic classifier for US political text},
+ year = {2025},
+ publisher = {HuggingFace},
+ url = {https://huggingface.co/thomasrenault/topic}
  }
- ```
+ ```
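The updated card says a document can carry zero or several topics, and that documents matching none of them are implicitly labeled `other topic`. The sketch below illustrates that decoding rule under stated assumptions: independent per-label sigmoid probabilities, a 0.5 threshold, and a truncated label list holding only the two topics visible in this diff. The `decode_topics` helper is hypothetical and is not the card's own `predict` function.

```python
import math

# Hypothetical, truncated label list: only two of the card's topics appear in
# this diff, and the real model's id2label order may differ.
LABELS = ["tax and inequality", "trade"]
FALLBACK = "other topic"


def sigmoid(x: float) -> float:
    """Numerically stable logistic function: logit -> per-label probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_topics(logits, labels=LABELS, threshold=0.5):
    """Keep every label whose sigmoid score reaches the (assumed) threshold;
    return the fallback label when no topic does."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    chosen = [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]
    return chosen or [FALLBACK]


print(decode_topics([2.0, -1.0]))   # ['tax and inequality']
print(decode_topics([-3.0, -0.5]))  # ['other topic']
print(decode_topics([1.2, 0.3]))    # ['tax and inequality', 'trade']
```

With `problem_type="multi_label_classification"`, Transformers trains `DistilBertForSequenceClassification` with `BCEWithLogitsLoss`, which is why each logit gets its own sigmoid rather than a softmax across labels. With the real model, the logits would come from the output's `.logits` and the label order from `model.config.id2label`.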
checkpoint/batches/batch_69ce563236708190b9830d55e5ed5e1b.json ADDED
The diff for this file is too large to render.
 
checkpoint/labeled_data.csv CHANGED
The diff for this file is too large to render.
 
checkpoint/state.json CHANGED
@@ -1 +1 @@
- {"completed_round": 0, "labeled_count": 1000}
+ {"completed_round": 0, "labeled_count": 10000}
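`checkpoint/state.json` records the active-learning round and the number of labeled documents, which rises in this commit from 1,000 (matching the card's 1,000-document random seed) to 10,000. The card names uncertainty sampling as the query strategy. The sketch below shows one simple multi-label variant, offered as an assumption rather than the repository's actual code: score each unlabeled document by how close its least certain label probability sits to 0.5, then send the top `k` for annotation. The `uncertainty` and `select_uncertain` helpers and the `{doc_id: probabilities}` pool format are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def uncertainty(probs: Sequence[float]) -> float:
    """Score in [0, 0.5]: 0.5 minus the distance between the document's least
    certain label probability and the 0.5 decision boundary.
    Higher means the model is less sure about at least one label."""
    return 0.5 - min(abs(p - 0.5) for p in probs)


def select_uncertain(pool: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the ids of the k most uncertain documents in the unlabeled pool."""
    return heapq.nlargest(k, pool, key=lambda doc_id: uncertainty(pool[doc_id]))


# Hypothetical per-label probabilities from the current model.
pool = {
    "doc_a": [0.90, 0.10],  # confident on both labels
    "doc_b": [0.55, 0.02],  # first label sits near the boundary
    "doc_c": [0.30, 0.70],
}
print(select_uncertain(pool, 2))  # ['doc_b', 'doc_c']
```

Scoring by the least certain label targets documents where any single topic decision is borderline; averaging the distances across labels or using per-label entropy are common alternatives.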
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cec13faef4fb96adae4fae15034ee2e1d4e3fab52c8dc87b4292a0df816e30df
+ oid sha256:86130f7904d62adcaf21d36a9bb41ff682d99a2a90af3950604ad144701b78cd
  size 267847948
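`model.safetensors` is tracked with Git LFS, so the diff shows only the pointer file: the spec version, the SHA-256 `oid` of the weights, and their `size` in bytes. Here the hash changes while the size stays at 267,847,948 bytes, consistent with new weights for an unchanged architecture. Below is a minimal standard-library sketch for checking a downloaded file against such a pointer; the `parse_lfs_pointer` and `matches_pointer` helpers are hypothetical, and the parser assumes the simple `key value` line layout shown above.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS pointer ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """True if the file's byte size and SHA-256 digest both match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, before hashing a file of several hundred MB.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream in chunks so large weight files are never fully loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

For example, calling `matches_pointer(Path("model.safetensors"), pointer_text)` with the three lines of the new pointer should return `True` only for the weights uploaded in this commit.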