thomasrenault committed on
Commit ae6472e · verified · 1 parent: 66c02c5

Upload folder using huggingface_hub

Files changed (5)
  1. .DS_Store +0 -0
  2. .gitattributes +1 -0
  3. README.md +3 -10
  4. labeled_data.csv +3 -0
  5. model.safetensors +1 -1
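The `+N -M` figures in the file list count the lines each file's hunks add and remove (for example, `.gitattributes +1 -0`). The sketch below illustrates this by tallying one hunk and checking it against the line counts in its `@@ -start,len +start,len @@` header. It is hedged in two ways. First, it assumes a strict unified diff in which every body line carries a one-character prefix (space, `+` or `-`). Second, `tally_hunk` is a hypothetical helper, not part of Git or `huggingface_hub`.

```python
import re
from typing import Tuple

# Matches "@@ -start[,len] +start[,len] @@"; an omitted length means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def tally_hunk(hunk: str) -> Tuple[int, int]:
    """Return (added, removed) for one hunk, after checking the body
    against the old/new lengths declared in its @@ header."""
    header, *body = hunk.splitlines()
    match = HUNK_HEADER.match(header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_len = int(match.group(2) or 1)
    new_len = int(match.group(4) or 1)
    added = sum(line.startswith("+") for line in body)
    removed = sum(line.startswith("-") for line in body)
    context = sum(line.startswith(" ") for line in body)
    # Context lines belong to both sides; removals exist only in the old
    # file and additions only in the new one.
    if context + removed != old_len or context + added != new_len:
        raise ValueError("hunk body does not match its header")
    return added, removed


# The .gitattributes hunk from this commit, written in strict unified form.
GITATTRIBUTES_HUNK = "\n".join([
    "@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text",
    " *.zip filter=lfs diff=lfs merge=lfs -text",
    " *.zst filter=lfs diff=lfs merge=lfs -text",
    " *tfevents* filter=lfs diff=lfs merge=lfs -text",
    "+labeled_data.csv filter=lfs diff=lfs merge=lfs -text",
])

print(tally_hunk(GITATTRIBUTES_HUNK))  # (1, 0), matching "+1 -0"
```

The same check covers a brand-new file such as `labeled_data.csv`, whose header `@@ -0,0 +1,3 @@` declares an empty old side.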
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ labeled_data.csv filter=lfs diff=lfs merge=lfs -text
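The new rule sends `labeled_data.csv` through the Git LFS filter, just as the existing wildcard rules already do for archives and TensorBoard event files. The sketch below gives a rough way to see which paths such rules catch, but it is only an approximation:

- It handles slash-free patterns only, matching them against the basename at any depth.
- It uses Python's `fnmatch` rather than Git's own matcher.
- Rules containing `/` or `**` (such as `saved_model/**/*`) are skipped.
- Later rules that unset attributes are ignored.

`lfs_patterns` and `is_lfs_tracked` are hypothetical helpers.

```python
import fnmatch
import posixpath
from typing import List

# Only the LFS rules visible in this hunk; the real file has more above line 33.
GITATTRIBUTES = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
labeled_data.csv filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text: str) -> List[str]:
    """Collect patterns whose attribute list contains filter=lfs."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path: str, patterns: List[str]) -> bool:
    """Approximate check: slash-free patterns are matched case-sensitively
    against the basename, at any directory depth. Patterns containing '/'
    are skipped, and attribute unsetting by later rules is ignored."""
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, p) for p in patterns if "/" not in p)


rules = lfs_patterns(GITATTRIBUTES)
for path in ["labeled_data.csv", "README.md", "runs/events.out.tfevents.1"]:
    print(path, is_lfs_tracked(path, rules))
```

For an authoritative answer inside a clone, `git check-attr filter -- <path>` asks Git itself.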
README.md CHANGED
@@ -8,7 +8,6 @@ tags:
  - political-text
  - tweets
  - distilbert
- - active-learning
  datasets:
  - thomasrenault/us_tweet_speech_congress
  metrics:
@@ -19,8 +18,7 @@ pipeline_tag: text-classification

  # thomasrenault/topic

- A multi-label political topic classifier fine-tuned on US political tweets and congressional speeches.
- Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
+ A multi-label political topic classifier fine-tuned on US tweets, campaign speeches and congressional speeches. Built on `distilbert-base-uncased` with GPT-4o-mini annotation via the OpenAI Batch API.

  ## Labels

@@ -48,16 +46,12 @@ Documents that match none of the above are implicitly classified as `other topic
  | Problem type | `multi_label_classification` |
  | Training data | ~200,000 labeled documents |
  | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
- | Strategy | Active learning (uncertainty sampling) |
- | Seed size | 1,000 documents (random) |
- | AL query size | 25,000 documents / round |
- | Epochs (seed) | 4 |
- | Epochs (AL) | 2 (warm-start) |
+ | Epochs | 4 |
  | Learning rate | 2e-5 |
  | Batch size | 16 |
  | Max length | 512 tokens |
  | Classification threshold | 0.5 |
- | Domain | US political tweets and congressional floor speeches |
+ | Domain | US tweets about policy, campaign speeches and congressional floor speeches |

  ## Usage

@@ -98,7 +92,6 @@ print(predict("Tax cuts for the wealthy only increase inequality in America."))

  - Trained on **US English political text** — may not generalise to other political systems or languages
  - Annotation by GPT-4o-mini introduces model-specific biases in topic boundaries
- - Early training checkpoint (round 0, ~1,600 documents) — performance will improve as active learning progresses
  - Topics reflect the specific research agenda of the parent project; other salient topics (healthcare, climate, etc.) are out of scope

  ## Citation
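The card describes a `multi_label_classification` model with a 0.5 classification threshold. Documents that match no label fall into an implicit fallback class. In Transformers, that problem type trains with a per-label sigmoid (binary cross-entropy) instead of a softmax, so each label is scored independently. The sketch below shows that decoding step in plain Python. It is an assumption about how the card's `predict` helper works, not a copy of it. The label names are hypothetical placeholders, since the real list sits in the card's `## Labels` section, which this diff does not show. The sketch also assumes a label fires when its probability is at least the threshold.

```python
import math
from typing import List, Sequence

# Hypothetical placeholders: the real label names are listed in the card's
# "## Labels" section, which this diff does not show.
LABELS = ["topic_a", "topic_b", "topic_c"]
THRESHOLD = 0.5  # "Classification threshold" row of the training table


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits: Sequence[float],
                      labels: Sequence[str] = LABELS,
                      threshold: float = THRESHOLD) -> List[str]:
    """Score each label independently and keep those at or above the
    threshold. An empty result means no topic fired, i.e. the document
    falls into the implicit fallback class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


print(decode_multilabel([2.0, -1.0, 0.3]))    # ['topic_a', 'topic_c']
print(decode_multilabel([-2.0, -0.5, -4.0]))  # []
```

Because sigmoid(0) = 0.5, a 0.5 threshold with `>=` is equivalent to keeping every label whose raw logit is non-negative.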
labeled_data.csv ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:164a5271e1c198e2bd1925ef6b771d55caf4372e4eaac4a9b43be66d4cf567b1
+ size 61838801
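The repository does not store `labeled_data.csv` itself. It stores a three-line Git LFS pointer, which records:

- the spec version,
- the SHA-256 `oid` of the real content,
- its `size` in bytes (61,838,801, about 62 MB).

Below is a minimal parser, assuming the simple `key value` layout shown here. `parse_lfs_pointer` is a hypothetical helper, not an API of Git LFS or `huggingface_hub`.

```python
import re
from typing import Dict, Union

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse a Git LFS v1 pointer into its sha256 digest and byte size."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or re.fullmatch(r"[0-9a-f]{64}", digest) is None:
        raise ValueError("expected a sha256 oid")
    if "size" not in fields:
        raise ValueError("pointer has no size line")
    return {"oid": digest, "size": int(fields["size"])}


POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:164a5271e1c198e2bd1925ef6b771d55caf4372e4eaac4a9b43be66d4cf567b1\n"
    "size 61838801\n"
)

info = parse_lfs_pointer(POINTER)
print(info["size"])  # 61838801
```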
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:86130f7904d62adcaf21d36a9bb41ff682d99a2a90af3950604ad144701b78cd
+ oid sha256:08965f0c6c2ebe5ed5d283d165761331fc83e9e30b0471b6489eb3071d760499
  size 267847948
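Only the `oid` line changes for `model.safetensors`; the size stays at 267,847,948 bytes. That is consistent with new weights for the same architecture, though the pointer alone cannot prove it. To check which version a local copy holds, hash the file and compare the result with the pointer. The sketch below assumes the file has already been downloaded. `matches_lfs_pointer` is a hypothetical helper, and the path in the commented usage line is illustrative.

```python
import hashlib
from pathlib import Path
from typing import Union

# Pointer values from the model.safetensors diff in this commit.
OLD_OID = "86130f7904d62adcaf21d36a9bb41ff682d99a2a90af3950604ad144701b78cd"
NEW_OID = "08965f0c6c2ebe5ed5d283d165761331fc83e9e30b0471b6489eb3071d760499"
SIZE = 267847948


def matches_lfs_pointer(path: Union[str, Path], oid: str, size: int,
                        chunk_size: int = 1 << 20) -> bool:
    """True if the file's byte size and streamed SHA-256 match the pointer."""
    p = Path(path)
    if p.stat().st_size != size:  # cheap check before hashing ~268 MB
        return False
    digest = hashlib.sha256()
    with p.open("rb") as f:
        # Read in 1 MiB chunks so the whole file is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == oid


# Illustrative usage with a hypothetical local path:
# matches_lfs_pointer("model.safetensors", NEW_OID, SIZE)
```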