cornelius committed on
Commit
9c08227
1 Parent(s): 7c1506a

Upload TFRobertaForSequenceClassification

Files changed (3)
  1. README.md +19 -121
  2. config.json +2 -2
  3. tf_model.h5 +2 -2
README.md CHANGED
@@ -1,150 +1,48 @@
  ---
  license: cc-by-sa-4.0
- language:
- - nl
- metrics:
- - accuracy
- pipeline_tag: text-classification
  tags:
- - partypress
- - political science
- - parties
- - press releases
- widget:
- - text: 'Het handelsverdrag tussen de Europese Unie en de VS moet ter goedkeuring voorgelegd worden aan de Tweede Kamer. Een motie van GroenLinks-Tweede Kamerlid Jesse Klaver werd vandaag aangenomen om te garanderen dat het handelsverdrag alleen in werking kan treden nadat het parlement zich positief heeft uitgesproken. Klaver: “Dit handelsverdrag kan gevolgen hebben voor Europese én Nederlandse regels op het gebied van milieu, voedselveiligheid, consumentenbescherming en privacy. Het is daarom belangrijk dat wij ons hier als parlement democratisch over kunnen uitspreken.”Tot nu toe bestond de mogelijkheid nog dat het verdrag zonder goedkeuring van nationale parlementen in werking zou treden. Als de Europese Commissie namelijk zou vaststellen dat het gaat om een ‘EU-only’-akkoord en geen ‘gemengd akkoord’, zou het verdrag alleen aan het Europees Parlement hoeven worden voorgelegd. Een dubbele parlementaire goedkeuringsprocedure vergroot de democratische controle.'
+ - generated_from_keras_callback
+ model-index:
+ - name: partypress-monolingual-netherlands
+ results: []
  ---

- # PARTYPRESS monolingual Netherlands
+ <!-- This model card has been generated automatically according to the information Keras had access to. You should
+ probably proofread and complete it, then remove this comment. -->

+ # partypress-monolingual-netherlands

- Fine-tuned model, based on [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base). Used in Erfort et al. (2023), building on the PARTYPRESS database. For the downstream task of classyfing press releases from political parties into 23 unique policy areas we achieve a performance comparable to expert human coders.
+ This model is a fine-tuned version of [cornelius/partypress-monolingual-netherlands](https://huggingface.co/cornelius/partypress-monolingual-netherlands) on an unknown dataset.
+ It achieves the following results on the evaluation set:


  ## Model description

- The PARTYPRESS monolingual model builds on [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base) but has a supervised component. This means, it was fine-tuned using texts labeled by humans. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP):
- | Code | Issue |
- |--|-------|
- | 1 | Macroeconomics |
- | 2 | Civil Rights |
- | 3 | Health |
- | 4 | Agriculture |
- | 5 | Labor |
- | 6 | Education |
- | 7 | Environment |
- | 8 | Energy |
- | 9 | Immigration |
- | 10 | Transportation |
- | 12 | Law and Crime |
- | 13 | Social Welfare |
- | 14 | Housing |
- | 15 | Domestic Commerce |
- | 16 | Defense |
- | 17 | Technology |
- | 18 | Foreign Trade |
- | 19.1 | International Affairs |
- | 19.2 | European Union |
- | 20 | Government Operations |
- | 23 | Culture |
- | 98 | Non-thematic |
- | 99 | Other |
-
- ## Model variations
-
- There are several monolingual models for different countries, and a multilingual model. The multilingual model can be easily extended to other languages, country contexts, or time periods by fine-tuning it with minimal additional labeled texts.
+ More information needed

  ## Intended uses & limitations

- The main use of the model is for text classification of press releases from political parties. It may also be useful for other political texts.
+ More information needed

- The classification can then be used to measure which issues parties are discussing in their communication.
+ ## Training and evaluation data

- ### How to use
-
- This model can be used directly with a pipeline for text classification:
-
- ```python
- >>> from transformers import pipeline
- >>> tokenizer_kwargs = {'padding':True,'truncation':True,'max_length':512}
- >>> partypress = pipeline("text-classification", model = "cornelius/partypress-monolingual-netherlands", tokenizer = "cornelius/partypress-monolingual-netherlands", **tokenizer_kwargs)
- >>> partypress("Your text here.")
- ```
-
- ### Limitations and bias
-
- The model was trained with data from parties in the Netherlands. For use in other countries, the model may be further fine-tuned. Without further fine-tuning, the performance of the model may be lower.
-
- The model may have biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database. For example, the performance is highest for press releases from Ireland (75%) and lowest for Poland (55%).
-
- ## Training data
-
- The PARTYPRESS multilingual model was fine-tuned with about 3,000 press releases from parties in the Netherlands. The press releases were labeled by two expert human coders.
-
- For the training data of the underlying model, please refer to [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base)
+ More information needed

  ## Training procedure

- ### Preprocessing
+ ### Training hyperparameters

- For the preprocessing, please refer to [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base)
+ The following hyperparameters were used during training:
+ - optimizer: None
+ - training_precision: float32

- ### Pretraining
+ ### Training results

- For the pretraining, please refer to [pdelobelle/robbert-v2-dutch-base](https://huggingface.co/pdelobelle/robbert-v2-dutch-base)

- ### Fine-tuning

- We fine-tuned the model using about 3,000 labeled press releases from political parties in theNetherlands.
-
- #### Training Hyperparameters
-
- The batch size for training was 12, for testing 2, with four epochs. All other hyperparameters were the standard from the transformers library.
-
-
- #### Framework versions
+ ### Framework versions

  - Transformers 4.28.0
  - TensorFlow 2.12.0
  - Datasets 2.12.0
  - Tokenizers 0.13.3
-
-
- ## Evaluation results
-
- Fine-tuned on our downstream task, this model achieves the following results in a five-fold cross validation that are comparable to the performance of our expert human coders. Please refer to Erfort et al. (2023)
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @article{erfort_partypress_2023,
- author = {Cornelius Erfort and
- Lukas F. Stoetzer and
- Heike Klüver},
- title = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
- journal = {Research and Politics},
- volume = {forthcoming},
- year = {2023},
- }
- ```
-
- ### Further resources
-
- Github: [cornelius-erfort/partypress](https://github.com/cornelius-erfort/partypress)
-
- Research and Politics Dataverse: [Replication Data for: The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FOINX7Q)
-
-
-
- ## Acknowledgements
-
- Research for this contribution is part of the Cluster of Excellence "Contestations of the Liberal Script" (EXC 2055, Project-ID: 390715649), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Netherlands´s Excellence Strategy. Cornelius Erfort is moreover grateful for generous funding provided by the DFG through the Research Training Group DYNAMICS (GRK 2458/1).
-
- ## Contact
-
- Cornelius Erfort
-
- Humboldt-Universität zu Berlin
-
- [corneliuserfort.de](corneliuserfort.de)
-
-
config.json CHANGED
@@ -1,7 +1,7 @@
  {
  "_name_or_path": "cornelius/partypress-monolingual-netherlands",
  "architectures": [
- "BertForSequenceClassification"
+ "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
@@ -65,7 +65,7 @@
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
- "model_type": "bert",
+ "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
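
The config.json change swaps the declared architecture and model type from BERT to RoBERTa, which matches the commit title (`TFRobertaForSequenceClassification`). The following is a minimal Python sketch of how the changed keys could be listed, assuming both versions of the config are available as dictionaries. The helper name `changed_keys` is hypothetical, and the dictionaries are abbreviated to fields visible in the diff hunks.

```python
import json


def changed_keys(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Abbreviated configs (assumption): only fields shown in the diff are reproduced.
OLD_CONFIG = json.loads("""
{
  "_name_or_path": "cornelius/partypress-monolingual-netherlands",
  "architectures": ["BertForSequenceClassification"],
  "attention_probs_dropout_prob": 0.1,
  "max_position_embeddings": 514,
  "model_type": "bert"
}
""")

# The new config differs only in the two fields touched by the commit.
NEW_CONFIG = dict(
    OLD_CONFIG,
    architectures=["RobertaForSequenceClassification"],
    model_type="roberta",
)

diff = changed_keys(OLD_CONFIG, NEW_CONFIG)
for key, (before, after) in diff.items():
    print(f"{key}: {before!r} -> {after!r}")
```

Both fields need to agree, since loaders in the transformers library typically use `model_type` to pick the configuration class and `architectures` to record the model class the weights were saved from.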
tf_model.h5 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d02ec2cb4a85ef548a636f0d621c9eb6654c86c02a0401aa7231f0e1dcd4ec70
- size 467407212
+ oid sha256:57904d84a0f30c3f56262675e349c9841f64eef3fb9d518b35c4ad5bed25e5c1
+ size 467408704
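
The tf_model.h5 entry is a Git LFS pointer rather than the weights themselves: it records only a spec version, a SHA-256 object ID, and a byte size. As a rough illustration, the sketch below parses such a pointer and checks a payload against it. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the check assumes the whole file fits in memory, which may not be practical for a file of this size.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Check that a payload has the size and SHA-256 digest the pointer records."""
    if pointer["algorithm"] != "sha256":
        raise ValueError("only sha256 object IDs are handled in this sketch")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer contents copied from the old and new sides of the diff.
OLD_POINTER = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:d02ec2cb4a85ef548a636f0d621c9eb6654c86c02a0401aa7231f0e1dcd4ec70
size 467407212
""")
NEW_POINTER = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:57904d84a0f30c3f56262675e349c9841f64eef3fb9d518b35c4ad5bed25e5c1
size 467408704
""")

print(NEW_POINTER["size"] - OLD_POINTER["size"], "bytes larger after the commit")
```

For a real download, hashing the file in chunks with `hashlib.sha256().update(...)` would avoid loading roughly 467 MB at once.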