Update README.md
README.md
CHANGED

Fine-tuned model in seven languages on texts from nine countries, based on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

## Model description

The PARTYPRESS multilingual model builds on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) but adds a supervised component: it was fine-tuned on texts labeled by human coders. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP).

## Model variations

We plan to release monolingual models for each of the languages covered by this multilingual model.

## Intended uses & limitations

The model is intended mainly for text classification of press releases from political parties. It may also be useful for other political texts.

### How to use

This model can be used directly with a pipeline for text classification:

```python
>>> from transformers import pipeline
>>> partypress = pipeline("text-classification", model="cornelius/partypress-multilingual", tokenizer="cornelius/partypress-multilingual")
>>> partypress("We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.")
```
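
By default, the pipeline returns only the top predicted category. A minimal sketch for inspecting the scores of all 23 categories, assuming the `top_k` option of recent `transformers` text-classification pipelines:

```python
>>> # top_k=None asks the pipeline to return a score for every label
>>> # rather than only the highest-scoring one (assumed pipeline option).
>>> partypress(
...     "We urgently need to fight climate change and reduce carbon emissions.",
...     top_k=None,
... )
```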

### Limitations and bias

The model was trained with data from parties in nine countries. For use in other countries, the model may need further fine-tuning; without it, performance may be lower.

The model may have biased predictions. We discuss some biases by country, by party, and over time in the release paper for the PARTYPRESS database.
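
As a rough sketch of such further fine-tuning, the example below continues training on a newly labeled corpus with the `transformers` Trainer; the CSV file, its columns, and the hyperparameters are assumptions for illustration, not part of this card:

```python
# Hypothetical sketch: further fine-tuning on press releases from an
# additional country. "press_releases.csv" with "text"/"label" columns
# is a placeholder, not a real dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "cornelius/partypress-multilingual"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

dataset = load_dataset("csv", data_files="press_releases.csv")["train"]

def tokenize(batch):
    # Truncate long press releases to the model's maximum input length.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="partypress-further-finetuned",
                           num_train_epochs=3),
    train_dataset=dataset,
)
trainer.train()
```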

## Training data

The PARTYPRESS multilingual model was fine-tuned on 27,243 press releases in seven languages from 68 European parties in nine countries. The press releases were labeled by two expert human coders per country.

For the training data of the underlying model, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

## Training procedure

### Preprocessing

For the preprocessing, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

### Pretraining

For the pretraining, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased).

### Fine-tuning

## Evaluation results

Fine-tuned on our downstream task, this model achieves the following results in a five-fold cross-validation:

| Accuracy | Precision | Recall | F1 score |
|:--------:|:---------:|:------:|:--------:|
|  69.52   |   67.99   |  67.60 |   66.77  |
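
For reference, a sketch of how such cross-validated scores can be computed; the averaging mode (here weighted) and the placeholder data and training callable are assumptions, not taken from this card:

```python
# Sketch of five-fold cross-validated evaluation metrics.
# `texts` and `labels` are numpy arrays standing in for the labeled corpus.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold

def cross_validate(texts, labels, train_and_predict):
    # train_and_predict is a placeholder: it should fine-tune a fresh
    # model on the training split and return predictions for the test split.
    scores = []
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(texts, labels):
        y_pred = train_and_predict(texts[train_idx], labels[train_idx],
                                   texts[test_idx])
        p, r, f1, _ = precision_recall_fscore_support(
            labels[test_idx], y_pred, average="weighted", zero_division=0)
        scores.append((accuracy_score(labels[test_idx], y_pred), p, r, f1))
    # Mean accuracy, precision, recall, and F1 over the five folds.
    return np.mean(scores, axis=0)
```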

### BibTeX entry and citation info