cornelius committed
Commit 4e8c48d · 1 Parent(s): 7760b92

Update README.md

Files changed (1): README.md +25 -7
README.md

@@ -27,27 +27,39 @@ Fine-tuned model in seven languages on texts from nine countries, based on [bert
 
 ## Model description
 
-tbs
+The PARTYPRESS multilingual model builds on [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) but has a supervised component. This means it was fine-tuned on texts labeled by humans. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP).
+
 
 ## Model variations
 
-tbd (monolingual)
+We plan to release monolingual models for each of the languages covered by this multilingual model.
 
 ## Intended uses & limitations
 
-tbd
+The main use of the model is text classification of press releases from political parties. It may also be useful for other political texts.
 
 ### How to use
 
-tbd
+This model can be used directly with a pipeline for text classification:
+
+```python
+>>> from transformers import pipeline
+>>> partypress = pipeline("text-classification", model="cornelius/partypress-multilingual", tokenizer="cornelius/partypress-multilingual")
+>>> partypress("We urgently need to fight climate change and reduce carbon emissions. This is what our party stands for.")
+```
 
 ### Limitations and bias
 
-tbd
+The model was trained on data from parties in nine countries. For use in other countries, it may need further fine-tuning; without further fine-tuning, its performance may be lower.
+
+The model may make biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database.
 
 ## Training data
 
-For the training data, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
+The PARTYPRESS multilingual model was fine-tuned on 27,243 press releases in seven languages from 68 European parties in nine countries. The press releases were labeled by two expert human coders per country.
+
+For the training data of the underlying model, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
 
 ## Training procedure
 
@@ -59,10 +71,16 @@ For the preprocessing, please refer to [bert-base-multilingual-cased](https://hu
 
 For the pretraining, please refer to [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
 
+### Fine-tuning
+
 
 ## Evaluation results
 
-Fine-tuned on our downstream task, this model achieves the following results:
+Fine-tuned on our downstream task, this model achieves the following results in five-fold cross-validation:
 
+| Accuracy | Precision | Recall | F1 score |
+|:--------:|:---------:|:------:|:--------:|
+|  69.52   |   67.99   |  67.60 |  66.77   |
 
 ### BibTeX entry and citation info