---
language:
- en
- fr
- de
pipeline_tag: text-classification
---

# Multilingual classification model to detect texts from the political science domain

Accuracy: 0.978

Predicts 2 classes:
| class    | description              | precision | recall | f1-score | support |
|----------|--------------------------|-----------|--------|----------|---------|
| politics | political science        | 0.975     | 0.978  | 0.976    | 2143    |
| multi    |other scientific domains  | 0.981     | 0.979  | 0.980    | 2583    |
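
As a quick consistency check, the f1-score column can be recomputed from the precision and recall columns as their harmonic mean. The sketch below is only illustrative: the function name `f1_score` is our own and not part of the model's code, and because the table values are rounded, a recomputed value can differ from a reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
reported = {
    "politics": (0.975, 0.978),
    "multi": (0.981, 0.979),
}

for label, (p, r) in reported.items():
    print(f"{label}: f1 = {f1_score(p, r):.3f}")
```

For these two rows the recomputed values agree with the reported f1-scores of 0.976 and 0.980.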

Evaluation by class and language:
| class    | description              | language | precision | recall | f1-score | support |
|----------|--------------------------|----------|-----------|--------|----------|---------|
| politics | political science        | English  | 0.989     | 0.993  | 0.991    | 1212    |
| multi    | other scientific domains | English  | 0.992     | 0.989  | 0.991    | 1164    |
| politics | political science        | German   | 0.952     | 0.958  | 0.955    | 783     |
| multi    | other scientific domains | German   | 0.957     | 0.951  | 0.954    | 776     |
| politics | political science        | French   | 0.979     | 0.959  | 0.969    | 148     |
| multi    | other scientific domains | French   | 0.991     | 0.995  | 0.993    | 643     |

Based on the [BERT multilingual base model (uncased)](http://arxiv.org/abs/1810.04805).

This model is a multilingual version of our [SSciBERT_politics](https://huggingface.co/kalawinka/SSciBERT_politics). 
It was fine-tuned on a dataset of 14,178 abstracts of scientific articles retrieved from the [BASE](https://www.base-search.net/) 
and [POLLUX](https://www.pollux-fid.de/) collections. 
The training abstracts are in three languages: English, German and French.
The BASE data were labelled as "politics" or "multi" according to the Dewey Decimal Classification (DDC). 
Abstracts from several major political science journals in the POLLUX dataset were labelled as "politics".
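
The DDC-based labelling can be pictured roughly as follows. This is a minimal sketch under assumptions that this card does not confirm: that each BASE record carries a DDC notation string such as `"320.5"`, and that the DDC classes 320 to 329 (political science) are what map to "politics". The function name `ddc_to_label` is hypothetical, and the rules actually used to build the dataset may differ.

```python
def ddc_to_label(ddc: str) -> str:
    """Map a DDC notation such as '320.5' to 'politics' or 'multi'.

    Assumption: DDC classes 320-329 (political science) count as 'politics';
    everything else is 'multi'. Illustrative only, not the authors' rule.
    """
    main_class = ddc.strip().split(".")[0]
    if not main_class.isdigit():
        raise ValueError(f"not a DDC notation: {ddc!r}")
    return "politics" if 320 <= int(main_class) <= 329 else "multi"


print(ddc_to_label("320.5"))  # politics
print(ddc_to_label("004.6"))  # multi
```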

# Usage

Requires: [transformers](https://huggingface.co/docs/transformers/index) (`pip install transformers`)

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('kalawinka/bert-base-ml-politics')
model = AutoModelForSequenceClassification.from_pretrained('kalawinka/bert-base-ml-politics')
# Truncate inputs to the 512-token limit set here.
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, max_length=512, truncation=True)

pipe("""Verschiedene Arten der Art und Weise: zu ihrer Positionierung im Deutschen und Englischen Ausgehend von der Annahme, daß die Stellung der Adverbiale ihre semantischen Relationen zum Rest des Satzes widerspiegelt, wird gezeigt, daß die traditionelle Klasse der Adverbiale der Art und Weise in verschiedene Klassen zerfällt; in die bei dem (finalen) Verb stehenden prozeßbezogenen Adverbiale, andererseits in die subjektbezogenen und ereignisbezogenen Adverbiale, die höher im Satz stehen. Adverbiale der "Art und Weise" dieser unterschiedlichen Gruppen zeigen nicht nur im Deutschen, sondern auch im Englischen und Französischen ein unterschiedliches Stellungsverhalten. Unterschiede, die sich zwischen diesen Sprachen hinsichtlich der Stellung dieser Adverbien beobachten lassen, sind auf Unterschiede in den Satzstrukturen zurückzuführen. Proceeding from the assumption that the positions of adverbials reflect their semantic relations to the rest of the sentence, it is shown that the traditional class of manner adverbs can be divided into several classes: on the one hand there are process-related adverbs which are closely related to (final) verbs, on the other subject-oriented and event-related adverbs occurring higher in the sentence. "Manner adverbs" of these different groups can occupy different positions in German and English as well as in French. It can be argued that differences in adverb positions between these languages are the result of different sentence structures.""")
```
This produces the following output:

```
[{'label': 'multi', 'score': 0.9998677968978882}]
```
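
To classify more than one text, the pipeline can also be called with a list of strings, in which case it returns one result dictionary per input. The helper below is a small sketch of how such results might be used to keep only the texts classified as "politics" above a confidence threshold. The name `select_politics`, the `min_score` parameter and its default of 0.9 are assumptions made for illustration. `classify` is expected to be the `pipe` object created above, or any callable with the same input and output shape.

```python
from typing import Callable, Dict, List


def select_politics(
    texts: List[str],
    classify: Callable[[List[str]], List[Dict]],
    min_score: float = 0.9,  # hypothetical threshold, tune for your data
) -> List[str]:
    """Return the texts labelled 'politics' with score >= min_score."""
    results = classify(texts)  # one {'label': ..., 'score': ...} per text
    return [
        text
        for text, result in zip(texts, results)
        if result["label"] == "politics" and result["score"] >= min_score
    ]
```

With the pipeline from the usage example, `select_politics(abstracts, pipe)` would return the subset of a list `abstracts` (a hypothetical variable holding abstract strings) that the model assigns to the "politics" class with a score of at least 0.9.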