---
language:
- en
- fr
- de
pipeline_tag: text-classification
---
Multilingual classification model to detect texts from the political science domain
Accuracy: 0.978
Predicts 2 classes:
class | description | precision | recall | f1-score | support |
---|---|---|---|---|---|
politics | political science | 0.975 | 0.978 | 0.976 | 2143 |
multi | other scientific domains | 0.981 | 0.979 | 0.980 | 2583 |
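The f1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal check against the table above; the precision and recall values are copied from it, and the helper name `f1` is our own:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the overall evaluation table
rows = {
    "politics": (0.975, 0.978),
    "multi": (0.981, 0.979),
}

for label, (p, r) in rows.items():
    print(f"{label}: f1 = {f1(p, r):.3f}")
```

This prints 0.976 for "politics" and 0.980 for "multi", matching the reported f1-scores.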
Evaluation by class and language:
class | description | language | precision | recall | f1-score | support |
---|---|---|---|---|---|---|
politics | political science | English | 0.989 | 0.993 | 0.991 | 1212 |
multi | other scientific domains | English | 0.992 | 0.989 | 0.991 | 1164 |
politics | political science | German | 0.952 | 0.958 | 0.955 | 783 |
multi | other scientific domains | German | 0.957 | 0.951 | 0.954 | 776 |
politics | political science | French | 0.979 | 0.959 | 0.969 | 148 |
multi | other scientific domains | French | 0.991 | 0.995 | 0.993 | 643 |
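Because the three language subsets partition the test set, each class's overall recall (correct predictions divided by support) is the support-weighted average of its per-language recalls, and the supports add up to the overall supports. This does not hold for precision, whose denominator is the number of predictions made for a class, which the tables do not report. A small sketch using the values copied from the table above (`pooled_recall` is a hypothetical helper name):

```python
# Per-language (recall, support) pairs copied from the table above: en, de, fr
per_language = {
    "politics": [(0.993, 1212), (0.958, 783), (0.959, 148)],
    "multi": [(0.989, 1164), (0.951, 776), (0.995, 643)],
}


def pooled_recall(rows):
    """Support-weighted average of recall, i.e. total correct / total support."""
    correct = sum(recall * support for recall, support in rows)
    total = sum(support for _, support in rows)
    return correct / total, total


for label, rows in per_language.items():
    recall, support = pooled_recall(rows)
    print(f"{label}: recall = {recall:.3f}, support = {support}")
```

The pooled values (0.978 with support 2143 for "politics", 0.979 with support 2583 for "multi") agree with the overall table; since the per-language recalls are themselves rounded, this agreement is only approximate in general.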
Based on the BERT multilingual base model (uncased).
This model is a multilingual version of our SSciBERT_politics model. It was fine-tuned on a dataset of 14,178 abstracts of scientific articles in three languages (English, German and French), retrieved from the BASE and POLLUX collections. The BASE records were labelled "politics" or "multi" according to the Dewey Decimal Classification (DDC). Records from several major political science journals in the POLLUX dataset were labelled "politics".
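The labelling rule described above can be illustrated with a short, hypothetical sketch. It is not the authors' preprocessing code: the function `label_record` is our own, and the card does not say which DDC classes were mapped to "politics", so the sketch assumes that the DDC division 320 to 329 (political science) counts as "politics" and every other class as "multi".

```python
from typing import Optional

# ASSUMPTION: the model card does not list which DDC classes were mapped to
# "politics"; this sketch treats the DDC division 320-329 (political science)
# as "politics" and everything else as "multi".
POLITICS_DDC_PREFIX = "32"


def label_record(source: str, ddc: Optional[str] = None) -> str:
    """Hypothetical labelling rule for one training abstract."""
    if source == "POLLUX":
        # Only abstracts from the selected political science journals are used,
        # so every POLLUX record gets the "politics" label.
        return "politics"
    if source == "BASE":
        if ddc is None:
            raise ValueError("BASE records need a DDC notation to be labelled")
        return "politics" if ddc.strip().startswith(POLITICS_DDC_PREFIX) else "multi"
    raise ValueError(f"unknown source: {source!r}")


print(label_record("BASE", "320.9"))  # politics
print(label_record("BASE", "410"))    # multi
print(label_record("POLLUX"))         # politics
```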
Usage
Requires `transformers` and PyTorch (`pip install transformers torch`).
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('kalawinka/bert-base-ml-politics')
model = AutoModelForSequenceClassification.from_pretrained('kalawinka/bert-base-ml-politics')
# Truncate inputs to BERT's maximum sequence length of 512 tokens
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, max_length=512, truncation=True)
# Example input: a German/English abstract from linguistics (not political science)
print(pipe("""Verschiedene Arten der Art und Weise: zu ihrer Positionierung im Deutschen und Englischen Ausgehend von der Annahme, daß die Stellung der Adverbiale ihre semantischen Relationen zum Rest des Satzes widerspiegelt, wird gezeigt, daß die traditionelle Klasse der Adverbiale der Art und Weise in verschiedene Klassen zerfällt; in die bei dem (finalen) Verb stehenden prozeßbezogenen Adverbiale, andererseits in die subjektbezogenen und ereignisbezogenen Adverbiale, die höher im Satz stehen. Adverbiale der "Art und Weise" dieser unterschiedlichen Gruppen zeigen nicht nur im Deutschen, sondern auch im Englischen und Französischen ein unterschiedliches Stellungsverhalten. Unterschiede, die sich zwischen diesen Sprachen hinsichtlich der Stellung dieser Adverbien beobachten lassen, sind auf Unterschiede in den Satzstrukturen zurückzuführen. Proceeding from the assumption that the positions of adverbials reflect their semantic relations to the rest of the sentence, it is shown that the traditional class of manner adverbs can be divided into several classes: on the one hand there are process-related adverbs which are closely related to (final) verbs, on the other subject-oriented and event-related adverbs occurring higher in the sentence. "Manner adverbs" of these different groups can occupy different positions in German and English as well as in French. It can be argued that differences in adverb positions between these languages are the result of different sentence structures."""))
```
This produces the following output:
```
[{'label': 'multi', 'score': 0.9998677968978882}]
```
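By default the pipeline returns only the top label. Assuming the pipeline applies its default softmax over the two labels, the "politics" and "multi" scores sum to 1, so the probability of the missing label can be recovered from this output (alternatively, passing `top_k=None` to the pipeline call returns the scores of both labels). A minimal post-processing sketch; `politics_probability`, `is_politics` and the 0.5 threshold are our own illustrative choices, not part of the model:

```python
from typing import Any, Dict, List


def politics_probability(result: List[Dict[str, Any]]) -> float:
    """Return P(politics) from one text-classification pipeline result.

    Assumes a softmax over the two labels, so the scores of 'politics'
    and 'multi' sum to 1 and the missing one can be inferred.
    """
    top = result[0]
    if top["label"] == "politics":
        return top["score"]
    return 1.0 - top["score"]


def is_politics(result: List[Dict[str, Any]], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag a text as political science above a threshold."""
    return politics_probability(result) >= threshold


# The output shown above for the German/English abstract
output = [{'label': 'multi', 'score': 0.9998677968978882}]
print(politics_probability(output))  # roughly 0.000132
print(is_politics(output))           # False
```

With a threshold of 0.5 this rule reproduces the pipeline's top label; raising the threshold would make "politics" predictions stricter at the cost of recall.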