language:
  - en
  - fr
  - de
pipeline_tag: text-classification

Multilingual classification model to detect texts from the political science domain

Accuracy: 0.978

Predicts 2 classes:

| class | description | precision | recall | f1-score | support |
|---|---|---|---|---|---|
| politics | political science | 0.975 | 0.978 | 0.976 | 2143 |
| multi | other scientific domains | 0.981 | 0.979 | 0.980 | 2583 |
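As a quick consistency check, the overall accuracy can be approximated from the per-class recall and support values reported above, since accuracy is the support-weighted mean of the per-class recalls. The sketch below is illustrative only: the variable names are made up, and because the inputs are rounded to three decimals, the result matches the reported 0.978 only to within rounding.

```python
# Per-class recall and support, copied from the table above.
per_class = {
    "politics": {"recall": 0.978, "support": 2143},
    "multi": {"recall": 0.979, "support": 2583},
}

# Accuracy = correctly classified samples / all samples.
# recall * support approximates the number of correct predictions per class.
total = sum(c["support"] for c in per_class.values())
correct = sum(c["recall"] * c["support"] for c in per_class.values())
accuracy = correct / total

print(f"approximate accuracy: {accuracy:.4f} over {total} samples")
```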

Evaluation by class and language:

| class | description | language | precision | recall | f1-score | support |
|---|---|---|---|---|---|---|
| politics | political science | English | 0.989 | 0.993 | 0.991 | 1212 |
| multi | other scientific domains | English | 0.992 | 0.989 | 0.991 | 1164 |
| politics | political science | German | 0.952 | 0.958 | 0.955 | 783 |
| multi | other scientific domains | German | 0.957 | 0.951 | 0.954 | 776 |
| politics | political science | French | 0.979 | 0.959 | 0.969 | 148 |
| multi | other scientific domains | French | 0.991 | 0.995 | 0.993 | 643 |
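The per-language figures are consistent with the overall table: the supports add up to the class totals, and the support-weighted mean of the per-language recalls reproduces the overall recall of each class. A minimal sketch of that check follows, with hypothetical variable names and the numbers copied from the tables in this card.

```python
# (recall, support) per language, copied from the per-language table.
by_language = {
    "politics": {"English": (0.993, 1212), "German": (0.958, 783), "French": (0.959, 148)},
    "multi": {"English": (0.989, 1164), "German": (0.951, 776), "French": (0.995, 643)},
}

overall = {}
for label, rows in by_language.items():
    support = sum(n for _, n in rows.values())
    # Weight each language's recall by its share of the class's samples.
    recall = sum(r * n for r, n in rows.values()) / support
    overall[label] = (recall, support)
    print(f"{label}: recall ~ {recall:.3f}, support = {support}")
```

Note that French contributes only 148 "politics" abstracts, so the French figures for that class rest on a much smaller sample than the English and German ones.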

Based on BERT multilingual base model (uncased)

This model is a multilingual version of our SSciBERT_politics. It was fine-tuned on a dataset of 14,178 abstracts of scientific articles in three languages (English, German and French), retrieved from the BASE and POLLUX collections. The BASE data were labelled as "politics" or "multi" according to the Dewey Decimal Classification (DDC). Abstracts from several major political science journals in the POLLUX dataset were labelled as "politics".

Usage

Requires: transformers (pip install transformers)

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('kalawinka/bert-base-ml-politics')
model = AutoModelForSequenceClassification.from_pretrained('kalawinka/bert-base-ml-politics')

# Truncate inputs to 512 tokens, the maximum sequence length of BERT.
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, max_length=512, truncation=True)

pipe("""Verschiedene Arten der Art und Weise: zu ihrer Positionierung im Deutschen und Englischen Ausgehend von der Annahme, daß die Stellung der Adverbiale ihre semantischen Relationen zum Rest des Satzes widerspiegelt, wird gezeigt, daß die traditionelle Klasse der Adverbiale der Art und Weise in verschiedene Klassen zerfällt; in die bei dem (finalen) Verb stehenden prozeßbezogenen Adverbiale, andererseits in die subjektbezogenen und ereignisbezogenen Adverbiale, die höher im Satz stehen. Adverbiale der "Art und Weise" dieser unterschiedlichen Gruppen zeigen nicht nur im Deutschen, sondern auch im Englischen und Französischen ein unterschiedliches Stellungsverhalten. Unterschiede, die sich zwischen diesen Sprachen hinsichtlich der Stellung dieser Adverbien beobachten lassen, sind auf Unterschiede in den Satzstrukturen zurückzuführen. Proceeding from the assumption that the positions of adverbials reflect their semantic relations to the rest of the sentence, it is shown that the traditional class of manner adverbs can be divided into several classes: on the one hand there are process-related adverbs which are closely related to (final) verbs, on the other subject-oriented and event-related adverbs occurring higher in the sentence. "Manner adverbs" of these different groups can occupy different positions in German and English as well as in French. It can be argued that differences in adverb positions between these languages are the result of different sentence structures.""")

This produces the following output:

[{'label': 'multi', 'score': 0.9998677968978882}]
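The pipeline returns a list with one dictionary per input text, each holding the predicted label and its score. One possible way to use this for filtering is sketched below. The helper `is_politics` and the 0.9 threshold are illustrative assumptions, not part of the model or the library. To keep the sketch runnable without downloading the model, it uses the sample output shown above.

```python
def is_politics(prediction, threshold=0.9):
    """Return True if a prediction is 'politics' with a score at or above the threshold.

    The threshold is a hypothetical cut-off for illustration; tune it on your own data.
    """
    return prediction["label"] == "politics" and prediction["score"] >= threshold


# Sample output copied from above; in practice this would come from pipe(text)
# or, for several texts at once, pipe(list_of_texts).
predictions = [{'label': 'multi', 'score': 0.9998677968978882}]

flags = [is_politics(p) for p in predictions]
print(flags)
```

For the linguistics abstract above the flag is `False`, because the model assigns it to the "multi" class.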