---
license: mit
library_name: sklearn
tags:
  - sklearn
  - skops
  - text-classification
model_format: pickle
model_file: legalis-scikit.pkl
datasets:
  - LennardZuendorf/legalis
language:
  - de
metrics:
  - accuracy
  - f1
---

## Model description

This is a tuned random forest classifier, trained on a processed dataset of 2,800 German court cases (see the legalis dataset). It predicts the winner of a court case, defendant ("Verklagter") or plaintiff ("Klägerin"), based on the facts provided (in German).

## Intended uses & limitations

- This model was created as part of a university project and should be considered highly experimental.

## How to get started with the model

Try out the hosted Inference UI or the Hugging Face Space.

```python
import pickle

# Load the pickled scikit-learn pipeline (file name from the model card metadata)
with open("legalis-scikit.pkl", "rb") as file:
    clf = pickle.load(file)
```
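If the model file is not at hand, the pickle round-trip can be sketched with a stand-in classifier. `DummyClassifier` below is a placeholder for illustration only, not the actual legalis model:

```python
import os
import pickle
import tempfile

from sklearn.dummy import DummyClassifier

# Stand-in model (NOT the real classifier): always predicts the majority label.
model = DummyClassifier(strategy="most_frequent").fit(["text a", "text b"], [1, 1])

# Serialize and reload the model the same way legalis-scikit.pkl is loaded.
path = os.path.join(tempfile.mkdtemp(), "demo.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    clf = pickle.load(f)

# The loaded object predicts a winner label from raw case-fact text.
print(clf.predict(["Der Kläger verlangt Schadensersatz"]))  # -> [1]
```

The real pipeline is used the same way: pass a list of German case-fact strings to `clf.predict`.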

## Model Hyperparameters

- The classifier was tuned with scikit-learn's cross-validated search; the pipeline used a CountVectorizer with common German stop words.
<details>
<summary>Click to expand</summary>

| Hyperparameter | Value |
|---|---|
| memory | |
| steps | [('count', CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...])), ('clf', RandomForestClassifier(min_samples_split=5, random_state=0))] |
| verbose | False |
| count | CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...]) |
| clf | RandomForestClassifier(min_samples_split=5, random_state=0) |
| count__analyzer | word |
| count__binary | False |
| count__decode_error | strict |
| count__dtype | `<class 'numpy.int64'>` |
| count__encoding | utf-8 |
| count__input | content |
| count__lowercase | True |
| count__max_df | 1.0 |
| count__max_features | |
| count__min_df | 1 |
| count__ngram_range | (1, 3) |
| count__preprocessor | |
| count__stop_words | ['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', 'der', 'den', 'des', 'dem', 'die', 'das', 'dass', 'daß', 'derselbe', 'derselben', 'denselben', 'desselben', 'demselben', 'dieselbe', 'dieselben', 'dasselbe', 'dazu', 'dein', 'deine', 'deinem', 'deinen', 'deiner', 'deines', 'denn', 'derer', 'dessen', 'dich', 'dir', 'du', 'dies', 'diese', 'diesem', 'diesen', 'dieser', 'dieses', 'doch', 'dort', 'durch', 'ein', 'eine', 'einem', 'einen', 'einer', 'eines', 'einig', 'einige', 'einigem', 'einigen', 'einiger', 'einiges', 'einmal', 'er', 'ihn', 'ihm', 'es', 'etwas', 'euer', 'eure', 'eurem', 'euren', 'eurer', 'eures', 'für', 'gegen', 'gewesen', 'hab', 'habe', 'haben', 'hat', 'hatte', 'hatten', 'hier', 'hin', 'hinter', 'ich', 'mich', 'mir', 'ihr', 'ihre', 'ihrem', 'ihren', 'ihrer', 'ihres', 'euch', 'im', 'in', 'indem', 'ins', 'ist', 'jede', 'jedem', 'jeden', 'jeder', 'jedes', 'jene', 'jenem', 'jenen', 'jener', 'jenes', 'jetzt', 'kann', 'kein', 'keine', 'keinem', 'keinen', 'keiner', 'keines', 'können', 'könnte', 'machen', 'man', 'manche', 'manchem', 'manchen', 'mancher', 'manches', 'mein', 'meine', 'meinem', 'meinen', 'meiner', 'meines', 'mit', 'muss', 'musste', 'nach', 'nicht', 'nichts', 'noch', 'nun', 'nur', 'ob', 'oder', 'ohne', 'sehr', 'sein', 'seine', 'seinem', 'seinen', 'seiner', 'seines', 'selbst', 'sich', 'sie', 'ihnen', 'sind', 'so', 'solche', 'solchem', 'solchen', 'solcher', 'solches', 'soll', 'sollte', 'sondern', 'sonst', 'über', 'um', 'und', 'uns', 'unsere', 'unserem', 'unseren', 'unser', 'unseres', 'unter', 'viel', 'vom', 'von', 'vor', 'während', 'war', 'waren', 'warst', 'was', 'weg', 'weil', 'weiter', 'welche', 'welchem', 'welchen', 'welcher', 'welches', 'wenn', 'werde', 'werden', 'wie', 'wieder', 'will', 'wir', 'wird', 'wirst', 'wo', 'wollen', 'wollte', 'würde', 'würden', 'zu', 'zum', 'zur', 'zwar', 'zwischen'] |
| count__strip_accents | |
| count__token_pattern | `(?u)\b\w\w+\b` |
| count__tokenizer | |
| count__vocabulary | |
| clf__bootstrap | True |
| clf__ccp_alpha | 0.0 |
| clf__class_weight | |
| clf__criterion | gini |
| clf__max_depth | |
| clf__max_features | sqrt |
| clf__max_leaf_nodes | |
| clf__max_samples | |
| clf__min_impurity_decrease | 0.0 |
| clf__min_samples_leaf | 1 |
| clf__min_samples_split | 5 |
| clf__min_weight_fraction_leaf | 0.0 |
| clf__n_estimators | 100 |
| clf__n_jobs | |
| clf__oob_score | False |
| clf__random_state | 0 |
| clf__verbose | 0 |
| clf__warm_start | False |

</details>
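The pipeline can be reconstructed from the non-default hyperparameters listed above. This is a minimal sketch: the stop-word list is truncated here, and the toy texts and labels are illustrative stand-ins for the legalis training data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Truncated German stop-word list; the full list is in the hyperparameter table.
german_stop_words = ["aber", "alle", "allem", "allen", "aller", "alles"]

# Same structure and non-default settings as the card's pipeline.
pipe = Pipeline(steps=[
    ("count", CountVectorizer(ngram_range=(1, 3), stop_words=german_stop_words)),
    ("clf", RandomForestClassifier(min_samples_split=5, random_state=0)),
])

# Toy fit/predict showing the interface: raw German text in, winner label out.
texts = ["Der Kläger gewinnt den Prozess", "Der Beklagte gewinnt den Prozess"]
labels = [1, 0]
pipe.fit(texts, labels)
print(pipe.predict(["Der Kläger gewinnt"]))
```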

## Model Plot

```
Pipeline(steps=[('count', CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...])), ('clf', RandomForestClassifier(min_samples_split=5, random_state=0))])
```

## Evaluation Results

| Metric | Value |
|---|---|
| accuracy | 0.664286 |
| f1 score | 0.664286 |
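The scores can be reproduced with scikit-learn's metric functions. The labels below are toy values for illustration, not the actual evaluation data:

```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative labels only: 1 = plaintiff wins, 0 = defendant wins.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(accuracy_score(y_true, y_pred))           # 0.8
print(round(f1_score(y_true, y_pred), 3))       # 0.8
```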

## Model Card Authors

This model card and the model itself were written by the following authors:

- @LennardZuendorf (Hugging Face)
- @LennardZuendorf (GitHub)

## Citation

See the dataset card for sources, and refer to the GitHub repository for a collection of all files.