---
license: mit
library_name: sklearn
tags:
- sklearn
- skops
- text-classification
model_format: pickle
model_file: legalis-scikit.pkl
datasets:
- LennardZuendorf/legalis
language:
- de
metrics:
- accuracy
- f1
---

# Model description

This is a tuned random forest classifier, trained on a processed dataset of 2800 German court cases (see the [legalis dataset](https://huggingface.co/datasets/LennardZuendorf/legalis)). It predicts the winner of a court case (defendant/"Verklagt*r" or plaintiff/"Kläger*in") from the facts of the case, provided in German.

## Intended uses & limitations

- This model was created as part of a university project and should be considered highly experimental.

## Get started with the model

Try out the hosted inference UI in the [Hugging Face Space](https://huggingface.co/spaces/LennardZuendorf/legalis), or load the pickled pipeline directly:

```
import pickle

# Load the fitted pipeline from the model file shipped with this repo.
with open("legalis-scikit.pkl", "rb") as file:
    clf = pickle.load(file)
```
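Assuming the pickled object is the full scikit-learn pipeline described under Model Hyperparameters below, it accepts raw German text directly, since the CountVectorizer handles tokenization. A minimal prediction sketch; the example facts string is illustrative, and the exact label strings returned depend on the dataset's encoding:

```
# Pass raw case facts (German); the pipeline vectorizes the text itself.
facts = ["Der Kläger verlangt von der beklagten Partei Schadensersatz ..."]

print(clf.predict(facts))        # predicted winner label
print(clf.predict_proba(facts))  # class probabilities
```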
### Model Hyperparameters

The classifier was tuned with scikit-learn's cross-validation search; the pipeline uses a CountVectorizer with common German stop words.

<details>
<summary> Click to expand </summary>

| Hyperparameter                | Value |
|-------------------------------|-------|
| memory                        | |
| steps                         | [('count', CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...])), ('clf', RandomForestClassifier(min_samples_split=5, random_state=0))] |
| verbose                       | False |
| count                         | CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...]) |
| clf                           | RandomForestClassifier(min_samples_split=5, random_state=0) |
| count__analyzer               | word |
| count__binary                 | False |
| count__decode_error           | strict |
| count__dtype                  | |
| count__encoding               | utf-8 |
| count__input                  | content |
| count__lowercase              | True |
| count__max_df                 | 1.0 |
| count__max_features           | |
| count__min_df                 | 1 |
| count__ngram_range            | (1, 3) |
| count__preprocessor           | |
| count__stop_words             | ['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', 'der', 'den', 'des', 'dem', 'die', 'das', 'dass', 'daß', 'derselbe', 'derselben', 'denselben', 'desselben', 'demselben', 'dieselbe', 'dieselben', 'dasselbe', 'dazu', 'dein', 'deine', 'deinem', 'deinen', 'deiner', 'deines', 'denn', 'derer', 'dessen', 'dich', 'dir', 'du', 'dies', 'diese', 'diesem', 'diesen', 'dieser', 'dieses', 'doch', 'dort', 'durch', 'ein', 'eine', 'einem', 'einen', 'einer', 'eines', 'einig', 'einige', 'einigem', 'einigen', 'einiger', 'einiges', 'einmal', 'er', 'ihn', 'ihm', 'es', 'etwas', 'euer', 'eure', 'eurem', 'euren', 'eurer', 'eures', 'für', 'gegen', 'gewesen', 'hab', 'habe', 'haben', 'hat', 'hatte', 'hatten', 'hier', 'hin', 'hinter', 'ich', 'mich', 'mir', 'ihr', 'ihre', 'ihrem', 'ihren', 'ihrer', 'ihres', 'euch', 'im', 'in', 'indem', 'ins', 'ist', 'jede', 'jedem', 'jeden', 'jeder', 'jedes', 'jene', 'jenem', 'jenen', 'jener', 'jenes', 'jetzt', 'kann', 'kein', 'keine', 'keinem', 'keinen', 'keiner', 'keines', 'können', 'könnte', 'machen', 'man', 'manche', 'manchem', 'manchen', 'mancher', 'manches', 'mein', 'meine', 'meinem', 'meinen', 'meiner', 'meines', 'mit', 'muss', 'musste', 'nach', 'nicht', 'nichts', 'noch', 'nun', 'nur', 'ob', 'oder', 'ohne', 'sehr', 'sein', 'seine', 'seinem', 'seinen', 'seiner', 'seines', 'selbst', 'sich', 'sie', 'ihnen', 'sind', 'so', 'solche', 'solchem', 'solchen', 'solcher', 'solches', 'soll', 'sollte', 'sondern', 'sonst', 'über', 'um', 'und', 'uns', 'unsere', 'unserem', 'unseren', 'unser', 'unseres', 'unter', 'viel', 'vom', 'von', 'vor', 'während', 'war', 'waren', 'warst', 'was', 'weg', 'weil', 'weiter', 'welche', 'welchem', 'welchen', 'welcher', 'welches', 'wenn', 'werde', 'werden', 'wie', 'wieder', 'will', 'wir', 'wird', 'wirst', 'wo', 'wollen', 'wollte', 'würde', 'würden', 'zu', 'zum', 'zur', 'zwar', 'zwischen'] |
| count__strip_accents          | |
| count__token_pattern          | (?u)\b\w\w+\b |
| count__tokenizer              | |
| count__vocabulary             | |
| clf__bootstrap                | True |
| clf__ccp_alpha                | 0.0 |
| clf__class_weight             | |
| clf__criterion                | gini |
| clf__max_depth                | |
| clf__max_features             | sqrt |
| clf__max_leaf_nodes           | |
| clf__max_samples              | |
| clf__min_impurity_decrease    | 0.0 |
| clf__min_samples_leaf         | 1 |
| clf__min_samples_split        | 5 |
| clf__min_weight_fraction_leaf | 0.0 |
| clf__n_estimators             | 100 |
| clf__n_jobs                   | |
| clf__oob_score                | False |
| clf__random_state             | 0 |
| clf__verbose                  | 0 |
| clf__warm_start               | False |

</details>
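The original training script is not part of this card. The following is a minimal sketch of how such a pipeline could be assembled and tuned with scikit-learn's GridSearchCV; the parameter grid, the NLTK stop-word source, and the variable names `train_facts`/`train_winners` are assumptions, not the original setup:

```
import nltk
from nltk.corpus import stopwords
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# The stop-word list in the table matches NLTK's German list
# (an assumption about where the list came from).
nltk.download("stopwords")
german_stop_words = stopwords.words("german")

pipeline = Pipeline(steps=[
    ("count", CountVectorizer(stop_words=german_stop_words)),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Illustrative grid; the card only documents the selected values
# (ngram_range=(1, 3), min_samples_split=5), not the grid that was searched.
param_grid = {
    "count__ngram_range": [(1, 1), (1, 2), (1, 3)],
    "clf__min_samples_split": [2, 5, 10],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1_macro", cv=5)
# search.fit(train_facts, train_winners)  # raw German texts, winner labels
# print(search.best_params_)
```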
### Model Plot
The fitted pipeline chains a CountVectorizer (word 1- to 3-grams, German stop words) into a RandomForestClassifier (min_samples_split=5, random_state=0); see the hyperparameter table above for the full configuration.
## Evaluation Results

| Metric   | Value    |
|----------|----------|
| accuracy | 0.664286 |
| f1 score | 0.664286 |

# Model Card Authors

This model card and the model itself were written by the following author:

- [@LennardZuendorf (Hugging Face)](https://huggingface.co/LennardZuendorf)
- [@LennardZuendorf (GitHub)](https://github.com/LennardZuendorf)

# Citation

See the [legalis dataset](https://huggingface.co/datasets/LennardZuendorf/legalis) for sources, and refer to [GitHub](https://github.com/LennardZuendorf/uniArchive-legalis) for a collection of all project files.