---
license: mit
library_name: sklearn
tags:
- sklearn
- skops
- text-classification
model_format: pickle
model_file: legalis-scikit.pkl
datasets:
- LennardZuendorf/legalis
language:
- de
metrics:
- accuracy
- f1
---
# Model description
This is a tuned random forest classifier, trained on a processed dataset of 2,800 German court cases (see the [legalis dataset](https://huggingface.co/datasets/LennardZuendorf/legalis)). It predicts the winner of a court case (defendant/"Verklagt*r" or plaintiff/"Kläger*in") from the case facts, which must be provided in German.
## Intended uses & limitations
- This model was created as part of a university project and should be considered highly experimental.
## Get started with the model
Try out the hosted Inference UI or the [Huggingface Space](https://huggingface.co/spaces/LennardZuendorf/legalis), or load the pickled model directly:
```
import pickle

# load the pickled scikit-learn pipeline (vectorizer + classifier)
with open("legalis-scikit.pkl", "rb") as file:
    clf = pickle.load(file)
```
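Once loaded, the object is a full scikit-learn pipeline, so raw German case facts can be passed directly as strings. The snippet below is a minimal sketch; the example text is invented and the exact label encoding of the winner depends on the dataset.

```
# hypothetical example text; the model expects the case facts in German
facts = "Der Kläger verlangt von der Beklagten die Rückzahlung des Kaufpreises."

# the pipeline vectorizes internally, so raw strings can be passed
prediction = clf.predict([facts])
probabilities = clf.predict_proba([facts])

print(prediction[0], probabilities[0])
```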
### Model hyperparameters
- The classifier was tuned with scikit-learn's cross-validation search; the pipeline uses a CountVectorizer with common German stop words (a rough sketch of such a setup follows the table below).
| Hyperparameter | Value |
|----------------|-------|
| memory | |
| steps | [('count', CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...])), ('clf', RandomForestClassifier(min_samples_split=5, random_state=0))] |
| verbose | False |
| count | CountVectorizer(ngram_range=(1, 3), stop_words=['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an', 'ander', 'andere', 'anderem', 'anderen', 'anderer', 'anderes', 'anderm', 'andern', 'anderr', 'anders', 'auch', 'auf', 'aus', 'bei', 'bin', 'bis', 'bist', 'da', 'damit', 'dann', ...]) |
| clf | RandomForestClassifier(min_samples_split=5, random_state=0) |
| count__analyzer | word |
| count__binary | False |
| count__decode_error | strict |
| count__dtype | |
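For reference, a pipeline with these hyperparameters could be reconstructed roughly as follows. This is a sketch, not the original training script: the parameter grid, the shortened stop-word list, and the fit call are assumptions; only the final estimator settings (ngram_range=(1, 3), min_samples_split=5, random_state=0) come from the table above.

```
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# German stop words; the card only shows a truncated list, so this is a placeholder
german_stop_words = ['aber', 'alle', 'allem', 'allen', 'aller', 'alles', 'als', 'also', 'am', 'an']

pipeline = Pipeline(steps=[
    ('count', CountVectorizer(stop_words=german_stop_words)),
    ('clf', RandomForestClassifier(random_state=0)),
])

# assumed search grid; the final model uses ngram_range=(1, 3) and min_samples_split=5
param_grid = {
    'count__ngram_range': [(1, 1), (1, 2), (1, 3)],
    'clf__min_samples_split': [2, 5, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
# search.fit(X_train, y_train)  # X_train: list of case-fact strings, y_train: winner labels
```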