Model Card for Ausklasser

Model Description

Ausklasser is a text classification model for German online job advertisements (OJAs). It distinguishes apprenticeship advertisements (AOJAs) from regular advertisements (ROJAs). The model is built on the DistilBERT architecture, which keeps it compact and efficient.

Developer and Intended Use

Developed by Kai Krüger at the German Federal Institute for Vocational Education and Training, this model is intended for researchers and professionals involved in labor market analysis, specifically for distinguishing between apprenticeship and regular job listings in German.

Training & Development

Training, data, and experiments are described in the corresponding publication (see Links).

Evaluation Results

Ausklasser reached an accuracy of 0.98 on the test set and 0.9 in the evaluation during training.

Limitations

The model's training involved complex decisions around hyperparameters, dataset size, and balancing strategies.
The size of the datasets and the presence of boilerplate text in the data may impact the model's performance.
The model's performance could be influenced by the specific characteristics of the datasets used.

Ethical Considerations

The model should be used responsibly, considering the context of German labor market dynamics.
Users should be aware of the limitations and potential biases inherent in the training data.

Usage

The model is available on Hugging Face as KKrueger/ausklasser and classifies German OJAs into four categories:

Label	Category
0	Apprenticeships
1	Other Minor Positions
2	Leading Position
3	Regular Workers

# Example Python code for using the Ausklasser model
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("KKrueger/ausklasser")
model = AutoModelForSequenceClassification.from_pretrained("KKrueger/ausklasser")

# Example text
text = "Your German job advertisement text here"

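# Tokenize (truncating to the model's maximum input length) and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model(**inputs)

# Convert the logits to a class index (0-3, see the table above)
predicted_label = outputs.logits.argmax(dim=-1).item()
print(predicted_label)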
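
The raw model output is one logit per category, so a softmax followed by a lookup in the label table gives a readable category name and a confidence score. The following is a minimal sketch, assuming the label order from the table above; `ID2LABEL` and `decode_logits` are hypothetical names introduced for illustration, and the example logits are made-up values, not actual model output.

```python
import math

# Label order copied from the table above; the dictionary name is illustrative.
ID2LABEL = {
    0: "Apprenticeships",
    1: "Other Minor Positions",
    2: "Leading Position",
    3: "Regular Workers",
}


def decode_logits(logits):
    """Return (category name, softmax probability) for one row of logits."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable category.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only.
example_logits = [4.0, 0.5, -1.0, 1.0]
label, confidence = decode_logits(example_logits)
print(label, round(confidence, 3))
```

With the model loaded as in the example above, the logits for a single advertisement could be passed as `decode_logits(outputs.logits[0].tolist())`.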

Links

Paper
Code