Model Card for Ausklasser
Model Description
Ausklasser is a text classification model designed to distinguish apprenticeship job advertisements (AOJAs) from regular job advertisements (ROJAs) in German. The model is built on the DistilBERT architecture, offering an efficient and compact solution for processing German Online Job Advertisements (OJAs).
Developer and Intended Use
Developed by Kai Krüger at the German Federal Institute for Vocational Education and Training, this model is intended for researchers and professionals involved in labor market analysis, specifically for distinguishing between apprenticeship and regular job listings in German.
Training & Development
Training, data, and experiments are described in the corresponding publication.
Evaluation Results
Ausklasser generalized well in both training and testing: it reached an accuracy of 0.98 on the test set and 0.9 in training evaluation.
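Accuracy here is the share of advertisements whose predicted label matches the reference label. For readers who want to run the same kind of check on their own labeled sample, a minimal sketch follows. The function name and the label lists are illustrative assumptions, not the evaluation code or data from the publication.

```python
def accuracy(gold_labels, predicted_labels):
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("gold and predicted label lists must have the same length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


# Made-up label IDs for four advertisements (hypothetical, for illustration only)
gold = [0, 3, 3, 1]
predicted = [0, 3, 2, 1]
print(accuracy(gold, predicted))
```

Accuracy alone can hide weak performance on rare categories, so a per-category breakdown is worth adding when the sample is imbalanced.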
Limitations
- The model's training involved complex decisions around hyperparameters, dataset size, and balancing strategies.
- The size of the datasets and the presence of boilerplate text in the data may impact the model's performance.
- The model's performance could be influenced by the specific characteristics of the datasets used.
Ethical Considerations
- The model should be used responsibly, considering the context of German labor market dynamics.
- Users should be aware of the limitations and potential biases inherent in the training data.
Usage
The model is available on Hugging Face and can be used to classify German OJAs into four categories:
| Label | Category |
|---|---|
| 0 | Apprenticeships |
| 1 | Other Minor Positions |
| 2 | Leading Position |
| 3 | Regular Workers |
# Example Python code for using the Ausklasser model
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("KKrueger/ausklasser")
model = AutoModelForSequenceClassification.from_pretrained("KKrueger/ausklasser")
# Example text
text = "Your German job advertisement text here"
# Tokenize (truncating overly long advertisements) and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():  # no gradients are needed for inference
    outputs = model(**inputs)
# Process outputs: pick the highest-scoring label ID (0-3, see the table above)
predicted_id = outputs.logits.argmax(dim=-1).item()
print(predicted_id)
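The `outputs` object above holds raw logits, one score per category. A minimal sketch of turning those scores into a category name with a probability is shown below, using only the Python standard library. The helper names and the example logits are hypothetical; the label names are taken from the table above, and whether the model's own configuration exposes the same names is not confirmed here.

```python
import math

# Label IDs and names as listed in the table above.
ID2LABEL = {
    0: "Apprenticeships",
    1: "Other Minor Positions",
    2: "Leading Position",
    3: "Regular Workers",
}


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits):
    """Return (category name, probability) for the highest-scoring label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one advertisement; with the model loaded, they could
# be obtained via outputs.logits[0].detach().tolist()
example_logits = [4.2, -1.0, -2.3, 0.5]
label, prob = describe_prediction(example_logits)
print(f"{label} ({prob:.2f})")
```

Reporting the probability alongside the label makes it possible to flag low-confidence advertisements for manual review instead of accepting every prediction.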