ALBERT-base fine-tuned on JABD for Bias Detection in Job Advertisements

This model is a fine-tuned version of albert-base-v2 on the Job Ads Bias Dataset (JABD) for token-level bias detection in job advertisements, framed as a Named Entity Recognition (NER) task using the BIO tagging scheme.

It is the best-performing model reported in the accompanying paper, "Bias Detection in Job Advertisement using Natural Language Processing".

Model Description

The model identifies and classifies 12 types of linguistic bias at the token/span level in job advertisements, covering both explicit and implicit biases across six sociodemographic groups: gender, religion, disability, ethnicity, age, and nationality.

Base model: albert-base-v2
Task: Token classification (NER) with BIO tagging
Language: English
Training dataset: Job Ads Bias Dataset (JABD) — 14,960 sentences with token-level annotations
Number of labels: 25 (12 bias categories × 2 tags, B- and I-, plus the O tag)
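
As a rough illustration of how the 25 labels arise, the sketch below expands the 12 category names into BIO tags. The upper-case label strings and their ordering are assumptions made for this example; the mapping actually stored with the checkpoint can be read from model.config.id2label.

```python
# Category names as listed in this card; the label strings derived
# from them below are hypothetical, not the checkpoint's real labels.
CATEGORIES = [
    "Generic She", "Generic He", "Explicit Marking of Sex",
    "Masculine Coded", "Feminine Coded", "Religion Related",
    "Disability Related", "Nationality Related", "Ethnic Related",
    "Age Related", "Old Coded", "Young Coded",
]


def build_bio_labels(categories):
    """Return the O tag followed by a B- and an I- tag per category."""
    labels = ["O"]
    for category in categories:
        name = category.upper().replace(" ", "_")
        labels.append(f"B-{name}")
        labels.append(f"I-{name}")
    return labels


labels = build_bio_labels(CATEGORIES)
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 12 categories * 2 tags + 1 outside tag
```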

Bias Categories

Category	Type	Group
Generic She	Explicit	Gender
Generic He	Explicit	Gender
Explicit Marking of Sex	Explicit	Gender
Masculine Coded	Implicit	Gender
Feminine Coded	Implicit	Gender
Religion Related	Explicit	Religion
Disability Related	Explicit	Disability
Nationality Related	Explicit	Nationality
Ethnic Related	Explicit	Ethnicity
Age Related	Explicit	Age
Old Coded	Implicit	Age
Young Coded	Implicit	Age

Intended Use

Primary Use Cases

Flagging potentially biased language in job advertisements for human review.
Research on fairness, bias, and inclusion in recruitment-related text.
Building tools to assist recruiters and HR professionals in writing more inclusive job postings.

Out-of-Scope Uses

Legal determinations or hiring decisions. This model is not designed to be, and must not be used as, an automated decision-maker in any recruitment process.
Automated content moderation without human oversight.
Languages other than English. The model was trained exclusively on English-language job ads.
Domains other than job advertisements. Performance on other text domains has not been evaluated.

Performance

Micro-averaged token-level metrics on the JABD test split (averaged across three random seeds):

Metric	Score
F1	59.27 ± 0.86
Precision	65.29 ± 1.79
Recall	54.27 ± 0.33
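
For readers unfamiliar with micro-averaging, here is a minimal sketch of token-level micro precision, recall, and F1 over BIO tag sequences. It assumes that a token counts as correct when the predicted bias category matches the gold category, ignoring the B-/I- prefix; the paper's exact scoring rules may differ, and the tag names are hypothetical.

```python
def micro_prf(gold_tags, pred_tags):
    """Token-level micro precision/recall/F1, pooled over all categories."""

    def category(tag):
        # Drop the B-/I- prefix so only the bias category is compared.
        return "O" if tag == "O" else tag.split("-", 1)[1]

    tp = fp = fn = 0
    for gold, pred in zip(gold_tags, pred_tags):
        g, p = category(gold), category(pred)
        if p != "O" and p == g:
            tp += 1
        else:
            if p != "O":
                fp += 1  # predicted a category that is not the gold one
            if g != "O":
                fn += 1  # missed (or mislabeled) a gold category
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["O", "B-YOUNG_CODED", "O", "B-MASCULINE_CODED", "I-MASCULINE_CODED"]
pred = ["O", "O", "O", "B-MASCULINE_CODED", "I-MASCULINE_CODED"]
precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Because counts are pooled before dividing, frequent categories dominate the micro average. As a sanity check on the table, the harmonic mean of the reported precision and recall (65.29 and 54.27) is about 59.27, in line with the reported F1.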

Per-Label Performance (F1)

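Performance varies substantially by category. Most explicit categories reach an F1 above 72, with Age Related (41.01) as the exception. Implicit (coded) biases remain challenging: F1 ranges from 55.68 for Feminine Coded down to 0.53 for Young Coded.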

Label	Type	F1	Precision	Recall
Generic She	Explicit	88.57	85.57	91.92
Explicit Marking of Sex	Explicit	81.40	78.79	84.44
Disability Related	Explicit	77.70	71.78	85.13
Religion Related	Explicit	77.49	87.53	69.74
Generic He	Explicit	77.28	69.54	87.56
Nationality Related	Explicit	72.65	74.20	71.26
Ethnic Related	Explicit	72.10	64.61	82.39
Feminine Coded	Implicit	55.68	47.62	67.34
Masculine Coded	Implicit	51.69	41.34	68.96
Age Related	Explicit	41.01	29.06	72.08
Old Coded	Implicit	10.15	7.15	27.89
Young Coded	Implicit	0.53	0.28	4.94

Note: This checkpoint corresponds to a single random seed from the experiments reported in the paper, so its scores may differ slightly from the three-seed averages above.
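
To put a number on the gap between explicit and implicit categories, the short sketch below takes an unweighted mean of the F1 values in the table above. These group means are simple arithmetic on the reported figures, not results from the paper, and they ignore how many tokens each label has.

```python
from statistics import mean

# (type, F1) per label, copied from the per-label table above.
PER_LABEL_F1 = {
    "Generic She": ("Explicit", 88.57),
    "Explicit Marking of Sex": ("Explicit", 81.40),
    "Disability Related": ("Explicit", 77.70),
    "Religion Related": ("Explicit", 77.49),
    "Generic He": ("Explicit", 77.28),
    "Nationality Related": ("Explicit", 72.65),
    "Ethnic Related": ("Explicit", 72.10),
    "Feminine Coded": ("Implicit", 55.68),
    "Masculine Coded": ("Implicit", 51.69),
    "Age Related": ("Explicit", 41.01),
    "Old Coded": ("Implicit", 10.15),
    "Young Coded": ("Implicit", 0.53),
}

explicit_mean = mean(f1 for kind, f1 in PER_LABEL_F1.values() if kind == "Explicit")
implicit_mean = mean(f1 for kind, f1 in PER_LABEL_F1.values() if kind == "Implicit")
overall_mean = mean(f1 for _, f1 in PER_LABEL_F1.values())

print(f"explicit: {explicit_mean:.1f}")
print(f"implicit: {implicit_mean:.1f}")
print(f"all labels: {overall_mean:.1f}")
```

On these figures the unweighted mean is roughly 73.5 for explicit categories and 29.5 for implicit ones.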

How to Use

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Hub repository ID of this checkpoint
model_name = "mborquez/albert-base-v2-jabd-bias-ner"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# "simple" aggregation groups adjacent tokens that share a bias
# category into a single span, instead of one row per sub-word token
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "We are looking for a young and energetic salesman to join our team."
predictions = ner(text)

# One row per flagged span: text, bias category, confidence score
for pred in predictions:
    print(f"{pred['word']:<20} {pred['entity_group']:<25} {pred['score']:.3f}")
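
For human review it can help to see flagged spans in context. The sketch below is one possible way to do that, assuming the aggregated pipeline output carries start and end character offsets (which requires a fast tokenizer). The sample predictions, their labels, and their scores are hand-written for illustration and are not real model output; mark_spans and fake_pred are hypothetical helpers.

```python
def mark_spans(text, predictions):
    """Wrap each predicted span as [[span|LABEL]] inside the original text."""
    parts = []
    cursor = 0
    for pred in sorted(predictions, key=lambda p: p["start"]):
        if pred["start"] < cursor:
            continue  # skip a span that overlaps one already emitted
        parts.append(text[cursor:pred["start"]])
        span = text[pred["start"]:pred["end"]]
        parts.append(f"[[{span}|{pred['entity_group']}]]")
        cursor = pred["end"]
    parts.append(text[cursor:])
    return "".join(parts)


def fake_pred(text, word, label, score):
    """Build a dict shaped like one aggregated pipeline prediction."""
    start = text.index(word)
    return {"entity_group": label, "score": score, "word": word,
            "start": start, "end": start + len(word)}


text = "We are looking for a young and energetic salesman to join our team."
sample = [
    fake_pred(text, "salesman", "MASCULINE_CODED", 0.91),  # invented values
    fake_pred(text, "young", "YOUNG_CODED", 0.62),         # invented values
]
marked = mark_spans(text, sample)
print(marked)
```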

Training Details

Training Data

The model was trained on the Job Ads Bias Dataset (JABD), which contains 14,960 sentences with token-level BIO annotations across 12 bias categories. JABD was built on top of the EMSCAD corpus (Vidros et al., 2017) and annotated by 192 trained annotators recruited via Prolific, following a custom taxonomy and a rigorous quality-assurance process.

Data Splits

Split	Job IDs	Sentences
Train	7,195 (80%)	11,991 (80.15%)
Validation	899 (10%)	1,517 (10.14%)
Test	899 (10%)	1,452 (9.71%)

Splits were made at the job_id level, so all sentences from the same advertisement fall into a single split; this prevents cross-advertisement leakage.
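
A split of this kind can be sketched as follows. This is not the authors' script: the function name, the seed, and the (job_id, sentence) row format are assumptions, and only the 80/10/10 proportions come from the table above.

```python
import random


def split_by_job_id(rows, seed=0, train_frac=0.8, val_frac=0.1):
    """Assign whole advertisements, not single sentences, to splits.

    rows: list of (job_id, sentence) pairs.
    """
    job_ids = sorted({job_id for job_id, _ in rows})
    random.Random(seed).shuffle(job_ids)

    n_train = int(len(job_ids) * train_frac)
    n_val = int(len(job_ids) * val_frac)

    assignment = {}
    for position, job_id in enumerate(job_ids):
        if position < n_train:
            assignment[job_id] = "train"
        elif position < n_train + n_val:
            assignment[job_id] = "validation"
        else:
            assignment[job_id] = "test"

    splits = {"train": [], "validation": [], "test": []}
    for job_id, sentence in rows:
        splits[assignment[job_id]].append((job_id, sentence))
    return splits


# Toy data: 20 advertisements with two sentences each.
rows = [(i // 2, f"sentence {i}") for i in range(40)]
splits = split_by_job_id(rows)
ids = {name: {job_id for job_id, _ in part} for name, part in splits.items()}
print({name: len(part) for name, part in splits.items()})
```

Because the split is drawn over advertisements, the sentence-level proportions only approximate 80/10/10, which matches the small deviations visible in the table.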

Training Procedure

Architecture: ALBERT-base-v2 with a token classification head
Tagging scheme: BIO
Checkpoint selection: Best validation macro-F1
Regularization: Higher dropout and longer training (more epochs) to improve generalization
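
As an illustration of the BIO tagging scheme listed above, the sketch below converts word-level span annotations into one tag per token. The tokens, spans, and category names are hypothetical and are not drawn from JABD; the dataset's own annotation format may differ.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, category) spans into BIO tags.

    start is inclusive and end is exclusive, both as token indices.
    """
    tags = ["O"] * len(tokens)
    for start, end, category in spans:
        tags[start] = f"B-{category}"      # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"      # remaining tokens of the span
    return tags


tokens = ["a", "young", "and", "energetic", "salesman"]
spans = [(1, 4, "YOUNG_CODED"), (4, 5, "MASCULINE_CODED")]  # invented annotation
tags = spans_to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token:<12} {tag}")
```

The B- tag is what lets two adjacent spans of the same category stay separate after decoding.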

Limitations and Ethical Considerations

Limitations

Implicit bias remains hard to detect. Young Coded (F1 0.53) and Old Coded (F1 10.15) are detected very poorly, and Masculine Coded and Feminine Coded stay below an F1 of 56. Predictions in these categories should be interpreted with caution.
Taxonomy scope. The taxonomy does not cover all possible forms of bias; for example, criminal record references, socioeconomic status, and intersectional biases are missing or underrepresented.
Cultural and temporal contingency. Bias is context- and group-dependent. The taxonomy reflects the cultural norms present in the EMSCAD corpus (English-language job ads, 2012–2014) and may not transfer cleanly to other contexts.
Subjectivity. Inter-annotator agreement (Krippendorff's α = 0.51) reflects the inherent subjectivity of the task. Model errors partly inherit this variability.
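Single seed. This checkpoint corresponds to one random seed. Across the three seeds in the Performance section, micro-F1 has a reported spread of ± 0.86, so results from this checkpoint may differ slightly from the averages.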

Ethical Considerations

This model addresses a sensitive topic with potential for misuse. We highlight the following:

Dual use. The model could in principle be inverted to craft covert discriminatory language or to identify thresholds for evading bias detection. Users must commit to non-discriminatory and assistive applications only.
Human-in-the-loop. Outputs are intended to flag language for human review, not to make automatic determinations about candidates, employers, or job postings.
Over-penalization. Excessive flagging of subtle or ambiguous wording can produce compliance theater or suppress inclusive language. Calibrate thresholds appropriately for your context.
Bias amplification. Unequal error rates across categories may amplify existing disparities. Per-class metrics should be monitored in deployment.

For a complete discussion, see Section 5.4 of the accompanying paper.
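
The threshold calibration suggested under over-penalization could look like the following minimal sketch, which keeps a prediction only if its score clears a per-category cutoff. The cutoff values, category names, and sample predictions are hypothetical; suitable thresholds would have to be chosen on validation data for the deployment context.

```python
def filter_predictions(predictions, thresholds, default=0.5):
    """Keep predictions whose score meets the threshold for their category."""
    return [
        pred for pred in predictions
        if pred["score"] >= thresholds.get(pred["entity_group"], default)
    ]


# Hypothetical cutoffs: stricter for a low-precision implicit category.
thresholds = {"YOUNG_CODED": 0.90, "GENERIC_HE": 0.60}

# Hand-written predictions in the shape of aggregated pipeline output.
sample_predictions = [
    {"word": "young", "entity_group": "YOUNG_CODED", "score": 0.62},
    {"word": "he", "entity_group": "GENERIC_HE", "score": 0.88},
    {"word": "salesman", "entity_group": "MASCULINE_CODED", "score": 0.55},
]

kept = filter_predictions(sample_predictions, thresholds)
for pred in kept:
    print(pred["word"], pred["entity_group"], pred["score"])
```

Raising a cutoff trades recall for precision in that category, so per-class metrics should be rechecked after any change.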

Citation

If you use this model, please cite:

@article{citation_2025,
  title={Bias Detection in Job Advertisement using Natural Language Processing},
  author={Private for now},
  journal={Journal name},
  year={2025}
}

Acknowledgments

The model is built on ALBERT (Lan et al., 2020) and uses the EMSCAD dataset (Vidros et al., 2017) as the source of job advertisements.

Contact

For questions about the model or dataset, please contact the authors via the paper or open an issue on the model repository.
