Sifter Redrob Reranker

This is the first trained reranker for Sifter, an AI hiring-ranking system built for the Redrob challenge.

The model reads a job description and one candidate profile together, then predicts a 0-1 fit score. In Sifter, it is used as a learned second opinion on the finalist pool after the full 100,000-candidate explainable ranker has already run.

Project repo: Sifter_Redrob_Hackathon
Live app: https://sifter1011.web.app

What This Model Does

Sifter already has a deterministic evidence ranker that can process the full Redrob candidate pool locally. This model adds a trainable layer on top:

Sifter ranks the full candidate pool using explainable evidence.
The backend sends only the finalist pool to this Hugging Face model.
The model returns a learned fit score.
Sifter blends the scores and keeps the explanation/bias guardrails visible.

Current blend in the Sifter backend:

70% explainable Sifter evidence score
30% learned reranker score

Default rerank scope:

top 25 finalist candidates
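
The blend and rerank scope above can be sketched as follows. This is a minimal illustration, not the Sifter backend code: the `Candidate` class, its field names, and the choice to reorder only within the finalist pool are assumptions, as is the evidence score being on a 0-1 scale.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

EVIDENCE_WEIGHT = 0.70  # explainable Sifter evidence score
LEARNED_WEIGHT = 0.30   # learned reranker score
FINALIST_LIMIT = 25     # default rerank scope


@dataclass
class Candidate:
    # Hypothetical shape; the real backend types are not shown in this card.
    candidate_id: str
    evidence_score: float  # assumed to be on a 0-1 scale
    learned_score: Optional[float] = None
    final_score: float = 0.0


def blend(evidence_score: float, learned_score: float) -> float:
    """Weighted average of the two scores using the 70/30 split."""
    return EVIDENCE_WEIGHT * evidence_score + LEARNED_WEIGHT * learned_score


def rerank_finalists(ranked: List[Candidate],
                     learned_scores: Dict[str, float]) -> List[Candidate]:
    """Blend scores for the top finalists and reorder only that pool.

    `ranked` must already be sorted by evidence score, best first.
    Candidates without a learned score keep their evidence score.
    """
    finalists, rest = ranked[:FINALIST_LIMIT], ranked[FINALIST_LIMIT:]
    for cand in finalists:
        cand.learned_score = learned_scores.get(cand.candidate_id)
        if cand.learned_score is None:
            cand.final_score = cand.evidence_score
        else:
            cand.final_score = blend(cand.evidence_score, cand.learned_score)
    for cand in rest:
        cand.final_score = cand.evidence_score
    finalists.sort(key=lambda c: c.final_score, reverse=True)
    return finalists + rest
```

Keeping the evidence weight above 0.5 means the learned score can reorder close candidates but cannot by itself override a large gap in the explainable score.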

Training Data

This revised public model was trained on Redrob-derived Sifter preference data with human-reviewed recruiter-style labels, not on a generic public ranking benchmark.

Training run:

Item	Value
Source	Redrob candidate profiles + human-reviewed Sifter candidate review set
Total examples	180 job-candidate examples
Train split	166 examples
Validation split	14 examples
Job description	Redrob Senior AI Engineer style role brief
Label type	Continuous fit score from `0.0` to `1.0`
Label source	Human-reviewed labels from the 180-candidate review set
Human label mix	46 `strong_fit`, 58 `maybe`, 76 `not_fit`
Independent human holdout	Small reviewed validation split only; no separate multi-recruiter panel yet

Each training example is shaped like this:

Job description + candidate profile -> fit score

The candidate profile text includes title, summary/headline, years of experience, location, career history, skills, certifications, assessments, and Redrob behavioral/logistics signals.
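
One way to assemble such an example is sketched below. The dictionary keys, the field labels, and the line-per-field layout are hypothetical; the card lists which kinds of information go into the profile text but not the exact serialization.

```python
# Hypothetical (key, label) pairs covering the field types listed above.
PROFILE_FIELDS = [
    ("title", "Title"),
    ("summary", "Summary"),
    ("years_experience", "Years of experience"),
    ("location", "Location"),
    ("career_history", "Career history"),
    ("skills", "Skills"),
    ("certifications", "Certifications"),
    ("assessments", "Assessments"),
    ("signals", "Behavioral/logistics signals"),
]


def build_profile_text(profile: dict) -> str:
    """Serialize a candidate profile into one text block, skipping empty fields."""
    parts = []
    for key, label in PROFILE_FIELDS:
        value = profile.get(key)
        if value is None or value == "" or value == []:
            continue
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        parts.append(f"{label}: {value}")
    return "\n".join(parts)


def make_example(job_description: str, profile: dict, fit_score: float) -> dict:
    """Pair a job description with a profile text and a 0-1 fit label."""
    if not 0.0 <= fit_score <= 1.0:
        raise ValueError("fit_score must be between 0.0 and 1.0")
    return {
        "job": job_description,
        "candidate": build_profile_text(profile),
        "label": float(fit_score),
    }
```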

Label Scale

The revised run uses human-reviewed labels so the model learns from actual recruiter-style judgment instead of only bootstrapped scores.

Label area	Meaning
`0.90 - 1.00`	Strong shortlist / interview-style fit
`0.55 - 0.72`	Review or maybe-fit candidates
`0.08 - 0.15`	Weak fit, rejected, or unranked lower-priority candidates

Recruiter labels are supported by the training script and override weak labels when present:

Recruiter label	Score
`hire`	`1.00`
`strong_fit`	`0.95`
`interview`	`0.90`
`review`	`0.62`
`maybe`	`0.55`
`not_fit`	`0.08`
`reject`	`0.00`
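
The mapping in the table, together with the override rule, could look like the sketch below. The function name and the handling of unknown labels (falling back to the weak score) are assumptions, not details taken from the training script.

```python
from typing import Optional

# Scores copied from the recruiter label table above.
RECRUITER_LABEL_SCORES = {
    "hire": 1.00,
    "strong_fit": 0.95,
    "interview": 0.90,
    "review": 0.62,
    "maybe": 0.55,
    "not_fit": 0.08,
    "reject": 0.00,
}


def resolve_label(weak_score: float, recruiter_label: Optional[str] = None) -> float:
    """Return the training target: the recruiter label wins when present.

    Unknown or missing recruiter labels fall back to the weak score
    (an assumption made for this sketch).
    """
    if recruiter_label is not None:
        key = recruiter_label.strip().lower()
        if key in RECRUITER_LABEL_SCORES:
            return RECRUITER_LABEL_SCORES[key]
    return weak_score
```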

Important: these labels are stronger than weak supervision, but they still come from a compact review set. A stronger next version should add more reviewers and a separate held-out recruiter panel.

Metrics

Validation results from the human-reviewed revised run:

Metric	Value
Validation loss	`0.0443`
RMSE	`0.2104`
MAE	`0.1884`
Spearman rank correlation	`0.7526`

What Spearman means in plain language: when the human-reviewed labels rank candidate A above candidate B, the model's scores mostly order them the same way. A value of 0.7526 indicates that the learned reranker broadly agrees with the reviewed candidate judgments, although on a 14-row validation split the estimate is noisy.
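
For readers who want to reproduce these metrics on their own labels, here is a dependency-free sketch. It is not the evaluation code used for this run; the function names are illustrative, and Spearman is computed as the Pearson correlation of average ranks, which handles tied labels such as several candidates sharing `0.55`.

```python
import math
from typing import List, Sequence


def rmse(labels: Sequence[float], preds: Sequence[float]) -> float:
    """Root mean squared error; the square root of the MSE loss."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(labels, preds)) / len(labels))


def mae(labels: Sequence[float], preds: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - y) for y, p in zip(labels, preds)) / len(labels)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(labels: Sequence[float], preds: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(labels), average_ranks(preds)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / math.sqrt(vx * vy)
```

Because the model is trained with mean squared error, RMSE should be close to the square root of the validation loss; the square root of `0.0443` is about `0.21`, in line with the reported RMSE.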

Training Procedure

Base model:

distilbert-base-uncased

Fine-tuning method:

Supervised reward-model regression fine-tuning

Training setup:

Hyperparameter	Value
Epochs	`3.0`
Training environment	Colab GPU run on 166 reviewed training rows
Batch size	`8`
Learning rate	`2e-5`
Max sequence length	`256`
Optimizer	AdamW
Precision	FP32

The model head is a single regression output (num_labels=1) trained with mean squared error loss.
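
A rough sketch of this setup with the Hugging Face `transformers` library is shown below. It is not the actual training script: the helper names are made up, and the step count assumes no gradient accumulation and that the final partial batch is kept, neither of which the card states.

```python
import math

TRAIN_ROWS = 166

# Values copied from the hyperparameter table above.
CONFIG = {
    "base_model": "distilbert-base-uncased",
    "num_labels": 1,
    "epochs": 3,
    "batch_size": 8,
    "learning_rate": 2e-5,
    "max_length": 256,
}


def optimizer_steps(rows: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, assuming one step per batch and no dropped batch."""
    return math.ceil(rows / batch_size) * epochs


def load_model_and_tokenizer(config: dict = CONFIG):
    """Load the base model with a single regression output.

    transformers is imported lazily so the arithmetic above runs without it.
    With num_labels=1 the sequence-classification head is trained with
    mean squared error when float labels are supplied.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(config["base_model"])
    model = AutoModelForSequenceClassification.from_pretrained(
        config["base_model"],
        num_labels=config["num_labels"],
        problem_type="regression",
    )
    return model, tokenizer


def encode_pair(tokenizer, job: str, candidate: str,
                max_length: int = CONFIG["max_length"]):
    """Tokenize job and candidate text as one pair, truncated to max_length."""
    return tokenizer(job, candidate, truncation=True, max_length=max_length)
```

Under those assumptions, 166 rows at batch size 8 give 21 steps per epoch and 63 steps over 3 epochs.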

Why This Is Still Human-In-The-Loop

This model is not treated as an automatic hiring decision system. The reviewed-label run improves the learned ranking signal, but Sifter still keeps human-facing checks:

every rank still shows evidence and concern text,
the bias guardrail stays visible,
reviewer-agent questions challenge the result,
recruiters can add more labels for future retraining.

How It Is Integrated Into Sifter

The model is wired into the Sifter backend:

Code path	Purpose
`apps/api/src/learned-rerank.ts`	Calls this Hugging Face model, parses the returned score, blends it into finalist ranking, and falls back safely
`apps/api/src/config.ts`	Reads `HF_TOKEN`, `SIFTER_RERANKER_MODEL`, rerank weight, and finalist limit
`apps/api/src/server.ts`	Exposes learned reranking through the Redrob API flow
`apps/web/src/App.tsx`	Shows learned-reranker status in the UI

The model is not allowed to become an unchecked black box. The deterministic Sifter reason, score breakdown, bias guardrail, and reviewer-agent questions remain visible after reranking.
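
The parse-and-fall-back behavior described for `learned-rerank.ts` might be sketched like this. The sketch is in Python rather than the backend's TypeScript, the accepted response shapes are guesses rather than a documented contract, and falling back to the unmodified evidence score is an assumption about what "falls back safely" means.

```python
import math
from typing import Any, Callable, Optional, Tuple


def parse_fit_score(payload: Any) -> Optional[float]:
    """Extract a 0-1 score from a model response, or None if it is unusable.

    Accepts a bare number, a dict with a "score" key, or nested lists
    wrapping either (all hypothetical response shapes).
    """
    while isinstance(payload, list) and payload:
        payload = payload[0]
    if isinstance(payload, dict):
        payload = payload.get("score")
    if isinstance(payload, bool) or not isinstance(payload, (int, float)):
        return None
    if math.isnan(payload) or math.isinf(payload):
        return None
    return min(1.0, max(0.0, float(payload)))


def score_with_fallback(evidence_score: float,
                        fetch_learned_score: Callable[[], Any],
                        learned_weight: float = 0.30) -> Tuple[float, bool]:
    """Return (final_score, used_learned_score).

    Any error or unparseable response leaves the evidence score untouched,
    so ranking still works when the model endpoint is unavailable.
    """
    try:
        learned = parse_fit_score(fetch_learned_score())
    except Exception:
        learned = None
    if learned is None:
        return evidence_score, False
    blended = (1.0 - learned_weight) * evidence_score + learned_weight * learned
    return blended, True
```

Returning a flag alongside the score lets a caller surface whether the learned reranker actually contributed, which fits the status display mentioned for `App.tsx`.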

Limitations

The model is trained for the Redrob/Sifter Senior AI Engineer ranking setup, not general hiring across every role.
The revised run uses 180 human-reviewed examples, so it is stronger than weak supervision but still small.
The validation metrics are measured on a 14-row reviewed validation split, not a large independent recruiter panel.
The model can learn patterns present in the review labels, so Sifter keeps deterministic explanations and bias guardrails in the final product.
The Redrob dataset does not include protected demographic labels, so this model card does not claim protected-class fairness parity.

Responsible Use

Use this model as a recruiter-assist reranker, not as an automatic hiring decision system. It should support human review by providing an additional fit signal while Sifter continues to show evidence, concerns, and bias checks.

Recommended use:

rerank finalist pools,
compare candidate-job fit,
support interview shortlist review,
collect recruiter labels for a better second version.

Not recommended:

automatic rejection without human review,
ranking based on identity or protected traits,
claiming fairness parity without a protected-label audit,
using the score without reading the explanation and evidence.

Model Details

Item	Value
Model ID	shikharshahi/sifter-redrob-reranker
Base model	distilbert/distilbert-base-uncased
Weights format	Safetensors
Model size	67M params
Tensor type	F32
