AI Recruiting Assistant – Guide Book (Updated)
0) Overview
What this tool does
This AI Recruiting Assistant is a decision-support system that helps recruiters and hiring managers:
- Extract job requirements from a job description (JD)
- Evaluate resumes against verified requirements using evidence-based matching
- Assess job-relevant culture/working-style signals using retrieved company documents
- Run factuality checks to detect ungrounded claims
- Run a bias & fairness audit across the JD, analyses, and the model's final recommendation
The problem it addresses
Recruiting teams often face three recurring issues when using AI:
- Hallucinated requirements: LLMs may "invent" skills that are not explicitly required.
- Opaque scoring: Many tools produce fit scores without clearly showing evidence.
- Bias risks: Hiring language and reasoning can leak pedigree/class proxies or subjective criteria.
This tool addresses those issues by enforcing:
- Deterministic verification gates (requirements are verified before scoring)
- Evidence-backed scoring (only verified requirements are scored; each match includes a quote)
- Self-verification and self-correction (factuality checks can trigger automatic revision)
- Bias auditing (flags risky language and inconsistent standards)
How it differentiates from typical recruiting tools
Compared with "black-box" resume screeners or generic LLM chatbots, this system emphasizes:
- Transparency: Outputs include what was required, what was verified, what was dropped, and why.
- Auditability: The scoring math is deterministic and traceable to inputs.
- Self-verifying behavior: Claims are checked against source text; unverified claims can be removed.
- Bias checks by design: Bias-sensitive content is audited explicitly instead of implicitly influencing scores.
- Culture check that's job-performance aligned: Culture attributes are framed as job-relevant behaviors, not background proxies.
1) Inputs and Document Handling
1.1 What the user uploads
The tool operates on three inputs:
- Company culture / values documents (PDF/DOCX)
- Resumes (PDF/DOCX)
- Job description (pasted text)
1.2 Resume anonymization
Before resumes are stored or analyzed, the tool applies heuristic redaction:
- Emails, phone numbers, URLs
- Addresses / location identifiers
- Explicit demographic fields
- Likely name header (first line)
This reduces exposure of personal identifiers and keeps analysis focused on job evidence.
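A minimal sketch of this redaction pass, assuming simple regex heuristics (the patterns and placeholder labels below are illustrative, not the tool's exact rules):

```python
import re

def redact_resume(text: str) -> str:
    """Heuristic redaction sketch: mask common personal identifiers before storage."""
    lines = text.splitlines()
    if lines:
        lines[0] = "[REDACTED NAME]"  # likely name header (first line)
    text = "\n".join(lines)
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[REDACTED EMAIL]", text)   # emails
    text = re.sub(r"https?://\S+|www\.\S+", "[REDACTED URL]", text)       # URLs
    text = re.sub(r"\+?\d[\d\s().-]{7,}\d", "[REDACTED PHONE]", text)     # phone numbers
    return text
```

Addresses, location identifiers, and explicit demographic fields would need additional patterns or lookup lists; heuristics like these reduce, but do not guarantee removal of, personal identifiers.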
1.3 Vector stores (retrieval)
The tool maintains two separate Chroma collections:
- Resumes (anonymized + chunked)
- Culture docs (chunked)
Chunking uses a recursive splitter with overlap to preserve context.
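The indexing step could look roughly like the sketch below, assuming LangChain-style wrappers around Chroma and a recursive character splitter (package names, chunk sizes, and the embedding model are assumptions, not the tool's recorded configuration):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

# Assumed parameters; the actual chunk_size / chunk_overlap are recorded as audit artifacts.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Two separate collections keep resume evidence and culture evidence from mixing.
resume_store = Chroma(collection_name="resumes", embedding_function=embeddings)
culture_store = Chroma(collection_name="culture_docs", embedding_function=embeddings)

def index_resume(resume_id: str, sanitized_text: str) -> None:
    """Chunk an anonymized resume and store it with its resume_id metadata."""
    chunks = splitter.split_text(sanitized_text)
    resume_store.add_texts(chunks, metadatas=[{"resume_id": resume_id}] * len(chunks))
```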
2) End-to-End Logic Flow (Step-by-Step)
Below is the stepwise flow executed when a recruiter clicks Analyze Candidates.
Step 0 – Prerequisite: Documents exist in storage
- Culture docs and resumes must be stored first.
- If not stored, retrieval will be empty or low-signal.
Step 1 – Extract required skills from the Job Description (LLM-driven)
Goal: Identify only skills that are explicitly required.
The tool prompts the LLM to return JSON only:
required_skills: [{skill, evidence_quote}]
The LLM is instructed to:
- include only MUST HAVE / explicitly required skills
- exclude "nice-to-haves" and implied skills
- copy a short verbatim quote as evidence
LLM role: structured extraction.
Failure behavior: If JSON parsing fails, the tool stops and prints the raw output.
Step 2 – Verify extracted skills against the JD (deterministic, Python)
Goal: Block hallucinated requirements from entering scoring.
Each extracted item is classified:
- Quote-verified (strong): the evidence quote appears verbatim in the JD
- Name-only (weak): the skill name appears in the JD, but the quote doesn't match
- Unverified (dropped): neither quote nor name appears
Deterministic gate:
- Only quote-verified skills are used as the final required list for scoring.
- Name-only and dropped skills are reported for transparency.
Output: the "Requirements Verification" section shows:
- extracted count
- quote-verified vs name-only vs dropped
- list of skills used for scoring
- list of retracted/dropped items (with reason)
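A sketch of the deterministic gate, assuming the extraction JSON from Step 1 (function and field names are illustrative):

```python
def verify_requirements(extracted: list[dict], jd_text: str) -> dict:
    """Classify each extracted skill as quote-verified, name-only, or dropped."""
    jd = jd_text.lower()
    quote_verified, name_only, dropped = [], [], []
    for item in extracted:
        skill = item.get("skill", "").strip()
        quote = item.get("evidence_quote", "").strip()
        if quote and quote.lower() in jd:
            quote_verified.append(item)   # strong: quote appears verbatim in the JD
        elif skill and skill.lower() in jd:
            name_only.append(item)        # weak: name appears, quote does not match
        else:
            dropped.append(item)          # unverified: excluded from scoring
    return {
        "scoring_skills": [i["skill"] for i in quote_verified],  # only these are scored
        "name_only": name_only,
        "dropped": dropped,
    }
```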
Step 3 – Retrieve the most relevant culture chunks (deterministic retrieval)
Goal: Ground culture evaluation in actual company documents.
- The tool runs similarity search over culture docs using the JD as query.
- It selects the top k chunks (e.g., k=3).
Deterministic component: vector retrieval parameters.
Output artifact: culture_context is the concatenated text of retrieved culture chunks.
Step 4 – Generate job-performance culture attributes (LLM-driven)
Goal: Create a small set of job-relevant behavioral attributes to evaluate consistently.
The tool prompts the LLM to return JSON:
cultural_attributes: ["...", "..."] (4–6 items)
Attribute rules:
- Must be job-performance aligned behaviors (e.g., "evidence-based decision making").
- Must avoid pedigree / class / prestige language.
- Must avoid non-performance preferences (e.g., remote-first, time zone).
LLM role: label generation from retrieved culture context.
Step 5 – Retrieve top resume chunks for the JD (deterministic retrieval)
Goal: Identify the most relevant candidates and their relevant resume text.
- The tool runs similarity search over resumes using the JD.
- It retrieves the top k chunks (e.g., k=10) and groups them by resume_id.
Note: Only retrieved chunks are analyzed. If relevant evidence isn't retrieved, it may be missed.
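A sketch of this retrieval and grouping step, reusing the resume_store collection from the indexing sketch in Section 1.3 (the k value matches the current configuration; the helper name is illustrative):

```python
from collections import defaultdict

def retrieve_candidate_chunks(jd_text: str, k: int = 10) -> dict[str, list[str]]:
    """Retrieve the top-k resume chunks for the JD and group them by resume_id."""
    docs = resume_store.similarity_search(jd_text, k=k)  # deterministic retrieval parameters
    by_candidate: dict[str, list[str]] = defaultdict(list)
    for doc in docs:
        by_candidate[doc.metadata.get("resume_id", "unknown")].append(doc.page_content)
    return by_candidate
```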
Step 6 – Culture evidence matching per candidate (LLM + deterministic cleanup + deterministic scoring)
Goal: Determine which culture attributes are supported by resume evidence.
LLM-driven matching:
For each attribute, the LLM may return a match with:
- evidence_type: direct or inferred
- evidence_quotes: 1–2 verbatim resume quotes
- inference: required for inferred matches
- confidence: 1–5
Deterministic cleanup rules (Python): A match is kept only if:
- attribute is present
- evidence_type is direct or inferred
- at least one non-trivial quote exists
- confidence is an integer 1–5
- inferred matches include an inference sentence
- inferred matches can be required to meet a minimum confidence
Deterministic culture scoring (Python):
- Direct evidence weight: 1.0
- Inferred evidence weight: 0.5
Culture score is computed as:
(sum(weights for matched attributes) / number_of_required_attributes) * 100
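A sketch of the cleanup gates and weighted scoring, assuming the match JSON fields listed above (the minimum quote length and inferred-confidence threshold are assumptions):

```python
WEIGHTS = {"direct": 1.0, "inferred": 0.5}
MIN_INFERRED_CONFIDENCE = 3  # assumed threshold; configurable in the tool

def clean_culture_matches(raw_matches: list[dict]) -> list[dict]:
    """Keep only matches that pass the deterministic validity gates."""
    kept = []
    for m in raw_matches:
        etype = m.get("evidence_type")
        quotes = [q for q in m.get("evidence_quotes", []) if len(q.strip()) > 10]
        conf = m.get("confidence")
        if not m.get("attribute") or etype not in WEIGHTS or not quotes:
            continue
        if not (isinstance(conf, int) and 1 <= conf <= 5):
            continue
        if etype == "inferred" and (not m.get("inference") or conf < MIN_INFERRED_CONFIDENCE):
            continue
        kept.append(m)
    return kept

def culture_score(matches: list[dict], required_attributes: list[str]) -> float:
    """(sum of weights for matched attributes / number of required attributes) * 100"""
    total = sum(WEIGHTS[m["evidence_type"]] for m in matches)
    return round(total / max(len(required_attributes), 1) * 100, 1)
```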
Step 7 – Skills matching per candidate (LLM + deterministic scoring)
Goal: Match only the verified required skills to resume evidence.
Inputs:
- Candidate resume text (retrieved chunks)
- Verified required skills list (quote-only)
LLM output (JSON):
matched: [{skill, evidence_snippet}]
missing: [skill] (treated as advisory; missing is recomputed deterministically)
Deterministic missing calculation (Python):
- Missing = required_set − matched_set
Deterministic skills scoring (Python):
(number_of_matched_required_skills / number_of_required_skills) * 100
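Both calculations fit in a few lines (names are illustrative; only quote-verified requirements are passed in):

```python
def score_skills(required_skills: list[str], llm_matched: list[dict]) -> dict:
    """Deterministic skills scoring over the verified required-skill list."""
    required = {s.lower() for s in required_skills}
    matched = {m["skill"].lower() for m in llm_matched if m.get("skill", "").lower() in required}
    missing = required - matched  # recomputed in Python; the LLM's "missing" list is advisory
    score = len(matched) / max(len(required), 1) * 100
    return {"score": round(score, 1), "matched": sorted(matched), "missing": sorted(missing)}
```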
Step 8 – Implied competencies (NOT SCORED) for phone-screen guidance (LLM-driven, advisory)
Goal: When a required skill is missing explicitly, suggest whether it may be implied by adjacent evidence.
This step is not scored and does not affect proceed/do-not-proceed.
The LLM may suggest implied competencies only if it:
- uses conservative language ("may be implied")
- includes verbatim resume quotes
- provides a phone-screen validation question
Hard guardrail: Tool-specific skills (e.g., R/SAS/MATLAB) must be explicitly present in the resume to be suggested.
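A sketch of that guardrail (the denylist below is the example from this guide; a real deployment would maintain its own list):

```python
import re

TOOL_SPECIFIC_SKILLS = {"R", "SAS", "MATLAB"}  # example denylist from this guide

def allow_implied_suggestion(skill: str, resume_text: str) -> bool:
    """Tool-specific skills may only be suggested if explicitly named in the resume."""
    if skill.upper() in TOOL_SPECIFIC_SKILLS:
        return re.search(r"\b" + re.escape(skill) + r"\b", resume_text, re.IGNORECASE) is not None
    return True  # other skills may be suggested as implied, with quotes and a phone-screen question
```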
Step 9 – Factuality verification (LLM-driven verifier)
Goal: Detect ungrounded evidence claims.
The verifier checks evidence-backed match lines (e.g., "- Skill: snippet"). It ignores:
- numeric score lines
- missing lists
- policy text
Outputs:
- verified claims (✓)
- unverified claims (✗)
- factuality score
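The factuality score can be read as the share of evidence-backed claims the verifier confirmed; the exact formula below is an assumption, not the tool's documented calculation:

```python
def factuality_score(verified: list[str], unverified: list[str]) -> float:
    """Assumed formula: percentage of checked claims that were verified."""
    total = len(verified) + len(unverified)
    return round(len(verified) / total * 100, 1) if total else 100.0
```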
Step 10 – Final recommendation (LLM, policy-constrained)
Goal: Produce a structured recommendation without changing scores.
The model is given:
- skills analysis
- culture analysis
- fixed computed scores
- deterministic decision policy
Decision policy:
- If skills_score ≥ 70 → PROCEED
- If skills_score < 60 → DO NOT PROCEED
- If 60 ≤ skills_score < 70 → PROCEED only if culture_score ≥ 70, else DO NOT PROCEED
Non-negotiables:
- LLM must not re-score.
- LLM must not introduce new claims.
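The policy is small enough to express directly; the sketch below mirrors the thresholds above (the LLM narrates the outcome but never computes it):

```python
def decide(skills_score: float, culture_score: float) -> str:
    """Fixed decision policy from Step 10."""
    if skills_score >= 70:
        return "PROCEED"
    if skills_score < 60:
        return "DO NOT PROCEED"
    # 60 <= skills_score < 70: the culture score breaks the tie
    return "PROCEED" if culture_score >= 70 else "DO NOT PROCEED"
```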
Step 11 – Self-correction (triggered by verification issues)
Goal: Remove/correct any unverified claims while preserving scores/policy.
If any unverified claims exist:
- The tool asks the LLM to revise the recommendation
- Only the flagged claims may be removed/corrected
- Scores and policy must remain unchanged
Step 12 – Bias audit (LLM-driven audit across docs + reasoning)
Goal: Flag biased reasoning, biased JD language, or inconsistent standards.
Audit scope includes:
- Job description
- Skills analysis
- Culture analysis
- Final recommendation text
- Culture context
What it flags (examples):
- Prestige/pedigree signals (elite employers/education as proxy)
- Vague "polish/executive presence" language not tied to job requirements
- Non-job-related culture screening
- Inconsistent standards (penalizing requirements not in JD)
- Overclaiming certainty
Outputs:
- structured list of bias indicators (category, severity, trigger text, why it matters, recommended fix)
- recruiter guidance
3) Scoring and Decision Rules (Deterministic)
3.1 Skills score
- Only quote-verified required skills count.
- Score = (matched required skills / total required skills) × 100.
3.2 Culture score
- Score = (sum of evidence weights for matched attributes / number of attributes) × 100.
- Direct evidence = 1.0; inferred evidence = 0.5.
3.3 Labels
- ≥70: Strong fit
- 50–69: Moderate fit
- <50: Not a fit
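The label mapping is deterministic as well; a minimal sketch:

```python
def fit_label(score: float) -> str:
    """Map a 0-100 score to the fit labels above."""
    if score >= 70:
        return "Strong fit"
    if score >= 50:
        return "Moderate fit"
    return "Not a fit"
```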
3.4 Recommendation
Recommendation follows the fixed policy described in Step 10.
4) System Flow Diagram (Textual)
Below is a simplified, end-to-end flow of how data moves through the system.
[User Uploads]
|
v
+-------------------+
| Culture Documents |
+-------------------+ +-----------+
| | Job Desc |
v +-----------+
+-------------------+ |
| Culture Vector DB |<--------------+
+-------------------+ |
| v
| +---------------------+
| | Skill Extraction |
| | (LLM, JSON Output) |
| +---------------------+
| |
| v
| +---------------------+
| | Requirement |
| | Verification |
| | (Deterministic) |
| +---------------------+
| |
| v
| Verified Required Skills
| |
| v
+-------------------+ +---------------------+
| Resume Documents |------->| Resume Vector DB |
+-------------------+ +---------------------+
|
v
Similarity Search (k=10)
|
v
Resume Chunks (Grouped)
|
v
+-----------------------------+
| Culture Attribute Generator |
| (LLM, JSON Output) |
+-----------------------------+
|
v
+-----------------------------+
| Culture Evidence Matching |
| (LLM + Rules + Weights) |
+-----------------------------+
|
v
Culture Score (Deterministic)
|
v
+-----------------------------+
| Technical Skill Matching |
| (LLM + Deterministic Scoring)|
+-----------------------------+
|
v
Skills Score (Deterministic)
|
v
+-----------------------------+
| Implied Competencies (LLM) |
| (Not Scored, Advisory) |
+-----------------------------+
|
v
+-----------------------------+
| Factuality Verification |
| (LLM Verifier) |
+-----------------------------+
|
v
+-----------------------------+
| Recommendation Generator |
| (Policy-Constrained LLM) |
+-----------------------------+
|
v
+-----------------------------+
| Bias & Fairness Audit |
| (LLM Audit) |
+-----------------------------+
|
v
Final Recruiter Report
5) Audit Artifacts and Traceability
For every analysis run, the system produces and retains multiple audit artifacts that enable post-hoc review, regulatory defensibility, and debugging.
5.1 Input Artifacts
Original Job Description
- Full pasted JD text
Sanitized Resume Text
- Redacted resume content
- Redaction summary (internal)
Retrieved Culture Chunks
- Top-k (default: 3) culture document segments
- Vector similarity scores (internal)
Retrieved Resume Chunks
- Top-k (default: 10) resume segments
- Resume ID metadata
5.2 Requirement Verification Artifacts
Raw LLM Skill Extraction Output
Parsed Required Skills JSON
Verification Classification Table
- Quote-verified
- Name-only
- Dropped
Dropped-Skill Justifications
5.3 Culture Analysis Artifacts
Generated Culture Attribute List
LLM Raw Matching Output
Cleaned Match Records
- Evidence type
- Quotes
- Inference
- Confidence
Weighted Match Table
Computed Culture Score
5.4 Skills Analysis Artifacts
- Verified Required Skill List
- LLM Raw Matching Output
- Accepted Matched Skills
- Deterministic Missing-Skill Set
- Computed Skills Score
5.5 Implied Competency Artifacts (Advisory)
Missing Skill List
LLM Implied Output (JSON)
Accepted Implied Records
- Resume quotes
- Explanation
- Phone-screen questions
Rejected Inferences (internal)
5.6 Verification and Correction Artifacts
- Verifier Prompt and Output
- Verified / Unverified Claim Lists
- Factuality Scores
- Self-Correction Prompts and Revisions (if triggered)
5.7 Recommendation and Policy Artifacts
- Final Recommendation Prompt
- Policy Threshold Snapshot
- Immutable Score Values
- Generated Recommendation Text
5.8 Bias Audit Artifacts
- Bias Audit Prompt
- Audit Input Bundle (JD + Analyses + Recommendation)
- Structured Bias Indicator List
- Severity and Mitigation Suggestions
- Recruiter Guidance Text
5.9 System Metadata
- Timestamp of run
- Model version
- Prompt versions
- Chunking parameters
- Retrieval k-values
- Scoring parameters
6) Known Limitations
- Retrieval scope: evaluation depends on retrieved chunks; some evidence may be missed.
- Attribute generation variance: culture attributes can vary per run unless cached or cataloged.
- LLM evidence overreach: mitigated by verification and cleanup, but not eliminated.
- Bias audit is advisory: it flags issues; it does not enforce policy changes unless you add an auto-rewrite step.
7) Governance and Change Control
- Prompt changes must preserve JSON contracts.
- Any change that affects scoring or policy should be versioned.
- Audit outputs should be retained for traceability.
8) Intended Use
This tool is built for:
- faster, evidence-based screening
- transparent reasoning
- safer use of LLMs via verification and audits
It is not a substitute for:
- human judgment
- legal review
- formal HR policy compliance
High-level pipeline (inputs → outputs)
Inputs uploaded by recruiter
- Company culture/values docs (PDF/DOCX)
- Resumes (PDF/DOCX)
- Job description (text)
↓
Indexing (deterministic, Python)
- Culture docs → chunk + embed → culture_store
- Resumes → anonymize → chunk + embed → resume_store
↓
Candidate assessment (per JD run)
Extract required skills (LLM) → JSON required_skills: [{skill, evidence_quote}]
Verify extracted skills (Python) → quote-verified / name-only / dropped → quote-verified list used for scoring
Retrieve relevant culture context (deterministic retrieval)
- Query: JD
- Retrieve: top-k culture chunks (current: k=3)
- Output:
culture_context
Generate job-relevant culture attributes (LLM) → JSON cultural_attributes (4–6 items)
Retrieve relevant resume chunks (deterministic retrieval)
- Query: JD
- Retrieve: top-k resume chunks (current: k=10)
- Group by resume_id
- Per candidate: culture matching (LLM → cleanup → deterministic score)
- LLM proposes matches (direct/inferred) + quotes
- Python enforces validity gates
- Deterministic weighted culture score (direct=1.0, inferred=0.5)
- Per candidate: skills matching (LLM → deterministic score)
- LLM proposes matched skills + evidence snippets
- Python recomputes missing list deterministically
- Deterministic skills score using quote-verified requirements only
- Per candidate: implied competencies (LLM, NOT SCORED)
- Inputs: missing skills + matched skills + resume + JD
- Output: implied items with quotes + phone-screen questions
- Guardrail: tool-like skills (R/SAS/MATLAB) require explicit mention
Factuality verification (LLM verifier) → ✓/✗ for evidence-backed match lines + factuality score
Recommendation (LLM, policy-constrained) → uses fixed scores + fixed decision policy
Self-correction (conditional) → triggered if any unverified claims exist
Bias audit (LLM) → audits JD + analyses + recommendation → structured bias indicators + guidance
↓
Outputs per candidate
- Requirements verification summary (global)
- Culture analysis + score
- Skills analysis + score
- Implied (not scored) follow-ups
- Fact-check results
- Final recommendation (+ revision note if corrected)
- Bias audit
Component map (LLM vs deterministic)
LLM-driven components
- Required skill extraction (JSON)
- Culture attribute generation (JSON)
- Culture match proposals (JSON)
- Skills match proposals (JSON)
- Implied (not scored) follow-ups (JSON)
- Factuality verification (✓/✗)
- Final recommendation (policy constrained)
- Bias audit (structured)
Deterministic / Python-enforced components
- Resume anonymization
- Chunking + embedding + storage
- Retrieval parameters (top-k)
- Required-skill verification (quote/name-only/dropped)
- Deduplication of requirements
- Culture match cleanup rules (validity gates)
- Skills missing list recomputation
- Skills score computation
- Culture score computation with weights
- Decision thresholds (proceed / do not proceed)
- Self-correction trigger (presence of unverified claims)
Audit Artifacts
This section lists the primary artifacts produced (or recommended to persist) to make runs reviewable and defensible.
Inputs (source-of-truth)
- Job description text (as provided)
- Culture documents (original files)
- Resumes (original files)
Pre-processing
- Sanitized resume text (post-anonymization)
- Redaction notes (what was removed/masked)
- Chunking configuration (chunk_size, chunk_overlap)
- Embedding configuration (embedding model + settings)
Retrieval
- Culture retrieval query: JD text
- Culture retrieved chunks: top-k (current: k=3)
- Resume retrieval query: JD text
- Resume retrieved chunks: top-k (current: k=10)
- Candidate grouping: chunks grouped by resume_id
Requirements verification
- LLM required_skills JSON (raw)
- Normalized required skill list (deduped)
Verification output:
- quote-verified list
- name-only list
- dropped/unverified list
- counts and factuality score
Final scoring-required list: quote-verified only
Per-candidate analyses
Culture analysis
- Raw LLM culture-match JSON
- Post-cleanup matched culture list
- Missing culture attributes list
- Culture score + label
- Culture evidence lines shown to recruiters
Skills analysis
- Raw LLM skills-match JSON
- Matched skills list (with evidence snippets)
- Deterministically computed missing skills list
- Skills score + label
Implied (NOT SCORED)
- Raw LLM implied JSON
- Filtered implied list (must include resume quotes + phone-screen questions)
Verification & correction
- Verifier raw output (✓/✗ lines)
- Verified claims list
- Unverified claims list
- Factuality score
- Self-correction trigger status (yes/no)
- Corrected recommendation (if triggered) + revision note
Bias audit
- Bias audit raw output (structured)
- Bias indicators list (category, severity, trigger_text, why_it_matters, recommended_fix)
- Overall assessment
- Recruiter guidance
Run-level trace (recommended)
For reproducibility/governance, also persist:
- Timestamp, model name, temperature, seed
- Prompt versions (hash or version ID)
- Retrieval parameters (k values)
- Score thresholds and policy version
- Any configuration overrides used during the run
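One way to persist such a trace record, with illustrative field names (the prompt-hash scheme is an assumption):

```python
import hashlib
import json
import time

def build_run_trace(config: dict, prompts: dict[str, str]) -> dict:
    """Assemble a run-level trace record for reproducibility and governance."""
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "model": config.get("model"),
        "temperature": config.get("temperature"),
        "seed": config.get("seed"),
        "prompt_versions": {name: hashlib.sha256(p.encode()).hexdigest()[:12]
                            for name, p in prompts.items()},
        "retrieval_k": {"culture": 3, "resumes": 10},
        "policy_version": config.get("policy_version"),
        "overrides": config.get("overrides", {}),
    }

# Example: json.dump(build_run_trace(cfg, prompts), open("run_trace.json", "w"), indent=2)
```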
End-to-End Pipeline (Swim-Lane View)
| Step | Recruiter / Input | Python / Deterministic Logic | LLM (Groq) | Storage / Output |
|---|---|---|---|---|
| 1 | Upload culture documents | Chunk + embed | – | culture_store (indexed) |
| 2 | Upload resumes | Anonymize → chunk → embed | – | resume_store (indexed) |
| 3 | Paste JD + Run | Send JD to LLM | Extract required skills + evidence quotes | required_skills JSON |
| 4 | – | Verify requirements (quote / name-only / dropped) | – | Verified list + debug report |
| 5 | – | Retrieve culture context (k=3) | – | culture_context |
| 6 | – | – | Generate culture attributes (job-performance aligned) | cultural_attributes JSON |
| 7 | – | Retrieve resume chunks (k=10), group by resume_id | – | Candidate chunks |
| 8 | – | – | Propose culture matches (direct/inferred + quotes) | Raw culture-match JSON |
| 9 | – | Cleanup + weighted scoring (direct=1.0, inferred=0.5) | – | Culture score + evidence |
| 10 | – | – | Propose skill matches + evidence snippets | Raw skills-match JSON |
| 11 | – | Compute missing list + skills score (verified reqs only) | – | Skills score + missing list |
| 12 | – | – | Infer implied skills (NOT SCORED) + phone questions | Implied follow-ups |
| 13 | – | – | Verify evidence (✓/✗) | Factuality report |
| 14 | – | – | Generate recommendation (policy constrained) | Final recommendation |
| 15 | – | Trigger self-correction (if needed) | Revise flagged claims only | Corrected recommendation |
| 16 | – | – | Run bias audit (JD + analyses + decision) | Bias indicators + guidance |
| 17 | Review output | Assemble final report | – | Full candidate report |
Current Retrieval Parameters
- Culture store: k = 3 chunks (JD query)
- Resume store: k = 10 chunks (JD query)