
metadata
title: CV-Extractor
emoji: 📸
sdk: streamlit
sdk_version: 1.37.1
app_file: app.py

CV Analyzer (AI-Powered Resume Parser)

A Streamlit app that extracts structured data from CVs (PDF) using Docling for parsing, an LLM agent for extraction, and a Pydantic schema for validation, then converts the result into a clean, downloadable CSV.


Features

  • Upload CV (PDF)
  • Parse document using Docling
  • Extract structured data using LLM agent
  • Validate with Pydantic schema
  • Convert to Pandas DataFrame
  • View extracted data in UI
  • Download as CSV

Tech Stack

  • Streamlit – UI
  • Docling – PDF parsing
  • Pydantic / pydantic-ai – structured extraction
  • Hugging Face / LLM – inference
  • Pandas – data processing

Setup

1. Clone repo

git clone https://github.com/your-username/cv-analyzer.git
cd cv-analyzer

2. Create virtual environment

python -m venv .venv
source .venv/bin/activate   # Linux/macOS
.venv\Scripts\activate      # Windows

3. Install dependencies

pip install -r requirements.txt

4. Environment variables

Create a .env file:

HF_TOKEN=your_huggingface_token

The .env file is excluded from version control via .gitignore, so the token is not committed.
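For reference, here is a minimal, standard-library-only sketch of how the token could be read at startup. The helper names `load_env_file` and `get_hf_token` are hypothetical and not taken from this repo; the app may well use a package such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Hypothetical helper: variables that are already set keep their value.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_hf_token() -> str:
    """Return HF_TOKEN, or fail early with a readable message."""
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or export it.")
    return token
```

Calling `load_env_file()` followed by `get_hf_token()` before the first model request surfaces a missing token immediately instead of as a failed inference call.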


Run App

streamlit run app.py

How it works

  1. User uploads CV (PDF)
  2. Docling converts PDF → structured text/markdown
  3. LLM agent extracts data using predefined schema
  4. Output is validated via Pydantic
  5. Data is converted into a DataFrame
  6. User can view and download CSV
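The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code in app.py: `pdf_to_markdown` and `cv_to_csv` are hypothetical names, and `extract` is a stand-in for the LLM agent plus Pydantic validation (steps 3 and 4), passed in as a plain callable so the sketch does not depend on a particular model or agent API.

```python
from typing import Callable

import pandas as pd


def pdf_to_markdown(pdf_path: str) -> str:
    """Step 2: convert a PDF to markdown with Docling."""
    # Imported lazily so the rest of the sketch runs without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    return result.document.export_to_markdown()


def cv_to_csv(markdown: str, extract: Callable[[str], dict]) -> str:
    """Steps 3-6: extract one record from the text and return it as CSV text."""
    record = extract(markdown)     # hypothetical: LLM agent + schema validation
    df = pd.DataFrame([record])    # one row per CV
    return df.to_csv(index=False)  # missing (None) values become empty cells
```

A call would look like `cv_to_csv(pdf_to_markdown("cv.pdf"), extract=my_extractor)`, where `my_extractor` is whatever function wraps the agent. Nested values such as lists of skills are written to the CSV as their string representation, so they may need flattening first.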

Notes

  • Schema is designed for AI/ML-focused resumes
  • Missing fields are returned as null rather than guessed (no-hallucination policy)
  • Dates are stored as strings to avoid parsing errors
  • Validation is deliberately lenient so that minor deviations in LLM output do not cause the extraction to be rejected
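A schema following these notes might look like the sketch below (Pydantic v2). The class and field names are hypothetical, not the repo's actual schema. They only illustrate nullable fields, string dates, and lenient validation, here taken to mean ignoring unexpected keys.

```python
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field


class Experience(BaseModel):
    # Unknown keys from the LLM are dropped instead of raising an error.
    model_config = ConfigDict(extra="ignore")

    company: Optional[str] = None
    title: Optional[str] = None
    # Dates stay as strings, so values like "Present" or "Spring 2021" are valid.
    start_date: Optional[str] = None
    end_date: Optional[str] = None


class CVRecord(BaseModel):
    model_config = ConfigDict(extra="ignore")

    # Every scalar field is optional and defaults to None (null) when absent.
    name: Optional[str] = None
    email: Optional[str] = None
    skills: List[str] = Field(default_factory=list)
    experience: List[Experience] = Field(default_factory=list)
```

With this shape, `CVRecord.model_validate(raw_dict)` accepts a partially filled dict, leaves absent fields as `None`, and does not fail on keys the schema does not know about.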

Limitations

  • The LLM may still produce inconsistent output for poorly formatted CVs
  • Complex layouts (tables, multi-column PDFs) may affect parsing quality
  • Requires internet access for model inference

Future Improvements

  • Multi-CV batch processing
  • Candidate scoring & ranking
  • Semantic search over resumes (FAISS)
  • UI improvements (filters, charts)
  • Export to JSON / Excel

License

MIT License