CV Parser NER — roberta-base (v2)

Token-classification model that extracts Job Titles, Skills, and Education from resumes/CVs using a BIO tag scheme.

Provenance

  • Fine-tuned fresh from the pretrained roberta-base checkpoint on dataset 4 (resume_bio_annotated_full.csv; 2,483 resumes split 1,739 train / 372 val / 372 test), the team's finalized AI-Studio/Vertex-relabelled dataset.
  • Reproduced end-to-end with the project notebooks/scripts (retokenize.py + train_bert_run.py).
  • Base model: roberta-base · epochs: 5 · learning rate: 3e-5 · max_length: 512 · stride: 128 · seed: 42.
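
The max_length and stride settings above imply that long resumes are cut into overlapping windows. The sketch below shows one way such windows could be laid out. It assumes that stride is the number of tokens shared by consecutive windows (the Hugging Face tokenizer convention) and that each window reserves two positions for the <s> and </s> special tokens. The function name window_spans and the parameter n_special are hypothetical; the card does not show how retokenize.py actually does this.

```python
def window_spans(n_tokens, max_length=512, stride=128, n_special=2):
    """Return (start, end) content-token spans for overlapping windows.

    Assumption: `stride` is the overlap between consecutive windows and
    every window keeps `n_special` slots free for special tokens.
    """
    capacity = max_length - n_special   # content tokens that fit in one window
    step = capacity - stride            # how far each new window advances
    if step <= 0:
        raise ValueError("stride must be smaller than the window capacity")

    spans = []
    start = 0
    while True:
        end = min(start + capacity, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:             # the last window reached the end
            break
        start += step
    return spans


# A hypothetical 1,200-token resume needs three windows.
print(window_spans(1200))  # [(0, 510), (382, 892), (764, 1200)]
```

Under these assumptions, tokens in an overlap region receive predictions from two windows, so some merging rule (for example, keeping the prediction from the window where the token has more context) would be needed downstream.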

Resume-level performance (dataset-4 splits)

split        precision   recall   F1
validation   —           —        0.6397
test         —           —        0.6563
Labels

O, B-JOB_TITLE, I-JOB_TITLE, B-SKILL, I-SKILL, B-EDUCATION, I-EDUCATION
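
To show how the BIO tags above turn into extracted fields, here is a minimal decoding sketch. It groups a B- tag and the I- tags of the same type that follow it into a single entity. The helper name bio_to_spans, the index order in ID2LABEL, and the policy of dropping an I- tag that has no matching B- tag are all assumptions; the model's actual config and post-processing may differ.

```python
LABELS = ["O", "B-JOB_TITLE", "I-JOB_TITLE", "B-SKILL", "I-SKILL",
          "B-EDUCATION", "I-EDUCATION"]
# Assumed id order; check the model's config before relying on it.
ID2LABEL = dict(enumerate(LABELS))


def bio_to_spans(tokens, tags):
    """Group word-level BIO tags into (entity_type, text) pairs."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and ent_type == current_type:
            current_tokens.append(token)        # continue the open entity
            continue
        if current_type is not None:            # close the open entity
            spans.append((current_type, " ".join(current_tokens)))
        if prefix == "B":
            current_type, current_tokens = ent_type, [token]
        else:                                   # "O" or a stray I- tag: no open entity
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["Senior", "Data", "Engineer", "with", "Python", "and", "SQL"]
tags = ["B-JOB_TITLE", "I-JOB_TITLE", "I-JOB_TITLE", "O", "B-SKILL", "O", "B-SKILL"]
print(bio_to_spans(tokens, tags))
# [('JOB_TITLE', 'Senior Data Engineer'), ('SKILL', 'Python'), ('SKILL', 'SQL')]
```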

Model size: 0.1B params · tensor type: F32 · format: Safetensors
Model ID: Zeqhx/cv-parser-roberta-v2