CV Parser NER — roberta-base (v2)

Token-classification model that extracts Job Titles, Skills, and Education from resumes/CVs using a BIO tag scheme.
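As an illustration of the BIO scheme, the sketch below groups per-token tags into entity spans. The helper name `bio_to_spans` and the sample tokens are hypothetical and are not taken from the project scripts. It assumes one tag per whitespace token and simply drops an `I-` tag that does not continue an open entity.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the currently open entity, if any.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity (dropped here).
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Hypothetical example input, not drawn from the dataset.
tokens = ["Senior", "Data", "Engineer", "skilled", "in", "Python"]
tags = ["B-JOB_TITLE", "I-JOB_TITLE", "I-JOB_TITLE", "O", "O", "B-SKILL"]
spans = bio_to_spans(tokens, tags)
```

In practice the model predicts tags per subword token, so predictions would first need to be aligned back to words before a grouping step like this one.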

Provenance

Fine-tuned from the pretrained roberta-base checkpoint on dataset 4 (resume_bio_annotated_full.csv: 2,483 resumes, split 1,739 train / 372 val / 372 test), the team's finalized AI-Studio/Vertex-relabelled dataset.
Reproduced end-to-end with the project notebooks/scripts (retokenize.py + train_bert_run.py).
Base model: roberta-base · epochs: 5 · learning rate: 3e-5 · max_length: 512 · stride: 128 · seed: 42.
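To make the max_length and stride settings concrete, here is a minimal sketch of how a long resume could be split into overlapping windows. It assumes that stride means the number of tokens shared by consecutive windows (the Hugging Face tokenizer convention) and ignores the slots taken by special tokens. The function name `window_bounds` is hypothetical and is not part of retokenize.py.

```python
from typing import List, Tuple


def window_bounds(n_tokens: int, max_length: int = 512, stride: int = 128) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets of overlapping windows.

    Assumption: `stride` is the overlap between consecutive windows,
    so each window starts `max_length - stride` tokens after the previous one.
    """
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    step = max_length - stride
    bounds = []
    start = 0
    while True:
        end = min(start + max_length, n_tokens)
        bounds.append((start, end))
        if end >= n_tokens:
            break
        start += step
    return bounds


# A 1,000-token document yields three windows, each overlapping the next by 128 tokens.
windows = window_bounds(1000)
```

Under these assumptions, a document shorter than 512 tokens fits in a single window, while longer documents have tokens in the overlap regions predicted more than once, so some merging rule is needed at inference time.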

split	precision	recall	F1
validation	—	—	0.6397
test	—	—	0.6563

Labels: O, B-JOB_TITLE, I-JOB_TITLE, B-SKILL, I-SKILL, B-EDUCATION, I-EDUCATION
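The seven tags above are typically exposed to a token-classification head as a pair of lookup tables. The sketch below builds them in plain Python. The id order is an assumption that follows the order the tags are listed in, so the model's own config remains the authoritative mapping.

```python
# Tag list in the order given in this card; the id assignment is assumed.
LABELS = [
    "O",
    "B-JOB_TITLE", "I-JOB_TITLE",
    "B-SKILL", "I-SKILL",
    "B-EDUCATION", "I-EDUCATION",
]

# Integer id -> tag string, and the inverse mapping.
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}

# Entity types are the tag names with the B-/I- prefix stripped.
entity_types = sorted({label[2:] for label in LABELS if label != "O"})
```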

Safetensors · Model size: 0.1B params · Tensor type: F32