How to use

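This example runs inference on a sample from our dataset (ResumeAtlas). Inputs are truncated to max_length tokens, so a larger max_length lets the model see more of each resume and can give more accurate predictions; 512, the value used below, is the usual upper limit for BERT models.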

!pip install datasets transformers scikit-learn torch

import numpy as np
import torch
from transformers import BertForSequenceClassification, BertTokenizer
from datasets import load_dataset
from sklearn import preprocessing

dataset_id='ahmedheakl/resume-atlas'
model_id='ahmedheakl/bert-resume-classification'
label_column = "Category"
num_labels=43
output_attentions=False
output_hidden_states=False
do_lower_case=True
add_special_tokens=True
max_length=512
pad_to_max_length=True
return_attention_mask=True
truncation=True

ds = load_dataset(dataset_id, trust_remote_code=True)

le = preprocessing.LabelEncoder()
le.fit(ds['train'][label_column])


tokenizer = BertTokenizer.from_pretrained(model_id, do_lower_case=do_lower_case)
model = BertForSequenceClassification.from_pretrained(
    model_id,
    num_labels = num_labels,
    output_attentions = output_attentions,
    output_hidden_states = output_hidden_states,
)

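# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device).eval()
sent = ds['train'][0]['Text']

# Calling the tokenizer directly is the recommended form of encode_plus
encoded_dict = tokenizer(
    sent,
    add_special_tokens=add_special_tokens,
    max_length=max_length,
    # padding='max_length' replaces the deprecated pad_to_max_length=True
    padding='max_length' if pad_to_max_length else False,
    return_attention_mask=return_attention_mask,
    return_tensors='pt',
    truncation=truncation,
)
input_ids = encoded_dict['input_ids'].to(device)
attention_mask = encoded_dict['attention_mask'].to(device)

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(
        input_ids,
        token_type_ids=None,
        attention_mask=attention_mask
    )

# Pick the highest-scoring class and map it back to its category name
label_id = np.argmax(outputs['logits'].cpu().numpy(), axis=1)
print(f'Predicted: {le.inverse_transform(label_id)[0]} | Ground: {ds["train"][0][label_column]}')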
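
The snippet above reports only the single best class. When a resume sits between two categories, it can help to look at the top few candidates with their probabilities. The following is a minimal, dependency-free sketch of that idea, not part of the released code: softmax and top_k_predictions are hypothetical helper names, and the class names and logits are made-up values for illustration. It assumes, as the snippet above does, that class ids follow the sorted order of the category names (which is how LabelEncoder orders le.classes_).

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(logits, class_names, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up values for illustration only. With the real model you would pass
# outputs['logits'][0].tolist() and list(le.classes_) instead.
example_classes = sorted({"Accountant", "Designer", "Teacher"})
example_logits = [0.2, 2.5, -1.0]

for label, prob in top_k_predictions(example_logits, example_classes, k=2):
    print(f"{label}: {prob:.3f}")
```

A low top probability, or two candidates with similar probabilities, is a hint that the prediction deserves a manual check.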

Model Card for ahmedheakl/bert-resume-classification

For more information, please see the paper (cited below) and its accompanying code.

Citation

BibTeX:

@article{heakl2024resumeatlas,
  title={ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models},
  author={Heakl, Ahmed and Mohamed, Youssef and Mohamed, Noran and Sharkaway, Ali and Zaky, Ahmed},
  journal={arXiv preprint arXiv:2406.18125},
  year={2024}
}

APA:

Heakl, A., Mohamed, Y., Mohamed, N., Sharkaway, A., & Zaky, A. (2024). ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.18125

Model Card Authors

Email: ahmed.heakl@ejust.edu.eg
LinkedIn: https://linkedin.com/in/ahmed-heakl

Model size: 110M params (Safetensors, tensor type F32)