Details

Background

In July and August 2022, I did research on basic natural language processing with a professor in the Department of Computer Science at UMBC. I learned from the fastai fastbook, and our task was to build a resume classifier. The professor found a dataset of resumes online and asked me to manually label each text file by whether it was a resume (2 = resume, 1 = kind of, 0 = not a resume). After that, with the professor's guidance, I learned how to train the model with fastai. I trained it in many separate sessions rather than in one continuous run, so I needed to learn how to freeze and unfreeze the model. I also trained overnight for a couple of days and reached an accuracy of 90%.
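To illustrate what freezing and unfreezing across separate sessions can look like, here is a minimal sketch, not the exact code I ran. It assumes `learn` is a fastai-style Learner exposing `load`, `save`, `freeze`, `unfreeze`, and `fit_one_cycle`. The helper name `train_session`, the checkpoint names, and the learning rates are hypothetical, and saving a checkpoint between sessions is an assumption about how interrupted training would be resumed.

```python
def train_session(learn, epochs, lr, unfrozen=False, resume_from=None, save_as=None):
    """Run one training session on a fastai-style Learner (hypothetical helper)."""
    if resume_from is not None:
        # Restore the weights saved at the end of an earlier session.
        learn.load(resume_from)
    if unfrozen:
        # Make every layer trainable, including the pretrained body.
        learn.unfreeze()
    else:
        # Train only the final layer group; the pretrained body stays fixed.
        learn.freeze()
    learn.fit_one_cycle(epochs, lr)
    if save_as is not None:
        # Write a checkpoint so a later session can pick up from here.
        learn.save(save_as)
    return learn

# Hypothetical usage with a fastai text classifier `learn`:
# train_session(learn, epochs=1, lr=2e-2, save_as="stage1")   # head only
# train_session(learn, epochs=2, lr=1e-3, unfrozen=True,
#               resume_from="stage1", save_as="stage2")       # whole model
```

Training the head first and unfreezing later is the general pattern the fastbook teaches for fine-tuning a pretrained language model.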

Recently, I looked back on this project and wanted to make it a little more official by creating a small testing interface and uploading it to GitHub and Hugging Face.

Files

Here are the files you'll find in this repository:

resume_learner.pth

This file contains the trained model.

main.ipynb

This is the Jupyter notebook that loads the model and runs specific tests on it.

test.txt

This is a file to feed into the model in main.ipynb if you want to paste in a large chunk of text.
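As a rough illustration of how the contents of test.txt might be passed to the model, here is a minimal sketch. It is not a copy of main.ipynb. It assumes `learn` is a fastai Learner that has already been rebuilt with the trained weights loaded, and that `learn.predict` returns the decoded label, its index, and the class probabilities. The names `classify_file` and `LABELS` are hypothetical, and the mapping assumes the model's classes are the 2/1/0 labels from the labeling step.

```python
from pathlib import Path

# Assumed mapping from the labels used during manual labeling to readable names.
LABELS = {"0": "not a resume", "1": "kind of a resume", "2": "resume"}

def classify_file(learn, path="test.txt"):
    """Read a text file and return (readable label, probability of that label)."""
    text = Path(path).read_text(encoding="utf-8")
    # fastai's Learner.predict returns (decoded prediction, class index, probabilities).
    pred, pred_idx, probs = learn.predict(text)
    label = LABELS.get(str(pred), str(pred))
    return label, float(probs[int(pred_idx)])
```

Returning the probability next to the label makes it easier to see when the model is unsure about an input.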

Observations

In all honesty, this is not a very good model, but it taught me the basics of building a natural language processing model. It predicts resumes fairly well, but it gets some odd cases wrong, such as texts like

  • "hi"
  • "this is not a resume"

It may struggle with texts like these because they are very short.

However, I believe the main cause is that the training data was mostly resumes, so the model learned to recognize when a text file is a resume. There were few examples of text files that were not resumes, so the model could not learn to recognize those very well.
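One way to check for this kind of imbalance is to count how often each label appears in the training set. The sketch below is only an illustration: `label_distribution` is a hypothetical helper, and `toy_labels` is made-up data, not the real dataset.

```python
from collections import Counter

def label_distribution(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Made-up toy labels for illustration only, not the real dataset.
toy_labels = [2, 2, 2, 2, 2, 2, 2, 1, 1, 0]
print(label_distribution(toy_labels))
```

If one label makes up most of the data, a high overall accuracy can hide poor performance on the rare labels.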
