ATS Score Predictor
This repository hosts a Multinomial Naïve Bayes (MultinomialNB) model for ATS (Applicant Tracking System) score prediction using text classification. The model estimates how well a resume matches a job description according to ATS criteria.
Model Details
- Model Architecture: Multinomial Naïve Bayes (MultinomialNB)
- Task: Resume Score Prediction
- Dataset: Job Listings & Resumes
- Feature Extraction: TF-IDF Vectorization
- Evaluation Metrics: Accuracy, Precision, Recall
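The architecture listed above, with TF-IDF features feeding a Multinomial Naïve Bayes classifier, can be sketched as a single scikit-learn `Pipeline`. This is a minimal illustration only. The toy resumes and category labels are invented and do not come from the card's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: short resume snippets and their categories
resumes = [
    "react javascript frontend web developer css html",
    "javascript react native mobile web developer",
    "python pandas machine learning data analysis statistics",
    "data scientist python sklearn statistics modelling",
]
categories = ["Web Developer", "Web Developer", "Data Science", "Data Science"]

# TF-IDF turns each text into a weighted term vector; MultinomialNB then
# models the per-class distribution of those term weights.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(resumes, categories)

# Words unseen during training ("experienced", "with", "skills") are ignored
pred = clf.predict(["experienced react web developer with javascript skills"])
print(pred[0])
```

Bundling both steps in one pipeline ensures that new text is always transformed with the vocabulary fitted during training.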
Usage
Installation
```bash
pip install pandas scikit-learn nltk PyPDF2 matplotlib seaborn pymupdf
```
Loading the Libraries and Data
```python
import os
import re

import matplotlib.pyplot as plt
import pandas as pd
import PyPDF2
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load datasets (replace with the actual dataset paths)
df = pd.read_csv("job_data.csv")                # job listings with a JobDescription column
resumeDataSet = pd.read_csv("resume_data.csv")  # resumes with Resume_str and Category columns
```
Preprocessing and Feature Extraction
```python
def cleanResume(resumeText):
    """Remove 1-2 character words and non-letter characters, then lowercase."""
    resumeText = re.sub(r'\b\w{1,2}\b', '', resumeText)
    resumeText = re.sub(r'[^a-zA-Z\s]', ' ', resumeText)
    return resumeText.lower()


def clean_text(text):
    """Remove punctuation and lowercase (used for job descriptions)."""
    text = re.sub(r'[^\w\s]', '', str(text))
    return text.lower()


# Clean resumes and job descriptions
resumeDataSet['Cleaned_Resume'] = resumeDataSet['Resume_str'].apply(lambda x: cleanResume(str(x)))
print(resumeDataSet.head())
df['cleaned_job_info'] = df['JobDescription'].apply(clean_text)

# TF-IDF features (top 1,000 terms); labels are the resume categories
tfidf = TfidfVectorizer(max_features=1000)
X = tfidf.fit_transform(resumeDataSet['Cleaned_Resume'])
y = resumeDataSet['Category']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Multinomial Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train, y_train)

# Evaluate on the held-out 20% test split
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, predictions))


def plot_confusion_matrix(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    plt.figure(figsize=(10, 7))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=labels, yticklabels=labels)
    plt.title("Confusion Matrix")
    plt.ylabel("Actual")
    plt.xlabel("Predicted")
    plt.show()


plot_confusion_matrix(y_test, predictions, labels=model.classes_)


def extract_text_from_pdf(pdf_path):
    """Concatenate the text of every page in a PDF (PyPDF2)."""
    text = ''
    with open(pdf_path, 'rb') as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        for page in reader.pages:
            text += page.extract_text() or ''
    return text


def calculate_ats_score(job_description, resume_text):
    """Percentage of unique job-description words that also appear in the resume."""
    job_keywords = set(re.findall(r'\b\w+\b', job_description.lower()))
    resume_keywords = set(re.findall(r'\b\w+\b', resume_text.lower()))
    if not job_keywords:  # avoid division by zero on an empty description
        return 0.0
    matched_keywords = job_keywords & resume_keywords
    return len(matched_keywords) / len(job_keywords) * 100


def plot_ats_score(ats_score):
    plt.figure(figsize=(6, 4))
    plt.barh(['ATS Score'], [ats_score], color='blue')
    plt.xlim(0, 100)
    plt.title('ATS Score Based on Resume Match')
    plt.xlabel('Percentage Match')
    plt.show()


job_description = """
Seeking a Web Developer proficient in React.js and React Native to build scalable web and mobile applications. Must have experience with modern JavaScript frameworks and responsive design.
"""

uploaded_pdf_path = "your_resume.pdf"  # Replace with the path to your resume
if os.path.exists(uploaded_pdf_path):
    resume_text = extract_text_from_pdf(uploaded_pdf_path)
    cleaned_resume = cleanResume(resume_text)
    vectorized_resume = tfidf.transform([cleaned_resume])
    prediction = model.predict(vectorized_resume)
    print(f"Predicted Category: {prediction[0]}")
    # Clean both texts the same way, so words stripped from the resume
    # (e.g. 1-2 letter tokens) are not counted against it
    ats_score = calculate_ats_score(cleanResume(job_description), cleaned_resume)
    print(f"ATS Score: {ats_score:.2f}%")
    plot_ats_score(ats_score)
```
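The `calculate_ats_score` function above computes ATS = |J ∩ R| / |J| × 100, where J and R are the sets of unique words in the job description and the resume. Common words such as "and" or "in" count toward J, which can dilute the score. The sketch below is a hypothetical variant, not the card's method. It drops scikit-learn's built-in `ENGLISH_STOP_WORDS` list before comparing, and the helper names are illustrative.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS


def keyword_set(text):
    """Unique lowercase words in `text`, minus English stop words."""
    words = re.findall(r'\b\w+\b', text.lower())
    return {w for w in words if w not in ENGLISH_STOP_WORDS}


def ats_score_without_stopwords(job_description, resume_text):
    """Hypothetical variant: % of job keywords (stop words removed) found in the resume."""
    job_keywords = keyword_set(job_description)
    if not job_keywords:
        return 0.0
    matched = job_keywords & keyword_set(resume_text)
    return len(matched) / len(job_keywords) * 100


job = "Web Developer proficient in React and JavaScript"
resume = "Web developer with five years of React experience"
print(f"{ats_score_without_stopwords(job, resume):.1f}%")  # prints 60.0%
```

Here "in" and "and" are discarded, leaving five job keywords, of which the resume matches three (web, developer, react).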
Training the Model
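```python
# Alternative workflow: learn ATS scores directly from labelled job data.
# Assumes df has an 'ATS_Score' column. MultinomialNB is a classifier, so the
# scores must be discrete labels (e.g. score bands), not continuous values.
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['cleaned_job_info'])
y = df['ATS_Score']
model = MultinomialNB()
model.fit(X, y)
```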
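If `ATS_Score` holds continuous values, scikit-learn's classifiers reject it as a target, so the scores would need to be converted to discrete labels before fitting. One hedged option, assuming scores on a 0-100 scale, is to bin them with `pandas.cut`. The band edges and names below are illustrative assumptions, not values from the card.

```python
import pandas as pd

# Hypothetical continuous ATS scores on a 0-100 scale
scores = pd.Series([12.5, 47.0, 63.2, 88.9, 100.0])

# Bin into three illustrative bands: [0, 50], (50, 75], (75, 100]
bands = pd.cut(
    scores,
    bins=[0, 50, 75, 100],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)
print(bands.tolist())
```

Applied to the dataset, the same call on `df['ATS_Score']` would produce labels that `MultinomialNB` can fit.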
Predicting ATS Score for a Resume
```python
import fitz  # PyMuPDF: pip install pymupdf


def extract_text_from_pdf(pdf_path):
    """Concatenate the text of every page in a PDF (PyMuPDF)."""
    text = ''
    with fitz.open(pdf_path) as document:
        for page in document:
            text += page.get_text()
    return text


# Reuses clean_text, vectorizer and model from the sections above
resume_text = extract_text_from_pdf('path_to_resume.pdf')  # Replace with your resume path
cleaned_resume = clean_text(resume_text)
resume_vector = vectorizer.transform([cleaned_resume])
predicted_score = model.predict(resume_vector)
print(f"Predicted ATS Score: {predicted_score[0]}")
```
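`model.predict` returns only the single most likely label. `MultinomialNB` also provides `predict_proba`, which shows how the probability is spread across all classes. This can help flag borderline resumes. The self-contained sketch below uses invented texts and score-band labels; it does not use the card's data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data: texts labelled with ATS score bands
texts = [
    "react javascript web developer responsive design",
    "react native mobile javascript developer",
    "accountant bookkeeping payroll tax",
    "retail cashier customer service",
]
labels = ["High", "High", "Low", "Low"]

vectorizer = TfidfVectorizer(stop_words="english")
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

resume_vector = vectorizer.transform(["javascript developer building react apps"])
# One probability per class, in the order given by model.classes_
for label, prob in zip(model.classes_, model.predict_proba(resume_vector)[0]):
    print(f"{label}: {prob:.2f}")
```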
Evaluation Results
| Metric | Score | Description |
|---|---|---|
| Accuracy | 89.2% | Share of all predictions that are correct |
| Precision | 85.5% | Share of resumes predicted as well matched that actually are well matched |
| Recall | 84.3% | Share of truly relevant resume-job pairs that the model captures |
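For a multi-class target such as resume categories or score bands, per-class precision and recall must be averaged. The card does not state which averaging produced the figures above. The sketch below assumes weighted averaging (`average='weighted'` in scikit-learn) and uses made-up labels.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical true and predicted labels for six test resumes
y_true = ["High", "High", "Low", "Low", "Medium", "Medium"]
y_pred = ["High", "Low", "Low", "Low", "Medium", "High"]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="weighted")  # weight each class by its support
rec = recall_score(y_true, y_pred, average="weighted")

print(f"Accuracy:  {acc:.3f}")   # 4 of 6 correct -> 0.667
print(f"Precision: {prec:.3f}")  # mean of 0.5, 0.667, 1.0 -> 0.722
print(f"Recall:    {rec:.3f}")   # mean of 0.5, 1.0, 0.5 -> 0.667
```

With weighted averaging, recall always equals accuracy. Reporting both as different numbers, as the table does, therefore suggests another averaging scheme or a binary framing was used.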
Repository Structure
```
.
├── model/       # Trained MultinomialNB model
├── dataset/     # Job listings and resume data
├── results/     # Evaluation metrics
└── README.md    # Model documentation
```
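The `model/` directory holds the trained model, but the card does not say how it was serialized. One common option, assumed here, is `joblib`, which is installed as a scikit-learn dependency. The fitted TF-IDF vectorizer should be saved alongside the classifier, because new resumes must be transformed with the same vocabulary. The file names and the tiny stand-in model are hypothetical.

```python
import os

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Fit a tiny stand-in model (replace with the trained tfidf and model objects)
texts = ["react javascript web developer", "python data analysis statistics"]
labels = ["Web Developer", "Data Science"]
tfidf = TfidfVectorizer()
model = MultinomialNB().fit(tfidf.fit_transform(texts), labels)

# Save the vectorizer and classifier together (hypothetical file names)
os.makedirs("model", exist_ok=True)
joblib.dump(tfidf, "model/tfidf_vectorizer.joblib")
joblib.dump(model, "model/multinomial_nb.joblib")

# Later: reload both and classify a new resume
loaded_tfidf = joblib.load("model/tfidf_vectorizer.joblib")
loaded_model = joblib.load("model/multinomial_nb.joblib")
print(loaded_model.predict(loaded_tfidf.transform(["senior react developer"]))[0])
```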
⚠️ Limitations
- The model depends on textual content and does not assess resume formatting.
- Performance depends on how well TF-IDF features capture the structure and wording of resumes and job descriptions.
- Accuracy depends on a large and diverse training dataset.