YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

πŸ“„ ATS Score Predictor

This repository hosts a MultinomialNB-based model optimized for ATS (Applicant Tracking System) Score Prediction using text classification techniques. The model predicts how well a resume matches a job description based on ATS criteria.

πŸ“Œ Model Details

  • Model Architecture: Multinomial NaΓ―ve Bayes (MultinomialNB)
  • Task: Resume Score Prediction
  • Dataset: Job Listings & Resumes
  • Feature Extraction: TF-IDF Vectorization
  • Evaluation Metrics: Accuracy, Precision, Recall

πŸš€ Usage

Installation

pip install pandas scikit-learn nltk

Loading the Model

import os
import PyPDF2
import pandas as pd
import re
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset
df = pd.read_csv("job_data.csv")  # Replace with actual dataset path

Preprocessing and Feature Extraction

resumeDataSet['Cleaned_Resume'] = resumeDataSet['Resume_str'].apply(lambda x: cleanResume(str(x)))

import re
def cleanResume(resumeText):
    resumeText = re.sub(r'\b\w{1,2}\b', '', resumeText)  
    resumeText = re.sub(r'[^a-zA-Z\s]', ' ', resumeText) 
    return resumeText.lower()

resumeDataSet['Cleaned_Resume'] = resumeDataSet['Resume_str'].apply(lambda x: cleanResume(str(x)))

print(resumeDataSet.head())

def clean_text(text):
    text = re.sub(r'[^\w\s]', '', str(text))  
    text = text.lower()
    return text

df['cleaned_job_info'] = df['JobDescription'].apply(clean_text)tfidf = TfidfVectorizer(max_features=1000)
X = tfidf.fit_transform(resumeDataSet['Cleaned_Resume'])
y = resumeDataSet['Category']

tfidf = TfidfVectorizer(max_features=1000)
X = tfidf.fit_transform(resumeDataSet['Cleaned_Resume'])
y = resumeDataSet['Category']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MultinomialNB()
model.fit(X_train, y_train)
import joblib

# Train the model
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)

predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
print(classification_report(y_test, predictions))

def plot_confusion_matrix(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    plt.figure(figsize=(10, 7))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=labels, yticklabels=labels)
    plt.title("Confusion Matrix")
    plt.ylabel("Actual")
    plt.xlabel("Predicted")
    plt.show()

def extract_text_from_pdf(pdf_path):
    text = ''
    with open(pdf_path, 'rb') as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        for page_num in range(len(reader.pages)):
            page = reader.pages[page_num]
            text += page.extract_text()
    return text

def calculate_ats_score(job_description, resume_text):
    job_keywords = set(re.findall(r'\b\w+\b', job_description.lower()))
    resume_keywords = set(re.findall(r'\b\w+\b', resume_text.lower()))
    
    matched_keywords = job_keywords.intersection(resume_keywords)
    ats_score = len(matched_keywords) / len(job_keywords) * 100  # percentage
    return ats_score

job_description = """
Seeking a Web Developer proficient in React.js and React Native to build scalable web and mobile applications. Must have experience with modern JavaScript frameworks and responsive design
"""
uploaded_pdf_path = "your resume path.pdf"  

if os.path.exists(uploaded_pdf_path):
    resume_text = extract_text_from_pdf(uploaded_pdf_path)
    cleaned_resume = cleanResume(resume_text)
    vectorized_resume = tfidf.transform([cleaned_resume])
prediction = model.predict(vectorized_resume)
print(f"Predicted Category: {prediction[0]}")
ats_score = calculate_ats_score(job_description, cleaned_resume)
print(f"ATS Score: {ats_score:.2f}%")
def plot_ats_score(ats_score):
    plt.figure(figsize=(6, 4))
    plt.barh(['ATS Score'], [ats_score], color='blue')
    plt.xlim(0, 100)
    plt.title('ATS Score Based on Resume Match')
    plt.xlabel('Percentage Match')
    plt.show()

plot_ats_score(ats_score)

Training the Model

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['cleaned_job_info'])
y = df['ATS_Score']  # Assume labeled ATS scores exist in dataset

model = MultinomialNB()
model.fit(X, y)

Predicting ATS Score for a Resume

def extract_text_from_pdf(pdf_path):
    document = fitz.open(pdf_path)
    text = ''
    for page_num in range(len(document)):
        page = document.load_page(page_num)
        text += page.get_text()
    return text

resume_text = extract_text_from_pdf('path_to_resume.pdf')
cleaned_resume = clean_text(resume_text)
resume_vector = vectorizer.transform([cleaned_resume])
predicted_score = model.predict(resume_vector)
print(f"Predicted ATS Score: {predicted_score}")

πŸ“Š Evaluation Results

Metric Score Description
Accuracy 89.2% Predicts ATS scores effectively
Precision 85.5% Correctly identifies well-matched resumes
Recall 84.3% Captures relevant resume-job pairs

πŸ“‚ Repository Structure

.
β”œβ”€β”€ model/               # Trained MultinomialNB Model
β”œβ”€β”€ dataset/             # Job Listings and Resume Data
β”œβ”€β”€ results/             # Evaluation Metrics
β”œβ”€β”€ README.md            # Model Documentation

⚠️ Limitations

  • The model depends on textual content and does not assess resume formatting.
  • Feature extraction impacts performance based on resume structure and job descriptions.
  • The dataset should be large and diverse for optimal accuracy.
Downloads last month
34
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Space using AventIQ-AI/multinomialnb-ats-score-predictor 1