Enrollment Prediction Machine Learning Model
This repository contains a machine learning model for predicting student enrollment, trained on a public dataset obtained from Kaggle. The dataset contains features related to student demographics, academic performance, and economic factors.
Dataset
The dataset consists of 34 columns and 4,882 rows. Each row represents a student and contains features such as Marital status, Application mode, Application order, Course, Daytime/evening attendance, Previous qualification, Nacionality, Mother's qualification, Father's qualification, Mother's occupation, Father's occupation, Displaced, Educational special needs, Debtor, Tuition fees up to date, Gender, Scholarship holder, Age at enrollment, International, Curricular units 1st sem (credited), Curricular units 1st sem (enrolled), Curricular units 1st sem (evaluations), Curricular units 1st sem (approved), Curricular units 1st sem (grade), Curricular units 1st sem (without evaluations), Curricular units 2nd sem (credited), Curricular units 2nd sem (enrolled), Curricular units 2nd sem (evaluations), Curricular units 2nd sem (approved), Curricular units 2nd sem (grade), Curricular units 2nd sem (without evaluations), Unemployment rate, Inflation rate, and GDP.
The target column is "Target", which indicates whether a student dropped out or graduated.
The dataset can be found on Kaggle: https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention
Model
The model is a decision tree classifier that predicts student enrollment. It was trained on 80% of the data, with the remaining 20% held out for testing. The accuracy of the model is 85%.
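The training setup described above could be reproduced roughly as follows. This is a minimal sketch, not the notebook's actual code: it uses a small synthetic DataFrame in place of the Kaggle CSV, and the labelling rule, the `random_state` values, the stratified split, and the default tree hyperparameters are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Kaggle CSV (assumption). With the real data you
# would load the file with pd.read_csv instead of building this frame.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Age at enrollment": rng.integers(17, 60, size=200),
    "Curricular units 1st sem (approved)": rng.integers(0, 10, size=200),
    "Tuition fees up to date": rng.integers(0, 2, size=200),
})
# Hypothetical labelling rule, used only so the toy data is learnable.
data["Target"] = np.where(
    data["Curricular units 1st sem (approved)"] >= 4, "Graduate", "Dropout"
)

# Separate the feature columns from the "Target" column.
X = data.drop(columns=["Target"])
y = data["Target"]

# Hold out 20% of the rows for testing and train on the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit a decision tree with default hyperparameters (assumption).
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Accuracy is measured on the held-out rows only.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Fixing `random_state` makes the split and the fitted tree reproducible between runs, which helps when comparing against a reported accuracy figure.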
Files
This repository contains the following files:
enrollment_prediction_model.ipynb: Jupyter notebook containing the code for training and testing the model
enrollment_prediction_model.pkl: Serialized machine learning model file
enrollment_prediction_model_readme.md: Readme file containing information about the machine learning model
Usage
To use the machine learning model, follow these steps:
Clone the repository
Install the required packages (pandas, numpy, scikit-learn)
Load the serialized machine learning model from the enrollment_prediction_model.pkl file
Prepare a new dataset with the same columns as the original dataset
Use the predict function of the model to predict enrollment for each row in the new dataset
Example code:
    import pandas as pd
    import pickle

    # Load serialized machine learning model
    with open('enrollment_prediction_model.pkl', 'rb') as file:
        model = pickle.load(file)

    # Prepare new dataset
    new_data = pd.read_csv('new_data.csv')

    # Predict enrollment
    predictions = model.predict(new_data)
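Because the new dataset must have the same columns as the original, it can help to check and reorder the columns before calling `predict`. The helper below is a hypothetical sketch (the name `align_features` is not part of this repository). It assumes you know the list of expected feature columns; for a scikit-learn model fitted on a DataFrame with string column names, that list may be available as `model.feature_names_in_`.

```python
import pandas as pd


def align_features(new_data, expected_columns):
    """Return new_data restricted to expected_columns, in that order.

    Raises ValueError if any expected column is absent. Extra columns
    (for example a leftover "Target" column) are dropped.
    """
    missing = [col for col in expected_columns if col not in new_data.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return new_data[list(expected_columns)]


# Toy example with two illustrative columns (assumption: in practice the
# expected list could come from list(model.feature_names_in_)).
expected = ["Gender", "Age at enrollment"]
raw = pd.DataFrame({
    "Age at enrollment": [19, 23],
    "Gender": [1, 0],
    "Target": ["Graduate", "Dropout"],
})
features = align_features(raw, expected)
print(list(features.columns))
```

The aligned frame can then be passed to `model.predict` in place of `new_data`.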