Upload 16 files
Browse files- .gitattributes +1 -0
- All_maj.csv +3 -0
- R.png +0 -0
- README.md +2 -13
- __pycache__/function.cpython-310.pyc +0 -0
- cols_to_drop.txt +13 -0
- column_all.txt +78 -0
- config.toml +2 -0
- courses_list.txt +42 -0
- dataScore.csv +0 -0
- function.py +253 -0
- main.py +310 -0
- model/R_Late.joblib +3 -0
- model/R_Sem.joblib +3 -0
- model/R_rank.joblib +3 -0
- requirements.txt +8 -0
- rows_to_drop.txt +15 -0
.gitattributes
CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
Analyze-and-predict-student-scores/All_maj.csv filter=lfs diff=lfs merge=lfs -text
|
|
|
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
Analyze-and-predict-student-scores/All_maj.csv filter=lfs diff=lfs merge=lfs -text
|
37 |
+
All_maj.csv filter=lfs diff=lfs merge=lfs -text
|
All_maj.csv
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d9b5865503b770f70fe93a3a1b39e2d53036b45a7fa5c5a17cf226cf9bf0545f
|
3 |
+
size 47671628
|
R.png
ADDED
![]() |
README.md
CHANGED
@@ -1,13 +1,2 @@
|
|
1 |
-
|
2 |
-
|
3 |
-
emoji: 🏢
|
4 |
-
colorFrom: pink
|
5 |
-
colorTo: gray
|
6 |
-
sdk: streamlit
|
7 |
-
sdk_version: 1.21.0
|
8 |
-
app_file: app.py
|
9 |
-
pinned: false
|
10 |
-
license: other
|
11 |
-
---
|
12 |
-
|
13 |
-
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
|
|
1 |
+
# Analyze-and-predict-student-performance
|
2 |
+
Link to web-app: https://itdsiu19001-analyze-and-predict-student-performance-main-oiibq6.streamlit.app/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
__pycache__/function.cpython-310.pyc
ADDED
Binary file (7.73 kB). View file
|
|
cols_to_drop.txt
ADDED
@@ -0,0 +1,13 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
Intensive English 0- Twinning Program
|
2 |
+
Intensive English 01- Twinning Program
|
3 |
+
Intensive English 02- Twinning Program
|
4 |
+
Intensive English 03- Twinning Program
|
5 |
+
Intensive English 1- Twinning Program
|
6 |
+
Intensive English 2- Twinning Program
|
7 |
+
Intensive English 3- Twinning Program
|
8 |
+
Listening & Speaking IE1
|
9 |
+
Listening & Speaking IE2
|
10 |
+
Listening & Speaking IE2 (for twinning program)
|
11 |
+
Reading & Writing IE1
|
12 |
+
Reading & Writing IE2
|
13 |
+
Reading & Writing IE2 (for twinning program)
|
column_all.txt
ADDED
@@ -0,0 +1,78 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
Algorithms & Data Structures
|
2 |
+
Analytics for Observational Data
|
3 |
+
Applied Artificial Intelligence
|
4 |
+
Basic Electrical Concepts & Circuits
|
5 |
+
Blockchain
|
6 |
+
C/C++ Programming
|
7 |
+
C/C++ Programming in Unix
|
8 |
+
Circuit Analysis
|
9 |
+
Communication Networks
|
10 |
+
Computer Architecture
|
11 |
+
Computer Graphics
|
12 |
+
Computer Networks
|
13 |
+
Data Analysis
|
14 |
+
Data Science and Data Visualization
|
15 |
+
Decision Support System
|
16 |
+
Digital Communications
|
17 |
+
Digital Image Processing
|
18 |
+
Digital Logic Design
|
19 |
+
Digital Logic Design Laboratory
|
20 |
+
Digital Signal Processing
|
21 |
+
Discrete Mathematics
|
22 |
+
Electronic Devices & Circuits
|
23 |
+
Embedded Systems
|
24 |
+
Entrepreneurship
|
25 |
+
Formal Programming Methods
|
26 |
+
Functional Programming
|
27 |
+
Fundamental Concepts of Data Security
|
28 |
+
Fundamentals of Big Data Technology
|
29 |
+
Fundamentals of Programming
|
30 |
+
Human-Computer Interaction
|
31 |
+
IT Project Management
|
32 |
+
Information System Management
|
33 |
+
Information Theory & Coding
|
34 |
+
Internet of Things
|
35 |
+
Internship
|
36 |
+
Introduction to Artificial Intelligence
|
37 |
+
Introduction to Computing
|
38 |
+
Introduction to Data Mining
|
39 |
+
Introduction to Data Science
|
40 |
+
Introduction to Distributed Computing
|
41 |
+
Introduction to Wireless Network
|
42 |
+
Linear Algebra
|
43 |
+
Micro-processing Systems
|
44 |
+
Microprocessor Systems & Interfacing
|
45 |
+
Mobile Application Development
|
46 |
+
Net-Centric Programming
|
47 |
+
Network Design and Evaluation
|
48 |
+
Network Management and Protocols
|
49 |
+
Network Programming
|
50 |
+
Networks & Systems Security
|
51 |
+
Object Oriented Data Engineering (Java)
|
52 |
+
Object-Oriented Analysis and Design
|
53 |
+
Object-Oriented Programming
|
54 |
+
Operating Systems
|
55 |
+
Optimization
|
56 |
+
Principles of Database Management
|
57 |
+
Principles of EE1
|
58 |
+
Principles of EE1 Laboratory
|
59 |
+
Principles of Programming Languages
|
60 |
+
Probability, Statistic & Random Process
|
61 |
+
Programming Languages & Translators
|
62 |
+
Projects
|
63 |
+
Regression Analysis
|
64 |
+
Scalable and Distributed Computing
|
65 |
+
Signals & Systems
|
66 |
+
Signals & Systems Laboratory
|
67 |
+
Skills for Communicating Information
|
68 |
+
Software Architecture
|
69 |
+
Software Engineering
|
70 |
+
Software Implementation
|
71 |
+
Special Study of the Field
|
72 |
+
Statistical Methods
|
73 |
+
System & Network Administration
|
74 |
+
System and Network Security
|
75 |
+
Theoretical Models in Computing
|
76 |
+
Thesis
|
77 |
+
Web Application Development
|
78 |
+
Web Programming
|
config.toml
ADDED
@@ -0,0 +1,2 @@
|
|
|
|
|
|
|
1 |
+
[server]
|
2 |
+
headless = true
|
courses_list.txt
ADDED
@@ -0,0 +1,42 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
Calculus 1
|
2 |
+
Calculus 2
|
3 |
+
Calculus 3
|
4 |
+
Chemistry Laboratory
|
5 |
+
Chemistry for Engineers
|
6 |
+
Critical Thinking
|
7 |
+
History of Vietnamese Communist Party
|
8 |
+
Internship
|
9 |
+
Philosophy of Marxism and Leninism
|
10 |
+
Physics 1
|
11 |
+
Physics 2
|
12 |
+
Physics 3
|
13 |
+
Physics 3 Laboratory
|
14 |
+
Physics 4
|
15 |
+
Political economics of Marxism and Leninism
|
16 |
+
Principles of Database Management
|
17 |
+
Principles of Marxism
|
18 |
+
Principles of Programming Languages
|
19 |
+
Probability, Statistic & Random Process
|
20 |
+
Regression Analysis
|
21 |
+
Revolutionary Lines of Vietnamese Communist Party
|
22 |
+
Scientific socialism
|
23 |
+
Speaking AE2
|
24 |
+
Special Study of the Field
|
25 |
+
Thesis
|
26 |
+
Writing AE1
|
27 |
+
Writing AE2
|
28 |
+
Intensive English 0- Twinning Program
|
29 |
+
Intensive English 01- Twinning Program
|
30 |
+
Intensive English 02- Twinning Program
|
31 |
+
Intensive English 03- Twinning Program
|
32 |
+
Intensive English 1- Twinning Program
|
33 |
+
Intensive English 2- Twinning Program
|
34 |
+
Intensive English 3- Twinning Program
|
35 |
+
Listening & Speaking IE1
|
36 |
+
Listening & Speaking IE2
|
37 |
+
Listening & Speaking IE2 (for twinning program)
|
38 |
+
Physical Training 1
|
39 |
+
Physical Training 2
|
40 |
+
Reading & Writing IE1
|
41 |
+
Reading & Writing IE2
|
42 |
+
Reading & Writing IE2 (for twinning program)
|
dataScore.csv
ADDED
The diff for this file is too large to render.
See raw diff
|
|
function.py
ADDED
@@ -0,0 +1,253 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import pandas as pd
|
2 |
+
import numpy as np
|
3 |
+
import plotly.express as px
|
4 |
+
import plotly.graph_objs as go
|
5 |
+
import streamlit as st
|
6 |
+
import joblib
|
7 |
+
|
8 |
+
|
9 |
+
def get_year(student_id):
    """Extract the two-digit enrollment year embedded in a student ID.

    The digits at positions 6-7 of the ID encode the year,
    e.g. "ITITIU19001" -> 19.
    """
    year_digits = student_id[6:8]
    return int(year_digits)
|
11 |
+
|
12 |
+
def process_data(raw_data):
    """Pivot long-format score records into one row per student.

    Returns a wide DataFrame with one column per course plus the derived
    columns 'MaSV_school', 'Major' and 'Year' (the student-ID column
    itself is dropped at the end).

    NOTE(review): the column names are Vietnamese school-record fields —
    MaSV appears to be the student ID, TenMH the course name, DiemHP the
    course score and XepLoaiNH the yearly classification; confirm
    against the data source.
    """
    # Pivot: one row per student (MaSV), one column per course (TenMH),
    # keeping the first recorded score (DiemHP) per pair.
    pivot_df = pd.pivot_table(raw_data, values='DiemHP', index='MaSV', columns='TenMH', aggfunc='first')
    pivot_df = pivot_df.reset_index().rename_axis(None, axis=1)
    pivot_df.columns.name = None
    # Keep only courses with at least 50 non-missing scores.
    pivot_df = pivot_df.dropna(thresh=50, axis=1)
    pivot_df = pivot_df.rename(columns=lambda x: x.strip())
    # Drop the courses listed in the external config file (if present).
    cols_to_drop = []
    with open('cols_to_drop.txt', 'r') as f:
        for line in f:
            cols_to_drop.append(str(line.strip()))
    existing_cols = [col for col in cols_to_drop if col in pivot_df.columns]
    if existing_cols:
        pivot_df.drop(existing_cols, axis=1, inplace=True)

    # Attach the XepLoaiNH column, keeping each student's last record.
    df = pd.merge(pivot_df, raw_data[['MaSV', 'XepLoaiNH']], on='MaSV')
    df.drop_duplicates(subset='MaSV', keep='last', inplace=True)
    # Stash the IDs so the numeric conversion below can skip them.
    dfid=df['MaSV']
    df.drop(['MaSV', 'XepLoaiNH'], axis=1, inplace=True)
    # 'WH', 'VT' and 'I' are non-numeric grade markers; treat as missing.
    df.replace(['WH', 'VT',"I"], np.nan, inplace=True)
    df.iloc[:, :-1] = df.iloc[:, :-1].apply(pd.to_numeric)
    # Re-attach the IDs by index (row positions were preserved above).
    df = pd.merge(dfid,df,left_index=True, right_index=True)
    # Derive school code, major code and enrollment year from the ID.
    df['MaSV_school'] = df['MaSV'].str.slice(2, 4)
    df['Major'] = df['MaSV'].str.slice(0, 2)
    df["Year"] = 2000 + df["MaSV"].apply(get_year)
    df["Year"]=df["Year"].astype(str)
    df=df.drop(columns='MaSV')

    return df
|
43 |
+
|
44 |
+
def process_data_per(raw_data):
    """Pivot raw score records into one row per student, keeping the ID.

    Like ``process_data`` but retains the 'MaSV' column, derives no
    school/major/year columns, and only maps 'WH' to NaN.
    """
    # Pivot: one row per student, one column per course, first score kept.
    pivot_df = pd.pivot_table(raw_data, values='DiemHP', index='MaSV', columns='TenMH', aggfunc='first')
    pivot_df = pivot_df.reset_index().rename_axis(None, axis=1)
    pivot_df.columns.name = None
    # Keep only courses with at least 50 non-missing scores.
    pivot_df = pivot_df.dropna(thresh=50, axis=1)
    pivot_df = pivot_df.rename(columns=lambda x: x.strip())

    # Drop the courses listed in the external config file (if present).
    cols_to_drop = []
    with open('cols_to_drop.txt', 'r') as f:
        for line in f:
            cols_to_drop.append(str(line.strip()))
    existing_cols = [col for col in cols_to_drop if col in pivot_df.columns]
    if existing_cols:
        pivot_df.drop(existing_cols, axis=1, inplace=True)
    # 'WH' marks a withheld score; treat as missing before the numeric cast.
    pivot_df.replace('WH', np.nan, inplace=True)
    pivot_df.iloc[:, 1:] = pivot_df.iloc[:, 1:].apply(pd.to_numeric)
    # Attach XepLoaiNH only to deduplicate per student, then discard it.
    df = pd.merge(pivot_df, raw_data[['MaSV', 'XepLoaiNH']], on='MaSV')
    df.drop_duplicates(subset='MaSV', keep='last', inplace=True)
    df.drop(['XepLoaiNH'], axis=1, inplace=True)

    return df
|
68 |
+
|
69 |
+
|
70 |
+
def process_predict_data(raw_data):
    """Build the per-student feature table for the lateness/semester models.

    Returns one row per student with: GPA (from DTBTKH4), median credits
    per term (Mean_Cre), counts of repeated IT / non-IT courses, and the
    number of distinct terms containing English courses (EPeriod).
    """
    # Latest cumulative GPA (DTBTKH4) per student.
    dtk = raw_data[["MaSV", "DTBTKH4"]].copy()
    dtk.drop_duplicates(subset="MaSV", keep="last", inplace=True)

    # How many times each student took each course (MaMH).
    count_duplicates = raw_data.groupby(["MaSV", "MaMH"]).size().reset_index(name="Times")
    courses = raw_data[raw_data['MaMH'].str.startswith('IT')]
    courses_list=courses['MaMH'].unique().tolist()

    # Taking a course twice or more is treated as a fail/retake; split the
    # counts into IT courses (courses_list) vs everything else.
    count_duplicates["fail_courses_list"] = (
        (count_duplicates["MaMH"].isin(courses_list)) & (count_duplicates["Times"] >= 2)
    ).astype(int)

    count_duplicates["fail_not_courses_list"] = (
        (~count_duplicates["MaMH"].isin(courses_list)) & (count_duplicates["Times"] >= 2)
    ).astype(int)

    # NOTE(review): 'pass_courses' is computed but never aggregated or
    # returned below.
    count_duplicates["pass_courses"] = (
        (~count_duplicates["MaMH"].isin(courses_list)) & (count_duplicates["Times"] == 1)
    ).astype(int)

    # Sum the retake flags per student.
    fail = (
        count_duplicates.groupby("MaSV")[["fail_courses_list", "fail_not_courses_list"]]
        .sum()
        .reset_index()
    )

    # Rename to reflect the IT / non-IT split.
    fail.columns = ["MaSV", "fail_courses_list_count", "fail_not_courses_list_count"]

    df = pd.merge(dtk, fail, on="MaSV")
    df = df.rename(columns={"DTBTKH4": "GPA"})

    # Median credits earned per term (SoTCDat), reported as Mean_Cre.
    data = raw_data[['MaSV','NHHK','SoTCDat']]
    data = data.drop_duplicates()
    data = data.groupby(['MaSV'])['SoTCDat'].median().reset_index(name='Mean_Cre').round(2)

    df = pd.merge(df, data, on='MaSV')
    df1=raw_data[['MaSV','MaMH','NHHK']]
    # English courses (EN*) excluding EN007/EN008/EN011/EN012.
    # NOTE(review): this rebinds `courses_list` from IT codes to EN codes.
    courses_list = raw_data[(raw_data['MaMH'].str.startswith('EN')) & ~(raw_data['MaMH'].str.contains('EN007|EN008|EN011|EN012'))].MaMH.tolist()
    filtered_df = df1[df1['MaMH'].isin(courses_list)]
    # Number of distinct terms (NHHK) in which the student took English.
    nhhk_counts = filtered_df.groupby('MaSV')['NHHK'].nunique().reset_index(name='EPeriod')
    df = pd.merge(df, nhhk_counts, on='MaSV', how='left').fillna(0)
    df=df[['MaSV','GPA' ,'Mean_Cre', 'fail_courses_list_count' ,'fail_not_courses_list_count' ,'EPeriod']]
    return df
|
116 |
+
|
117 |
+
def predict_late_student(test_df):
    """Predict late graduation and remaining periods for each student.

    Uses two pre-trained models: R_Late (late / not-late classifier) and
    R_Sem (remaining-period regressor). Returns the engineered feature
    table with 'Period' and 'Result' columns appended.
    """
    # Load the pre-trained models.
    model=joblib.load("model/R_Late.joblib")
    model1=joblib.load("model/R_Sem.joblib")
    # Engineer the per-student model features.
    test_dfed = process_predict_data(test_df)

    # Save the student ID column (models must not see it).
    std_id = test_dfed.iloc[:, 0]

    # Drop the student ID column.
    test_dfed = test_dfed.drop(test_dfed.columns[0], axis=1)

    # Late / not-late classification.
    prediction = model.predict(test_dfed)

    # Remaining-period regression on the same features.
    prediction1 = model1.predict(test_dfed)

    # Attach both predictions; a class of 1 means "late".
    test_dfed['Period'] = prediction1
    test_dfed['Result'] = ['late' if p == 1 else 'not late' for p in prediction]

    # Put the student ID column back at the front.
    test_dfed.insert(0, 'MaSV', std_id)

    # Every row's Period is halved; rows predicted 'late' with a
    # pre-halving Period of 9 or less are softened to 'may late'.
    # NOTE(review): the halving suggests the model predicts in
    # half-semester units — confirm with the model's training code.
    for index, row in test_dfed.iterrows():
        if row['Period'] <= 9 and row['Result'] == 'late':
            test_dfed.loc[index, 'Period'] = row['Period'] / 2
            test_dfed.loc[index, 'Result'] = 'may late'
        else:
            test_dfed.loc[index, 'Period'] = row['Period'] / 2

    return test_dfed
|
153 |
+
def predict_rank(raw_data):
    """Predict an academic-rank label for each IT student.

    Pivots IT-course scores to one row per student, aligns the columns
    with the training-time column list (column_all.txt), fills missing
    course scores with the student's cumulative GPA (DTBTK), and runs
    the pre-trained R_rank model.

    Parameters
    ----------
    raw_data : DataFrame of long-format score records (must contain
        MaSV, MaMH, TenMH, DiemHP and DTBTK columns).

    Returns
    -------
    DataFrame with columns 'MaSV' and 'Pred Rank'.
    """
    # Keep only IT students and IT courses.
    raw_data = raw_data[raw_data["MaSV"].str.startswith("IT")]
    raw_data = raw_data[raw_data['MaMH'].str.startswith('IT')]
    # Pivot: one row per student, one column per course, first score kept.
    pivot_df = pd.pivot_table(
        raw_data, values="DiemHP", index="MaSV", columns="TenMH", aggfunc="first"
    )
    pivot_df = pivot_df.reset_index().rename_axis(None, axis=1)
    pivot_df.columns.name = None
    # Keep only courses with at least 50 non-missing scores.
    pivot_df = pivot_df.dropna(thresh=50, axis=1)
    pivot_df = pivot_df.rename(columns=lambda x: x.strip())

    # 'WH' marks a withheld score; treat as missing before the numeric cast.
    pivot_df.replace("WH", np.nan, inplace=True)
    pivot_df.iloc[:, 1:] = pivot_df.iloc[:, 1:].apply(pd.to_numeric)

    # Attach the cumulative GPA, keeping each student's last record.
    df = pd.merge(pivot_df, raw_data[["MaSV", "DTBTK"]], on="MaSV")
    df.drop_duplicates(subset="MaSV", keep="last", inplace=True)
    # (fixed: removed a dead `col = df.drop(...)` assignment that was never
    # used and shadowed the `col` loop variable below.)

    # Column list the model was trained on.
    columns_data = []
    with open('column_all.txt', 'r') as f:
        for line in f:
            columns_data.append(str(line.strip()))

    # Outer-merge with an empty frame holding the training columns so any
    # course absent from this dataset still exists (as NaN).
    r=df.drop(columns=['MaSV','DTBTK'])
    merge=r.columns.tolist()
    dup=pd.DataFrame(columns=columns_data)
    df= pd.merge(dup, df, on=merge, how='outer')
    # Fill any missing course score with the student's overall GPA.
    for col in df.columns:
        if df[col].isnull().values.any():
            df[col].fillna(value=df["DTBTK"], inplace=True)
    std_id = df['MaSV'].copy()
    df=df.drop(['MaSV', 'DTBTK'], axis=1)
    # The model expects its features in alphabetical column order.
    df.sort_index(axis=1, inplace=True)
    model=joblib.load("model/R_rank.joblib")
    prediction = model.predict(df)
    df['Pred Rank'] = prediction
    df.insert(0, 'MaSV', std_id)
    df=df[['MaSV','Pred Rank']]
    return df
|
195 |
+
|
196 |
+
|
197 |
+
def predict_one_student(raw_data, student_id):
    """Render per-student charts in Streamlit.

    Draws a histogram of the student's course scores and a score-by-course
    bar chart side by side; writes a "no data" message when the student ID
    is not found.
    """
    # Subset the pivoted score table to this student's row.
    student = process_data_per(raw_data)
    filtered_df = student[student["MaSV"] == student_id]
    if len(filtered_df) > 0:
        # All of the student's course scores, missing values dropped.
        selected_row = filtered_df.iloc[0, 1:].dropna()
        # NOTE(review): `colname` is computed but never used below.
        colname = filtered_df.dropna().columns.tolist()
        values = selected_row.values.tolist()

        # Histogram of the student's score distribution.
        fig1 = go.Figure()
        fig1.add_trace(go.Histogram(x=values, nbinsx=40, name=student_id,marker=dict(color='rgba(50, 100, 200, 0.7)')))

        # Set the chart title and axis labels.
        fig1.update_layout(
            title="Histogram for student {}".format(student_id),
            xaxis_title="Value",
            yaxis_title="Frequency",
            width=500
        )

        # Score-by-course bar chart, colored by term.
        # NOTE(review): these assignments write into a slice of raw_data
        # (pandas chained-assignment warning territory) — consider .copy().
        data = raw_data[['MaSV', 'NHHK', 'TenMH', 'DiemHP']]
        data['TenMH'] = data['TenMH'].str.lstrip()
        # Format the term code, e.g. 20211 -> "2021 S 1".
        data['NHHK'] = data['NHHK'].apply(lambda x: str(x)[:4] + ' S ' + str(x)[4:])
        # Courses excluded from the bar chart (English/PE etc.).
        rows_to_drop = []
        with open('rows_to_drop.txt', 'r') as f:
            for line in f:
                rows_to_drop.append(str(line.strip()))
        data = data[~data['TenMH'].isin(rows_to_drop)]
        student_data = data[data['MaSV'] == student_id][['NHHK', 'TenMH', 'DiemHP']]
        student_data['DiemHP'] = pd.to_numeric(student_data['DiemHP'], errors='coerce')

        fig2 = px.bar(student_data, x='TenMH', y='DiemHP', color='NHHK', title='Student Score vs. Course')
        fig2.update_layout(
            title="Student Score vs. Course",
            xaxis_title=None,
            yaxis_title="Score",
        )
        # Horizontal reference line at score 50 (presumably the pass
        # mark — TODO confirm).
        fig2.add_shape(
            type="line",
            x0=0,
            y0=50,
            x1=len(student_data['TenMH'])-1,
            y1=50,
            line=dict(color='red', width=3)
        )

        # Display the two charts side by side.
        col1, col2 = st.columns(2)
        with col1:
            st.plotly_chart(fig1)

        with col2:
            st.plotly_chart(fig2)
    else:
        st.write("No data found for student {}".format(student_id))
|
main.py
ADDED
@@ -0,0 +1,310 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import pandas as pd
|
2 |
+
import streamlit as st
|
3 |
+
import plotly.express as px
|
4 |
+
import numpy as np
|
5 |
+
import plotly.graph_objs as go
|
6 |
+
from function import process_data,predict_late_student, predict_rank,predict_one_student
|
7 |
+
from datetime import datetime
|
8 |
+
from PIL import Image
|
9 |
+
import base64
|
10 |
+
from io import BytesIO
|
11 |
+
|
12 |
+
|
13 |
+
df = pd.DataFrame()
|
14 |
+
|
15 |
+
|
16 |
+
def color_cell(val):
    """Map a lateness verdict to a CSS color declaration.

    Used with ``DataFrame.style.applymap``: 'not late' -> green,
    'may late' -> yellow, 'late' -> red, anything else -> black.
    """
    verdict_colors = {
        "not late": "green",
        "may late": "yellow",
        "late": "red",
    }
    return "color: %s" % verdict_colors.get(val, "black")
|
26 |
+
|
27 |
+
|
28 |
+
def get_year(student_id):
    """Return the two-digit enrollment year stored at positions 6-7 of
    a student ID (e.g. "ITITIU19001" -> 19)."""
    return int(student_id[6:8])
|
30 |
+
|
31 |
+
|
32 |
+
def generate_comment(median, course=None):
    """Return a human-readable remark about a course's median score.

    Parameters
    ----------
    median : numeric median score; tiers are <30, <50, <80, else.
    course : course name inserted into the message. Defaults to the
        module-level ``course`` global selected in the dashboard, so
        existing single-argument call sites keep working. (fixed: the
        original silently depended on that global.)

    Returns
    -------
    str : the formatted comment.
    """
    if course is None:
        # Backward-compatible fallback to the dashboard's selected course.
        course = globals().get("course", "this course")
    if median < 30:
        comment = f"The median score for {course} is quite low at {median}. Students may need to work harder to improve their performance."
    elif median < 50:
        comment = f"The median score for {course} is below average at {median}. Students should work on improving their understanding of the material."
    elif median < 80:
        comment = f"The median score for {course} is solid at {median}. Students are making good progress but could still work on improving their skills."
    else:
        comment = f"The median score for {course} is outstanding at {median}. Students are doing an excellent job in this course."
    return comment
|
42 |
+
|
43 |
+
# ---- Page setup: favicon, layout, header, data load, sidebar ----
favicon = 'R.png'

st.set_page_config(
    page_title='Student System',
    page_icon=favicon,
    layout='wide',
)
currentYear = datetime.now().year
im1 = Image.open("R.png")

# Header: logo in the narrow left column, title in the wide right column.
col1, col2 = st.columns([1, 3])

with col1:
    st.image(im1, width=150)

with col2:
    st.title("Student Performance Prediction System")


# Load the raw data.
# (File-upload path kept for reference; the app currently reads the
# bundled CSV directly.)
# uploaded_file = st.file_uploader("Choose a score file", type=["xlsx", "csv"])

# if uploaded_file is not None:
#     file_contents = uploaded_file.read()
#     file_ext = uploaded_file.name.split(".")[-1].lower()  # Get the file extension

#     if file_ext == "csv":
#         df = pd.read_csv(BytesIO(file_contents))
#     elif file_ext in ["xls", "xlsx"]:
#         df = pd.read_excel(BytesIO(file_contents))
#     else:
#         st.error("Invalid file format. Please upload a CSV or Excel file.")

# raw_data = df.copy()
# fixed: the file shipped with this commit is "All_maj.csv"
# (see .gitattributes LFS entry); "All_major.csv" does not exist.
raw_data = pd.read_csv("All_maj.csv")
st.sidebar.title("Analysis Tool")

option = ["Dashboard", "Predict"]
# Sidebar switch between the two app modes.
tabs = st.sidebar.selectbox("Select an option", option)
|
90 |
+
|
91 |
+
|
92 |
+
# draw histogram
|
93 |
+
# Streamlit app
|
94 |
+
# ---- Dashboard mode: filter by major/school/year, then chart one course ----
if tabs == "Dashboard":
    # try:

    df = process_data(raw_data)
    unique_values_major = df["Major"].unique()
    # NOTE(review): no "All" entry is prepended here, yet the branch
    # below tests for it — the major filter can never be "All".
    major=st.selectbox("Select a major:", unique_values_major)
    if major == "All":
        # If so, display the entire DataFrame
        filtered_df = df.copy()
    else:
        # Otherwise, filter the DataFrame based on the selected value
        filtered_df = df[df["Major"] == major]
        filtered_df = filtered_df.dropna(axis=1, how="all")

    # School filter (with an explicit "All" option).
    df=filtered_df
    unique_values = df["MaSV_school"].unique()
    all_values = np.concatenate([["All"],unique_values ])
    school = st.selectbox("Select a school:", all_values)
    if school == "All":
        # If so, display the entire DataFrame
        filtered_df = df.copy()
    else:
        # Otherwise, filter the DataFrame based on the selected value
        filtered_df = df[df["MaSV_school"] == school]
        filtered_df = filtered_df.dropna(axis=1, how="all")

    # Year filter (with an explicit "All" option).
    df=filtered_df
    unique_values_year = df["Year"].unique()
    all_values_year = np.concatenate([["All"],unique_values_year ])
    year = st.selectbox("Select a year:", all_values_year)

    if year == "All":
        # If so, display the entire DataFrame
        filtered_df = df.copy()
    else:
        # Otherwise, filter the DataFrame based on the selected value
        filtered_df = df[df["Year"] == year]
        filtered_df = filtered_df.dropna(axis=1, how="all")


    df=filtered_df

    # The last three columns are MaSV_school/Major/Year, not courses.
    options = df.columns[:-3]
    course = st.selectbox("Select a course:", options)

    # Scores for the selected course only.
    course_data = df[course].dropna()

    # Median-based narrative comment (generate_comment reads the global
    # `course` set above).
    st.write(generate_comment(course_data.median()))

    st.write("Course:", course, " of ", school," student" )


    col1, col2,col3= st.columns(3)

    # Column 1: histogram of the course's scores.
    with col1:
        fig = go.Figure()
        fig.add_trace(
            go.Histogram(
                x=course_data, nbinsx=40, name="Histogram"
            )
        )
        fig.update_layout(
            title="Histogram of Scores for {}".format(course),
            xaxis_title="Score",
            yaxis_title="Count",
            height=400,
            width=400
        )
        st.plotly_chart(fig)

    # Column 2: box plot of the same scores.
    with col2:
        fig = go.Figure()
        fig.add_trace(
            go.Box(
                y=course_data, name="Box plot"
            )
        )
        fig.update_layout(
            title="Box plot of Scores for {}".format(course),
            yaxis_title="Score",
            height=400,
            width=400
        )
        st.plotly_chart(fig)
    # Column 3: mean score per term, computed from the raw (long) data.
    with col3:
        raw_data['MaSV_school'] = raw_data['MaSV'].str.slice(2, 4)
        if school == "All":
            # If so, display the entire DataFrame
            data = raw_data.copy()
        else:
            # Otherwise, filter the DataFrame based on the selected value
            data = raw_data[raw_data["MaSV_school"] == school]
        df1=data[['TenMH','NHHK','DiemHP']].copy()
        df1['DiemHP'] = pd.to_numeric(df1['DiemHP'], errors='coerce')
        # Format the term code, e.g. 20211 -> "2021 S 1".
        df1['NHHK'] = df1['NHHK'].apply(lambda x: str(x)[:4] + ' S ' + str(x)[4:])
        # Raw TenMH values carry a leading space; re-add it to match.
        selected_TenMH = " " + course
        filtered_df1 = df1[df1['TenMH'] == selected_TenMH]
        mean_DiemHP = filtered_df1.groupby('NHHK')['DiemHP'].mean().round(1).reset_index(name='Mean')
        # NOTE(review): "for{...} thought period" in this runtime title is
        # missing a space and likely means "through"; left as-is since it
        # is user-visible output, not a comment.
        fig = px.line(mean_DiemHP, x='NHHK', y='Mean', title=f"Mean DiemHP for{selected_TenMH} thought period")
        fig.update_layout(
            height=400,
            width=400)
        st.plotly_chart(fig)


    # except:
    #     st.write("Add CSV to analysis")
|
209 |
+
|
210 |
+
|
211 |
+
# predict student
|
212 |
+
|
213 |
+
# ---- Predict mode: run both models and show per-student / per-year views ----
elif tabs == "Predict":
    try:
        raw_data = pd.read_csv("dataScore.csv")
        predict = predict_late_student(raw_data)
        rank = predict_rank(raw_data)

        predict = pd.merge(predict, rank, on="MaSV")
        # Translate the model's Vietnamese rank labels to English.
        rank_mapping = {
            "Khá": "Good",
            "Trung Bình Khá": "Average good",
            "Giỏi": "Very good",
            "Kém": "Very weak",
            "Trung Bình": "Ordinary",
            "Yếu": "Weak",
            "Xuất Sắc": "Excellent",
        }
        predict["Pred Rank"].replace(rank_mapping, inplace=True)

        # Filter students who have a Result value of "late"
        df_late = predict

        MaSV = st.text_input("Enter Student ID:")
        if MaSV:
            # Single-student view: styled row plus detail charts.
            df_filtered = predict[predict["MaSV"] == MaSV]
            styled_table = (
                df_filtered[["MaSV", "GPA", "Mean_Cre", "Pred Rank", "Result", "Period"]]
                .style.applymap(color_cell)
                .format({"GPA": "{:.2f}", "Mean_Cre": "{:.1f}", "Period": "{:.1f}"})
            )

            with st.container():
                st.write(styled_table)
                predict_one_student(raw_data,MaSV)
        else:
            # Cohort view: filter by enrollment year, excluding the two
            # most recent intakes (too early to predict lateness).
            df_late = predict
            # df_late = predict[(predict['Pred Rank'] == 'Yếu') | (predict['Pred Rank'] == 'Kém')]
            df_late["Year"] = 2000 + df_late["MaSV"].apply(get_year)
            df_late = df_late[
                (df_late["Year"] != currentYear - 1) & (df_late["Year"] != currentYear - 2)
            ]
            year = st.selectbox("Select Year", options=df_late["Year"].unique())
            df_filtered = df_late[df_late["Year"] == year]
            styled_table = (
                df_filtered[["MaSV", "GPA", "Mean_Cre", "Pred Rank", "Result", "Period"]]
                .style.applymap(color_cell)
                .format({"GPA": "{:.2f}", "Mean_Cre": "{:.2f}", "Period": "{:.2f}"})
            )
            # Offer the filtered table as a base64 CSV download link.
            # NOTE(review): the download filename "Preidct data.csv" is a
            # typo for "Predict"; it is a runtime string, left unchanged.
            csv = df_filtered.to_csv(index=False)
            b64 = base64.b64encode(csv.encode()).decode()
            href = f'<a href="data:file/csv;base64,{b64}" download="Preidct data.csv">Download CSV</a>'
            st.markdown(href, unsafe_allow_html=True)
            # Pie charts of predicted rank and lateness result.
            fig1 = px.pie(
                df_filtered,
                names="Pred Rank",
                title="Pred Rank",
                color_discrete_sequence=px.colors.sequential.Mint,
                height=400,
                width=400,
            )
            fig2 = px.pie(
                df_filtered,
                names="Result",
                title="Result",
                color_discrete_sequence=px.colors.sequential.Peach,
                height=400,
                width=400,
            )
            fig1.update_layout(
                title={
                    "text": "Pred Rank",
                    "y": 0.95,
                    "x": 0.5,
                    "xanchor": "center",
                    "yanchor": "top",
                }
            )
            fig2.update_layout(
                title={
                    "text": "Result",
                    "y": 0.95,
                    "x": 0.5,
                    "xanchor": "center",
                    "yanchor": "top",
                }
            )
            st.dataframe(styled_table)
            col1, col2 = st.columns([1, 1])
            with col1:
                st.plotly_chart(fig1)
            with col2:
                st.plotly_chart(fig2)



    # display the grid of pie charts using Streamlit

    # NOTE(review): this bare `except:` swallows every error (including
    # missing model files and KeyboardInterrupt) behind a generic
    # message — consider catching FileNotFoundError/KeyError explicitly.
    except:
        st.write('Add CSV to analysis')
|
model/R_Late.joblib
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d201e84514a400d73d79097b43fabf12cc96923e7abb1bc5c3be22bc5dea7445
|
3 |
+
size 497289
|
model/R_Sem.joblib
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:adfbee871506a3a7e6e3ca02d7bd205cceab50fdbec47878d01773ed59dd5e7c
|
3 |
+
size 2638353
|
model/R_rank.joblib
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:89b2a11b69b622db9e2401c735d0bb0b4a5f791269de30c9799a0817f619cd96
|
3 |
+
size 205089
|
requirements.txt
ADDED
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
numpy
|
2 |
+
Cython==0.29.21
|
3 |
+
scikit-learn
|
4 |
+
pandas
|
5 |
+
plotly
|
6 |
+
scipy
|
7 |
+
pyDOE
|
8 |
+
openpyxl
|
rows_to_drop.txt
ADDED
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
Intensive English 0- Twinning Program
|
2 |
+
Intensive English 01- Twinning Program
|
3 |
+
Intensive English 02- Twinning Program
|
4 |
+
Intensive English 03- Twinning Program
|
5 |
+
Intensive English 1- Twinning Program
|
6 |
+
Intensive English 2- Twinning Program
|
7 |
+
Intensive English 3- Twinning Program
|
8 |
+
Listening & Speaking IE1
|
9 |
+
Listening & Speaking IE2
|
10 |
+
Listening & Speaking IE2 (for twinning program)
|
11 |
+
Physical Training 1
|
12 |
+
Physical Training 2
|
13 |
+
Reading & Writing IE1
|
14 |
+
Reading & Writing IE2
|
15 |
+
Reading & Writing IE2 (for twinning program)
|