import streamlit as st


def run():
    st.title("Data Understanding")

    st.write("## Overview")
    st.write("""
Data Understanding is the second phase of the CRISP-DM process. It involves collecting initial data, describing the data, exploring the data, and verifying data quality.
""")

    st.write("## Key Concepts & Explanations")
    st.markdown("""
- **Data Collection**: Gathering data from various sources.
- **Data Description**: Summarizing the main characteristics of the data.
- **Data Exploration**: Using statistical and visualization techniques to understand the data.
- **Data Quality Verification**: Checking that the data is accurate, complete, and reliable.
""")

    st.write("## Introduction")
    st.write("""
The Data Understanding phase is crucial for identifying potential issues with the data and gaining insights that will inform the subsequent phases of the CRISP-DM process.
""")

    st.header("Objectives")
    st.write("""
- **Collect Initial Data**: Gather data from various sources to get a comprehensive dataset.
- **Describe the Data**: Summarize the main characteristics of the data, including its structure and content.
- **Explore the Data**: Use statistical and visualization techniques to identify patterns, trends, and anomalies.
- **Verify Data Quality**: Assess the quality of the data to ensure it is suitable for analysis.
""")
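
    st.markdown("""
The first objective, gathering data from several sources into one dataset, can be sketched as follows. This is a minimal illustration using only the Python standard library. The records `crm_rows` and `web_rows`, the `customer_id` key, and the `consolidate` helper are hypothetical names chosen for this example and are not part of CRISP-DM.

```python
# Hypothetical records standing in for two separate data sources.
crm_rows = [
    {"customer_id": 1, "age": 34, "city": "Berlin"},
    {"customer_id": 2, "age": 45, "city": "Paris"},
]
web_rows = [
    {"customer_id": 2, "age": 45, "city": "Paris"},  # also present in the CRM source
    {"customer_id": 3, "age": 29, "city": "Madrid"},
]


def consolidate(*sources, key="customer_id"):
    # Merge several record lists into one, keeping the first row seen per key.
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row[key], row)
    return list(merged.values())


dataset = consolidate(crm_rows, web_rows)
print(len(dataset), "unique customers")
```

Keeping the first row seen per key is only one possible policy. Real projects often need explicit rules for resolving conflicting values between sources.
""")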

    st.header("Key Activities")
    st.write("""
- **Data Collection**: Gather data from internal and external sources.
- **Data Description**: Generate summary statistics and visualizations to describe the data.
- **Data Exploration**: Perform exploratory data analysis (EDA) to uncover patterns and relationships.
- **Data Quality Verification**: Check for missing values, outliers, and inconsistencies in the data.
""")
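
    st.markdown("""
The quality checks listed above (missing values, outliers, inconsistencies) might look like the sketch below. It uses only the standard library, the `records` data is invented for illustration, and the 1.5 x IQR rule is one common rule of thumb for flagging outliers, not the only option.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "country": "DE"},
    {"age": None, "country": "de"},
    {"age": 29, "country": "FR"},
    {"age": 31, "country": "FR"},
    {"age": 36, "country": "DE"},
    {"age": 33, "country": "ES"},
    {"age": 30, "country": "ES"},
    {"age": 120, "country": "FR"},
]


def count_missing(rows, column):
    # Count rows where the given column has no value.
    return sum(1 for row in rows if row[column] is None)


def iqr_outliers(values):
    # Flag values more than 1.5 * IQR below the first or above the third quartile.
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


ages = [row["age"] for row in records if row["age"] is not None]
missing_ages = count_missing(records, "age")
age_outliers = iqr_outliers(ages)

# Inconsistency check: country codes that are not written in upper case.
inconsistent_codes = sorted(
    {row["country"] for row in records if row["country"] != row["country"].upper()}
)

print("missing ages:", missing_ages)
print("age outliers:", age_outliers)
print("inconsistent country codes:", inconsistent_codes)
```

Each check only reports a potential problem. Deciding whether a flagged value is an error or a genuine observation still requires domain knowledge.
""")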

    st.write("## Detailed Steps")
    st.write("""
1. **Collect Initial Data**:
   - Identify relevant data sources.
   - Extract data from various sources and consolidate it into a single dataset.
2. **Describe the Data**:
   - Generate summary statistics (e.g., mean, median, standard deviation).
   - Create visualizations (e.g., histograms, box plots) to describe the data distribution.
3. **Explore the Data**:
   - Perform exploratory data analysis (EDA) to identify patterns, trends, and anomalies.
   - Use visualization tools (e.g., scatter plots, heatmaps) to explore relationships between variables.
4. **Verify Data Quality**:
   - Check for missing values and handle them appropriately.
   - Identify and address outliers and inconsistencies in the data.
   - Assess the overall quality of the data to ensure it is suitable for analysis.
""")
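
    st.markdown("""
Steps 2 and 3 can be illustrated with a short standard-library sketch: summary statistics for one column, then a Pearson correlation as a numeric counterpart to a scatter plot. The columns `ad_spend` and `sales` and the helpers `describe` and `pearson` are hypothetical, and the values are made up for the example.

```python
import statistics

# Hypothetical numeric columns for illustration.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [12.0, 24.0, 33.0, 46.0, 55.0]


def describe(values):
    # Summary statistics for a single numeric column.
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    # Pearson correlation coefficient between two equally long columns.
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = describe(ad_spend)
r = pearson(ad_spend, sales)

print(summary)
print("correlation between ad_spend and sales:", round(r, 3))
```

A coefficient close to 1 suggests a strong linear relationship between the two columns, which is worth confirming visually before drawing conclusions.
""")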

    st.write("## Quiz: Conceptual Questions")
    q1 = st.radio(
        "What is the main purpose of the Data Understanding phase?",
        ["Collect data", "Describe data", "Explore data", "All of the above"],
    )
    if q1 == "All of the above":
        st.success("✅ Correct!")
    else:
        st.error("❌ Incorrect. The main purpose is to collect, describe, and explore data.")

    st.write("## Learning Resources")
    st.markdown("""
- [CRISP-DM Guide](https://www.sv-europe.com/crisp-dm-methodology/)
- [Data Understanding in Data Science](https://towardsdatascience.com/data-understanding-in-data-science-1a1d5e8b1c3d)
- [Exploratory Data Analysis (EDA)](https://www.analyticsvidhya.com/blog/2021/06/exploratory-data-analysis-eda-a-step-by-step-guide/)
""")