import streamlit as st


def run():
	"""Render the Data Understanding page of the CRISP-DM tutorial."""
	st.title("Data Understanding")

	st.write("## Overview")
	st.write("""
	Data Understanding is the second phase of the CRISP-DM process. It involves collecting initial data, describing the data, exploring the data, and verifying data quality.
	""")

	st.write("## Key Concepts & Explanations")
	st.markdown("""
	- Data Collection: Gathering data from various sources.
	- Data Description: Summarizing the main characteristics of the data.
	- Data Exploration: Using statistical and visualization techniques to understand the data.
	- Data Quality Verification: Ensuring the data is accurate, complete, and reliable.
	""")

	st.write("## Introduction")
	st.write("""
	The Data Understanding phase is crucial for identifying potential issues with the data and gaining insights that will inform the subsequent phases of the CRISP-DM process.
	""")

	st.header("Objectives")
	st.write("""
	- Collect Initial Data: Gather data from various sources to get a comprehensive dataset.
	- Describe the Data: Summarize the main characteristics of the data, including its structure and content.
	- Explore the Data: Use statistical and visualization techniques to identify patterns, trends, and anomalies.
	- Verify Data Quality: Assess the quality of the data to ensure it is suitable for analysis.
	""")

	st.header("Key Activities")
	st.write("""
	- Data Collection: Gather data from internal and external sources.
	- Data Description: Generate summary statistics and visualizations to describe the data.
	- Data Exploration: Perform exploratory data analysis (EDA) to uncover patterns and relationships.
	- Data Quality Verification: Check for missing values, outliers, and inconsistencies in the data.
	""")

	st.write("## Detailed Steps")
	st.write("""
	1. Collect Initial Data:
	    - Identify relevant data sources.
	    - Extract data from various sources and consolidate it into a single dataset.
	2. Describe the Data:
	    - Generate summary statistics (e.g., mean, median, standard deviation).
	    - Create visualizations (e.g., histograms, box plots) to describe the data distribution.
	3. Explore the Data:
	    - Perform exploratory data analysis (EDA) to identify patterns, trends, and anomalies.
	    - Use visualization tools (e.g., scatter plots, heatmaps) to explore relationships between variables.
	4. Verify Data Quality:
	    - Check for missing values and handle them appropriately.
	    - Identify and address outliers and inconsistencies in the data.
	    - Assess the overall quality of the data to ensure it is suitable for analysis.
	""")

	st.write("## Quiz: Conceptual Questions")
	q1 = st.radio("What is the main purpose of the Data Understanding phase?", ["Collect data", "Describe data", "Explore data", "All of the above"])
	if q1 == "All of the above":
		st.success("✅ Correct!")
	else:
		st.error("❌ Incorrect. The phase covers collecting, describing, and exploring data, so the answer is 'All of the above'.")

	st.write("## Learning Resources")
	st.markdown("""
	- 📘 [CRISP-DM Guide](https://www.sv-europe.com/crisp-dm-methodology/)
	- 🎓 [Data Understanding in Data Science](https://towardsdatascience.com/data-understanding-in-data-science-1a1d5e8b1c3d)
	- 🔬 [Exploratory Data Analysis (EDA)](https://www.analyticsvidhya.com/blog/2021/06/exploratory-data-analysis-eda-a-step-by-step-guide/)
	""")