Upload The Linear_Regression_Algorithm.py
pages/The Linear_Regression_Algorithm.py
ADDED
import streamlit as st
import pandas as pd

st.set_page_config(page_title="Regression Models Explorer", layout="wide")

st.title("📊 Regression Models - Linear Regression")

# --- Linear Regression Section ---
st.header("📈 Linear Regression - In Depth")

st.markdown(r"""
## 📘 What is Linear Regression?
Linear Regression is a **supervised learning** algorithm used to predict **continuous numeric outcomes** based on one or more input features. It assumes a **linear relationship** between the independent variable(s) (X) and the dependent variable (y).

Linear Regression can be:
- **Simple Linear Regression**: One feature (X)
- **Multiple Linear Regression**: Multiple features (X1, X2, ..., Xn)

---

## 📐 Mathematical Formulation
The standard form of the linear model is:

$$ y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = w^T x + b $$

Where:
- `x`: input feature vector (independent variables)
- `w`: weights (coefficients learned by the model)
- `b`: bias (intercept)
- `y`: predicted continuous output

The model parameters are estimated using **Ordinary Least Squares (OLS)** by minimizing the **Mean Squared Error (MSE)**:

$$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
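
A minimal sketch of fitting such a model with scikit-learn's `LinearRegression` (the tiny dataset here is made up purely for illustration):

```python
# Minimal sketch: fit an OLS linear model and report its training MSE (toy data)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one feature, four samples
y = np.array([2.1, 4.0, 6.2, 7.9])           # continuous target

model = LinearRegression(fit_intercept=True)
model.fit(X, y)

print("w:", model.coef_, "b:", model.intercept_)         # learned weights and intercept
print("MSE:", mean_squared_error(y, model.predict(X)))   # mean squared error on the training data
```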

---

## 🔍 Key Concepts and Assumptions

### 1. **Least Squares Estimation**
The method used to estimate the parameters by minimizing the sum of squared differences between actual and predicted values.

### 2. **Residuals**
The difference between actual and predicted values, $e_i = y_i - \hat{y}_i$. Analyzing residuals helps identify model misspecification.

### 3. **R-squared (R²)**
Represents the proportion of variance in the target explained by the model:

$$ R^2 = 1 - \frac{\text{SSR}}{\text{SST}} $$

where SSR is the Sum of Squared Residuals and SST is the Total Sum of Squares.
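
In code, $R^2$ can be computed directly from that definition or with `sklearn.metrics.r2_score`; a small sketch, reusing the toy `model`, `X`, `y` from the earlier example:

```python
# Sketch: R² from its definition vs. scikit-learn's helper (reuses model, X, y from above)
import numpy as np
from sklearn.metrics import r2_score

y_pred = model.predict(X)
ssr = np.sum((y - y_pred) ** 2)      # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
print(1 - ssr / sst)                 # R² from the formula
print(r2_score(y, y_pred))           # same value via scikit-learn
```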

### 4. **Key Assumptions**
- **Linearity**: The relationship between X and y is linear.
- **Independence**: Observations are independent of each other.
- **Homoscedasticity**: Residuals have constant variance.
- **Normality**: Residuals are normally distributed.
- **No Multicollinearity**: Features should not be highly correlated with each other.

Violating these assumptions can make the coefficient estimates and predictions unreliable; a quick residual check is sketched below.
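
A quick way to eyeball linearity and homoscedasticity is to plot residuals against fitted values; a rough sketch, assuming a fitted `model` and data `X`, `y` as in the earlier example:

```python
# Rough sketch: residuals vs. fitted values (assumes model, X, y from the earlier example)
import matplotlib.pyplot as plt

fitted = model.predict(X)
residuals = y - fitted

plt.scatter(fitted, residuals)
plt.axhline(0, color="red", linestyle="--")  # residuals should scatter evenly around zero
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```

Curvature in this plot suggests the linearity assumption is violated; a funnel shape suggests heteroscedasticity.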

---

## 🔧 Common Hyperparameters (Scikit-learn)

| Parameter        | Description                                                                         |
|------------------|-------------------------------------------------------------------------------------|
| `fit_intercept`  | If True, the model fits an intercept. Set False when the data is already centered.   |
| `copy_X`         | If True, X is copied before fitting. Set False for memory efficiency.                |
| `n_jobs`         | Number of CPU cores used for computation. Useful for large datasets.                 |

**Note**: `normalize` has been deprecated. Use preprocessing (e.g., `StandardScaler`) in a pipeline instead.
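
For example, a sketch of standardizing features before fitting by wrapping both steps in a pipeline (`X_train`, `y_train`, `X_test` are placeholders for your own data):

```python
# Sketch: scale features, then fit, inside a single estimator
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

pipe = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=True))
# pipe.fit(X_train, y_train)    # X_train, y_train: your training data
# pipe.predict(X_test)          # X_test: new samples
```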

---

## ✅ Advantages
- Simple and fast to implement.
- Works well when its assumptions are met.
- Coefficients are directly interpretable.
- Computationally inexpensive to train.

## ❌ Disadvantages
- Limited to linear relationships unless features are transformed.
- Sensitive to outliers.
- Performs poorly with irrelevant or highly correlated features.

---

## 🧪 Optuna for Hyperparameter Tuning (Conceptual)
Plain Linear Regression has few hyperparameters, but Optuna becomes useful in more complex pipelines:

- **Polynomial Regression**: Tune the degree of the polynomial.
- **Ridge / Lasso / ElasticNet**: Tune regularization strength (`alpha`, `l1_ratio`).
- **Feature Selection**: Use Optuna to select the best subset of features.

You define an **objective function** (e.g., minimize RMSE), and Optuna searches the hyperparameter space with **Bayesian optimization** (its default TPE sampler).
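
As a minimal sketch, an objective that tunes the `alpha` of a Ridge model with cross-validated RMSE might look like this (`X`, `y` are assumed to be your feature matrix and target):

```python
# Sketch: tune Ridge's regularization strength with Optuna (X, y assumed to exist)
import optuna
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def objective(trial):
    # Sample alpha on a log scale
    alpha = trial.suggest_float("alpha", 1e-4, 1e2, log=True)
    model = Ridge(alpha=alpha)
    scores = cross_val_score(model, X, y, scoring="neg_root_mean_squared_error", cv=5)
    return -scores.mean()  # cross-validated RMSE, which the study minimizes

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```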

---

## 📌 Use Cases
- Predicting house prices from features like area, number of rooms, and location
- Sales forecasting from historical data
- Medical cost estimation from patient information
- Predicting CO₂ emissions from engine parameters

📎 **Tip**: Always visualize **residual plots** to verify the assumptions, and consider adding interaction or polynomial terms to capture nonlinear structure (see the sketch below).
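
A sketch of adding polynomial and interaction terms with `PolynomialFeatures` (the degree here is arbitrary; tune it for your data):

```python
# Sketch: capture nonlinearity by expanding features before the linear fit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly_model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                           LinearRegression())
# poly_model.fit(X_train, y_train)   # X_train, y_train: your training data
```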
""")