pranayreddy316 committed on
Commit 6f2ec69 · verified · 1 Parent(s): eac17f1

Upload The Linear_Regression_Algorithm.py

pages/The Linear_Regression_Algorithm.py ADDED
@@ -0,0 +1,106 @@
+ import streamlit as st
+ import pandas as pd
+
+ st.set_page_config(page_title="Regression Models Explorer", layout="wide")
+
+ st.title("📊 Regression Models - Linear Regression")
+
+ # --- Linear Regression Section ---
+ st.header("📈 Linear Regression - In Depth")
+
+ st.markdown(r"""
+ ## 📘 What is Linear Regression?
+ Linear Regression is a **supervised learning** algorithm used to predict **continuous numeric outcomes** based on one or more input features. It assumes a **linear relationship** between the independent variable(s) (X) and the dependent variable (y).
+
+ Linear Regression can be:
+ - **Simple Linear Regression**: one feature (X)
+ - **Multiple Linear Regression**: multiple features (X1, X2, ..., Xn); see the sketch below
+
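+ To make this concrete, here is a minimal sketch of fitting a one-feature model with scikit-learn (the synthetic data is an assumption of this example); adding more columns to `X` turns the same code into multiple linear regression:
+
+ ```python
+ import numpy as np
+ from sklearn.linear_model import LinearRegression
+
+ # Synthetic data: y ≈ 3x + 5 plus noise
+ rng = np.random.default_rng(42)
+ X = rng.uniform(0, 10, size=(100, 1))   # one column -> simple linear regression
+ y = 3 * X[:, 0] + 5 + rng.normal(0, 1, size=100)
+
+ model = LinearRegression()
+ model.fit(X, y)
+
+ print(model.coef_, model.intercept_)    # learned w and b, close to 3 and 5
+ print(model.predict([[4.0]]))           # prediction for x = 4
+ ```
+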
+ ---
+
+ ## 📐 Mathematical Formulation
+ The standard form of the linear model is:
+
+ $$ y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = w^T x + b $$
+
+ Where:
+ - `x`: input feature vector (independent variables)
+ - `w`: weights (coefficients learned by the model)
+ - `b`: bias (intercept)
+ - `y`: predicted continuous output
+
+ The model parameters are estimated using **Ordinary Least Squares (OLS)** by minimizing the **Mean Squared Error (MSE)**:
+
+ $$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
+
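+ The OLS parameters have a standard closed-form solution via the normal equation, $w = (X^T X)^{-1} X^T y$, once a column of ones is appended to X so the intercept is learned as an extra weight. A minimal NumPy sketch (synthetic data and the use of `lstsq` instead of an explicit matrix inverse are choices of this example):
+
+ ```python
+ import numpy as np
+
+ rng = np.random.default_rng(0)
+ X = rng.normal(size=(200, 2))                  # two features
+ y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 4.0 + rng.normal(0, 0.5, size=200)
+
+ # Append a column of ones so the intercept b is learned as an extra weight
+ X_b = np.hstack([X, np.ones((200, 1))])
+
+ # Solves the least-squares problem (numerically stabler than inverting X^T X)
+ w, *_ = np.linalg.lstsq(X_b, y, rcond=None)
+ print(w)                                       # ~ [1.5, -2.0, 4.0]
+
+ # Mean Squared Error of the fit
+ y_hat = X_b @ w
+ print(np.mean((y - y_hat) ** 2))
+ ```
+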
+ ---
+
+ ## 🔍 Key Concepts and Assumptions
+
+ ### 1. **Least Squares Estimation**
+ OLS estimates the coefficients by minimizing the sum of squared differences between actual and predicted values.
+
+ ### 2. **Residuals**
+ The differences between actual and predicted values, $e_i = y_i - \hat{y}_i$. Analyzing residuals helps identify model misspecification.
+
+ ### 3. **R-squared (R²)**
+ Represents the proportion of variance in y explained by the model:
+
+ $$ R^2 = 1 - \frac{\text{SSR}}{\text{SST}} $$
+
+ where SSR is the Sum of Squared Residuals and SST is the Total Sum of Squares.
+
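+ A short sketch (again on assumed synthetic data) of computing residuals, SSR, SST, and R² by hand, then checking the result against scikit-learn's `r2_score`:
+
+ ```python
+ import numpy as np
+ from sklearn.linear_model import LinearRegression
+ from sklearn.metrics import r2_score
+
+ rng = np.random.default_rng(1)
+ X = rng.normal(size=(150, 1))
+ y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.3, size=150)
+
+ model = LinearRegression().fit(X, y)
+ y_hat = model.predict(X)
+
+ residuals = y - y_hat                      # e_i = y_i - y_hat_i
+ ssr = np.sum(residuals ** 2)               # sum of squared residuals
+ sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
+ print(1 - ssr / sst)                       # R^2 by hand
+ print(r2_score(y, y_hat))                  # same value from scikit-learn
+ ```
+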
+ ### 4. **Key Assumptions**
+ - **Linearity**: the relationship between X and y is linear.
+ - **Independence**: observations are independent of each other.
+ - **Homoscedasticity**: residuals have constant variance.
+ - **Normality**: residuals are normally distributed.
+ - **No Multicollinearity**: features should not be highly correlated with each other.
+
+ Violating these assumptions can reduce the reliability of the model's estimates and predictions; a residual plot (sketched below) is a quick first check.
+
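+ A minimal matplotlib sketch of a residuals-vs-fitted plot, the usual first diagnostic for linearity and homoscedasticity (the data setup is an assumption of this example):
+
+ ```python
+ import numpy as np
+ import matplotlib.pyplot as plt
+ from sklearn.linear_model import LinearRegression
+
+ rng = np.random.default_rng(2)
+ X = rng.uniform(0, 10, size=(200, 1))
+ y = 1.2 * X[:, 0] + 3.0 + rng.normal(0, 1, size=200)
+
+ model = LinearRegression().fit(X, y)
+ fitted = model.predict(X)
+ residuals = y - fitted
+
+ # A healthy plot is a structureless band around zero; curvature suggests
+ # non-linearity, a funnel shape suggests heteroscedasticity.
+ plt.scatter(fitted, residuals, s=10)
+ plt.axhline(0, color="red", linestyle="--")
+ plt.xlabel("Fitted values")
+ plt.ylabel("Residuals")
+ plt.show()
+ ```
+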
+ ---
+
+ ## 🔧 Common Hyperparameters (Scikit-learn)
+
+ | Parameter | Description |
+ |-----------------|-------------------------------------------------------------------------------------|
+ | `fit_intercept` | If True, the model estimates the intercept. Set to False when the data is already centered. |
+ | `copy_X` | If True, X is copied before fitting. Set to False for memory efficiency. |
+ | `n_jobs` | Number of CPU cores used for the computation. Useful for large datasets. |
+
+ **Note**: the `normalize` parameter has been deprecated (and removed in recent scikit-learn versions). Use preprocessing (e.g., `StandardScaler`) in a pipeline instead, as in the sketch below.
+
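+ A minimal sketch of that recommended pattern: scale the features in a `Pipeline` rather than inside the estimator (the data shapes and coefficients are assumptions of this example):
+
+ ```python
+ import numpy as np
+ from sklearn.pipeline import make_pipeline
+ from sklearn.preprocessing import StandardScaler
+ from sklearn.linear_model import LinearRegression
+
+ rng = np.random.default_rng(3)
+ X = rng.normal(loc=100, scale=25, size=(200, 3))   # features on a large scale
+ y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=200)
+
+ # StandardScaler standardizes each feature; LinearRegression then fits on the
+ # scaled data. fit_intercept=True (the default) estimates the intercept b.
+ pipe = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=True))
+ pipe.fit(X, y)
+ print(pipe.score(X, y))   # R^2 on the training data
+ ```
+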
+ ---
+
+ ## ✅ Advantages
+ - Simple and fast to implement.
+ - Works well when the assumptions are met.
+ - Coefficients are interpretable.
+ - Requires relatively little computational power.
+
+ ## ❌ Disadvantages
+ - Limited to linear relationships.
+ - Sensitive to outliers.
+ - Performs poorly with irrelevant or highly correlated features.
+
+ ---
+
+ ## 🧪 Optuna for Hyperparameter Tuning (Conceptual)
+ Linear Regression itself has few tunable hyperparameters, but Optuna is helpful in related, more complex pipelines:
+
+ - **Polynomial Regression**: tune the degree of the polynomial.
+ - **Ridge / Lasso / ElasticNet**: tune the regularization strength (`alpha`, `l1_ratio`).
+ - **Feature Selection**: use Optuna to select the best subset of features.
+
+ You define an **objective function** (e.g., minimize RMSE), and Optuna optimizes the hyperparameters via **Bayesian optimization** (its default TPE sampler), as in the sketch below.
+
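+ A minimal Optuna sketch tuning Ridge's `alpha` by cross-validated RMSE (the synthetic data and the search range are assumptions of this example):
+
+ ```python
+ import numpy as np
+ import optuna
+ from sklearn.linear_model import Ridge
+ from sklearn.model_selection import cross_val_score
+
+ rng = np.random.default_rng(4)
+ X = rng.normal(size=(300, 5))
+ y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.5, size=300)
+
+ def objective(trial):
+     # Search the regularization strength on a log scale
+     alpha = trial.suggest_float("alpha", 1e-4, 100.0, log=True)
+     # scikit-learn scores with negated MSE; convert to RMSE to minimize
+     scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5,
+                              scoring="neg_mean_squared_error")
+     return float(np.sqrt(-scores.mean()))
+
+ study = optuna.create_study(direction="minimize")
+ study.optimize(objective, n_trials=50)
+ print(study.best_params)
+ ```
+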
+ ---
+
+ ## 📌 Use Cases
+ - Predicting house prices from features such as area, rooms, and location
+ - Sales forecasting from historical data
+ - Medical cost estimation from patient information
+ - Predicting CO₂ emissions from engine parameters
+
+ 📎 **Tip**: Always visualize **residual plots** to verify the assumptions, and consider adding interaction or polynomial terms to capture additional complexity.
+ """)