sks01dev committed on
Commit aeee3af · verified · 1 Parent(s): 14b2c6d

Delete Week 3

Files changed (2)
  1. Week 3/Week_3.ipynb +0 -0
  2. Week 3/readme.md +0 -100
Week 3/Week_3.ipynb DELETED
The diff for this file is too large to render.
 
Week 3/readme.md DELETED
@@ -1,100 +0,0 @@
- # Machine Learning Zoomcamp 2025 - Homework 3
-
- [![Python](https://img.shields.io/badge/Python-3.11-blue?logo=python&logoColor=white)](https://www.python.org/)
- [![Pandas](https://img.shields.io/badge/Pandas-1.5.3-orange?logo=pandas&logoColor=white)](https://pandas.pydata.org/)
- [![Scikit-Learn](https://img.shields.io/badge/Scikit--Learn-1.3.1-green?logo=scikit-learn&logoColor=white)](https://scikit-learn.org/stable/)
- [![Jupyter](https://img.shields.io/badge/Jupyter-Notebook-yellow?logo=jupyter&logoColor=white)](https://jupyter.org/)
-
- ---
-
- ## Homework 3: Machine Learning for Classification
-
- This repository contains solutions for **Homework 3** of **Machine Learning Zoomcamp 2025**, which focuses on **binary classification** using the course lead-scoring dataset.
-
- ---
-
- ## 📂 Project Overview
-
- - **Dataset:** [Lead Scoring Dataset (`course_lead_scoring.csv`)](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/course_lead_scoring.csv)
- - **Target variable:** `converted` (whether the lead signed up)
- - **Objective:** Data preprocessing, exploratory analysis, feature selection, and training logistic regression models (regularized and unregularized).
-
- **Tech Stack:**
- - **Python 3.11** – core programming language
- - **Pandas** – data manipulation
- - **NumPy** – numerical operations
- - **Scikit-Learn** – machine learning models, feature selection, evaluation
- - **Jupyter Notebook** – interactive coding and documentation
-
- ---
-
- ## 🔹 Questions & Answers
-
- | Question | Task | Answer |
- |----------|------|--------|
- | 1 | Most frequent value (mode) of `industry` | `retail` |
- | 2 | Pair of numerical features with the biggest correlation | `annual_income` and `interaction_count` |
- | 3 | Categorical feature with the biggest mutual information with `converted` | `lead_source` |
- | 4 | Validation accuracy of logistic regression | 0.74 |
- | 5 | Least useful feature (feature elimination) | `lead_score` |
- | 6 | Best `C` value for regularized logistic regression | 1 |
-
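Questions 1 to 3 need only pandas and `mutual_info_score`. The sketch below is illustrative, not the notebook's code: it fills missing values as described under Approach (categorical → `'NA'`, numerical → `0.0`), then computes the mode of `industry`, the most correlated pair of numerical columns (assuming "biggest" means largest absolute Pearson correlation), and mutual-information scores against `converted`. The helper names and the four-row toy frame are hypothetical; with the real data you would build the frame with `pd.read_csv` on the dataset URL, and the toy results do not reproduce the answers in the table.

```python
import pandas as pd
from sklearn.metrics import mutual_info_score


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values: 'NA' for categorical columns, 0.0 for numerical ones."""
    df = df.copy()
    numerical = list(df.select_dtypes(include="number").columns)
    categorical = [c for c in df.columns if c not in numerical]
    df[categorical] = df[categorical].fillna("NA")
    df[numerical] = df[numerical].fillna(0.0)
    return df


def strongest_correlation(df: pd.DataFrame, columns: list[str]) -> tuple[str, str]:
    """Pair of distinct columns with the largest absolute Pearson correlation (sign ignored: an assumption)."""
    corr = df[columns].corr().abs()
    best_value, best_pair = -1.0, ("", "")
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            if corr.loc[a, b] > best_value:
                best_value, best_pair = corr.loc[a, b], (a, b)
    return best_pair


def mutual_information(df: pd.DataFrame, columns: list[str], target: str) -> pd.Series:
    """Mutual information (in nats) of each categorical column with the target, highest first."""
    scores = {c: mutual_info_score(df[c], df[target]) for c in columns}
    return pd.Series(scores).sort_values(ascending=False)


# Hypothetical toy data standing in for course_lead_scoring.csv.
raw = pd.DataFrame({
    "industry": ["retail", None, "retail", "finance"],
    "lead_source": ["paid_ads", "referral", "referral", None],
    "annual_income": [10.0, 20.0, None, 40.0],
    "interaction_count": [1, 2, 0, 4],
    "lead_score": [0.9, 0.1, 0.5, 0.2],
    "converted": [1, 0, 1, 0],
})
df = prepare(raw)
industry_mode = df["industry"].mode()[0]
top_pair = strongest_correlation(df, ["annual_income", "interaction_count", "lead_score"])
mi = mutual_information(df, ["industry", "lead_source"], "converted")
print(industry_mode, top_pair, mi.round(3).to_dict())
```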
- ---
-
- ## 📌 Approach / Key Steps
-
- 1. **Data Cleaning & Preparation**
-    - Filled missing values: categorical → `'NA'`, numerical → `0.0`
-    - Verified feature types and correlations
-
- 2. **Exploratory Analysis**
-    - Mode of the categorical variables
-    - Correlation matrix of the numerical features
-
- 3. **Feature Selection**
-    - Computed the mutual information between each categorical variable and `converted` with `mutual_info_score`
-    - Identified the least useful feature by feature elimination: refitting the model without one feature at a time and comparing its validation accuracy with that of the full model
-
- 4. **Model Training**
-    - Logistic regression with one-hot encoded categorical variables
-    - Regularized logistic regression, tuning `C` on validation accuracy
-
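Steps 3 and 4 can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions rather than the notebook's code: one-hot encoding uses `DictVectorizer`, the solver settings (`solver="liblinear"`, `max_iter=1000`) are assumed, the train/validation split is taken as given, and `train_and_score`, `feature_elimination` and `toy_frame` are hypothetical helpers. Feature elimination is read as refitting without one feature at a time; the feature whose removal costs the least validation accuracy is the least useful.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def train_and_score(df_train, df_val, features, target="converted", C=1.0):
    """One-hot encode `features`, fit logistic regression, return validation accuracy."""
    dv = DictVectorizer(sparse=False)  # strings are one-hot encoded, numbers passed through
    X_train = dv.fit_transform(df_train[features].to_dict(orient="records"))
    X_val = dv.transform(df_val[features].to_dict(orient="records"))
    # Assumed settings; the README only says "logistic regression".
    model = LogisticRegression(solver="liblinear", C=C, max_iter=1000, random_state=42)
    model.fit(X_train, df_train[target])
    return accuracy_score(df_val[target], model.predict(X_val))


def feature_elimination(df_train, df_val, features, target="converted"):
    """Drop in validation accuracy when each feature is left out (smallest = least useful)."""
    base = train_and_score(df_train, df_val, features, target)
    return {
        f: base - train_and_score(df_train, df_val, [c for c in features if c != f], target)
        for f in features
    }


def toy_frame(n):
    """Hypothetical data: `lead_source` decides `converted`, `industry` is pure noise."""
    return pd.DataFrame({
        "lead_source": ["referral" if i % 2 == 0 else "paid_ads" for i in range(n)],
        "industry": ["retail" if i % 4 < 2 else "finance" for i in range(n)],
        "converted": [1 if i % 2 == 0 else 0 for i in range(n)],
    })


df_train, df_val = toy_frame(20), toy_frame(8)
accuracy = train_and_score(df_train, df_val, ["lead_source", "industry"])
drops = feature_elimination(df_train, df_val, ["lead_source", "industry"])
least_useful = min(drops, key=drops.get)
print(accuracy, drops, least_useful)
```

In the toy data `industry` carries no signal, so removing it costs nothing, while removing `lead_source` drops validation accuracy from 1.0 to 0.5.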
- ---
-
- ## 📈 Results
-
- - Baseline logistic regression validation accuracy: **0.74**
- - Least useful feature: **`lead_score`**
- - Best regularization parameter `C`: **1**
-
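The search for the best `C` can be sketched like this. In scikit-learn, `C` is the inverse of the regularization strength, so smaller values penalize large weights more. The candidate grid, rounding accuracy to three decimals, and breaking ties toward the smallest `C` are assumptions made for illustration, and the random matrices stand in for the one-hot encoded features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def tune_c(X_train, y_train, X_val, y_val, c_values=(0.01, 0.1, 1, 10, 100)):
    """Return (best C, {C: validation accuracy}); ties go to the smallest C (assumption)."""
    scores = {}
    for c in c_values:
        # C is the inverse regularization strength: smaller C = stronger L2 penalty.
        model = LogisticRegression(solver="liblinear", C=c, max_iter=1000, random_state=42)
        model.fit(X_train, y_train)
        scores[c] = round(accuracy_score(y_val, model.predict(X_val)), 3)
    top = max(scores.values())
    chosen = min(c for c, s in scores.items() if s == top)
    return chosen, scores


# Synthetic stand-in data: the label depends on the first two columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
best_c, scores = tune_c(X[:150], y[:150], X[150:], y[150:])
print(best_c, scores)
```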
- ---
-
- ## ⚙ How to Run
-
- 1. Clone the repository:
-    ```bash
-    git clone https://github.com/yourusername/ml-zoomcamp-hw3.git
-    ```
-
- 2. Install the requirements:
-    ```bash
-    pip install -r requirements.txt
-    ```
-
- 3. Start Jupyter, open `Week_3.ipynb`, and run the cells in order:
-    ```bash
-    jupyter notebook
-    ```
-
- ---
-
- ## 📚 References
-
- - [Lead Scoring Dataset (`course_lead_scoring.csv`)](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/course_lead_scoring.csv)
- - [Scikit-Learn Documentation](https://scikit-learn.org/stable/)
- - [Pandas Documentation](https://pandas.pydata.org/)
- - [NumPy Documentation](https://numpy.org/)
- - [Jupyter Notebook Documentation](https://jupyter.org/)
-
- ---
-