BSc: Introduction To Machine Learning
=====================================
Contents
--------
* [1 Introduction to Machine Learning](#Introduction_to_Machine_Learning)
  + [1.1 Short Description](#Short_Description)
  + [1.2 Prerequisites](#Prerequisites)
    - [1.2.1 Prerequisite subjects](#Prerequisite_subjects)
    - [1.2.2 Prerequisite topics](#Prerequisite_topics)
  + [1.3 Course Topics](#Course_Topics)
  + [1.4 Intended Learning Outcomes (ILOs)](#Intended_Learning_Outcomes_.28ILOs.29)
    - [1.4.1 What is the main purpose of this course?](#What_is_the_main_purpose_of_this_course.3F)
    - [1.4.2 ILOs defined at three levels](#ILOs_defined_at_three_levels)
      * [1.4.2.1 Level 1: What concepts should a student know/remember/explain?](#Level_1:_What_concepts_should_a_student_know.2Fremember.2Fexplain.3F)
      * [1.4.2.2 Level 2: What basic practical skills should a student be able to perform?](#Level_2:_What_basic_practical_skills_should_a_student_be_able_to_perform.3F)
      * [1.4.2.3 Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?](#Level_3:_What_complex_comprehensive_skills_should_a_student_be_able_to_apply_in_real-life_scenarios.3F)
  + [1.5 Grading](#Grading)
    - [1.5.1 Course grading range](#Course_grading_range)
    - [1.5.2 Course activities and grading breakdown](#Course_activities_and_grading_breakdown)
    - [1.5.3 Recommendations for students on how to succeed in the course](#Recommendations_for_students_on_how_to_succeed_in_the_course)
  + [1.6 Resources, literature and reference materials](#Resources.2C_literature_and_reference_materials)
    - [1.6.1 Open access resources](#Open_access_resources)
    - [1.6.2 Closed access resources](#Closed_access_resources)
    - [1.6.3 Software and tools used within the course](#Software_and_tools_used_within_the_course)
* [2 Teaching Methodology: Methods, techniques, & activities](#Teaching_Methodology:_Methods.2C_techniques.2C_.26_activities)
  + [2.1 Activities and Teaching Methods](#Activities_and_Teaching_Methods)
  + [2.2 Formative Assessment and Course Activities](#Formative_Assessment_and_Course_Activities)
    - [2.2.1 Ongoing performance assessment](#Ongoing_performance_assessment)
      * [2.2.1.1 Section 1](#Section_1)
      * [2.2.1.2 Section 2](#Section_2)
      * [2.2.1.3 Section 3](#Section_3)
      * [2.2.1.4 Section 4](#Section_4)
    - [2.2.2 Final assessment](#Final_assessment)
    - [2.2.3 The retake exam](#The_retake_exam)
Introduction to Machine Learning
================================
* **Course name**: Introduction to Machine Learning
* **Code discipline**: R-01
* **Subject area**:
Short Description
-----------------
This course covers the following concepts: machine learning paradigms; machine learning approaches and algorithms.
Prerequisites
-------------
### Prerequisite subjects
* CSE202 — Analytical Geometry and Linear Algebra I
* CSE204 — Analytical Geometry and Linear Algebra II
* CSE201 — Mathematical Analysis I
* CSE203 — Mathematical Analysis II
* CSE206 — Probability And Statistics
* CSE117 — Data Structures and Algorithms: Python, NumPy, basic object-oriented concepts, memory management.
### Prerequisite topics
Course Topics
-------------
Course Sections and Topics

| Section | Topics within the section |
| --- | --- |
| Supervised Learning | 1. Introduction to Machine Learning<br>2. Derivatives and Cost Function<br>3. Data Pre-processing<br>4. Linear Regression<br>5. Multiple Linear Regression<br>6. Gradient Descent<br>7. Polynomial Regression<br>8. Bias-variance Tradeoff<br>9. Difference between classification and regression<br>10. Logistic Regression<br>11. Naive Bayes<br>12. KNN<br>13. Confusion Matrix<br>14. Performance Metrics<br>15. Regularization<br>16. Hyperplane Based Classification<br>17. Perceptron Learning Algorithm<br>18. Max-Margin Classification<br>19. Support Vector Machines<br>20. Slack Variables<br>21. Lagrangian Support Vector Machines<br>22. Kernel Trick |
| Decision Trees and Ensemble Methods | 1. Decision Trees<br>2. Bagging<br>3. Boosting<br>4. Random Forest<br>5. AdaBoost |
| Unsupervised Learning | 1. K-means Clustering<br>2. K-means++<br>3. Hierarchical Clustering<br>4. DBSCAN<br>5. Mean-shift |
| Deep Learning | 1. Artificial Neural Networks<br>2. Back-propagation<br>3. Convolutional Neural Networks<br>4. Autoencoder<br>5. Variational Autoencoder<br>6. Generative Adversarial Networks |
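To illustrate how a few of the supervised-learning topics above fit together (linear regression, cost function, gradient descent), here is a minimal sketch in Python/NumPy. The data, learning rate, and iteration count are arbitrary illustrative choices, not part of the course material.

```python
import numpy as np

# Toy data (placeholder values): y ≈ 2x + 1 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 2 * X + 1 + rng.normal(0, 1, size=100)

# Simple linear model y_hat = w * x + b, fitted by gradient descent
# on the mean-squared-error cost J(w, b) = mean((y_hat - y)^2) / 2
w, b = 0.0, 0.0
lr = 0.02          # learning rate (arbitrary choice)

for _ in range(3000):
    y_hat = w * X + b
    error = y_hat - y
    grad_w = np.mean(error * X)   # dJ/dw
    grad_b = np.mean(error)       # dJ/db
    w -= lr * grad_w              # gradient-descent update
    b -= lr * grad_b

print(f"learned w = {w:.2f}, b = {b:.2f}")  # should end up close to w ≈ 2, b ≈ 1
```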
Intended Learning Outcomes (ILOs)
---------------------------------
### What is the main purpose of this course?
There is a growing business need for individuals skilled in artificial intelligence, data analytics, and machine learning. Therefore, the purpose of this course is to provide students with an intensive treatment of a cross-section of the key elements of machine learning, with an emphasis on implementing them in modern programming environments and using them to solve real-world data science problems.
### ILOs defined at three levels
#### Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should know:
* Different learning paradigms
* A wide variety of learning approaches and algorithms
* Various learning settings
* Performance metrics
* Popular machine learning software tools
#### Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should understand:
* The difference between different learning paradigms
* The difference between classification and regression
* The concepts of learning theory (bias/variance tradeoffs, large margins, etc.)
* Kernel methods
* Regularization
* Ensemble learning
* Neural or deep learning
#### Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to apply:
* Classification approaches to solve supervised learning problems
* Clustering approaches to solve unsupervised learning problems
* Ensemble learning to improve a model's performance
* Regularization to improve a model's generalization
* Deep learning algorithms to solve real-world problems
Grading
-------
### Course grading range

| Grade | Range | Description of performance |
| --- | --- | --- |
| A. Excellent | 90-100 | - |
| B. Good | 75-89 | - |
| C. Satisfactory | 60-74 | - |
| D. Poor | 0-59 | - |

### Course activities and grading breakdown

| Activity Type | Percentage of the overall course grade |
| --- | --- |
| Labs/seminar classes | 0 |
| Interim performance assessment | 40 |
| Exams | 60 |

### Recommendations for students on how to succeed in the course
Resources, literature and reference materials
---------------------------------------------
### Open access resources
* G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. Springer, 2013.
* T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2011.
* Tom M. Mitchell. Machine Learning. McGraw Hill.
* Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer.
### Closed access resources
### Software and tools used within the course
Teaching Methodology: Methods, techniques, & activities
=======================================================
Activities and Teaching Methods
-------------------------------
Activities within each section

| Learning Activities | Section 1 | Section 2 | Section 3 | Section 4 |
| --- | --- | --- | --- | --- |
| Development of individual parts of software product code | 1 | 1 | 1 | 1 |
| Homework and group projects | 1 | 1 | 1 | 1 |
| Midterm evaluation | 1 | 1 | 1 | 1 |
| Testing (written or computer based) | 1 | 1 | 1 | 1 |
| Discussions | 1 | 1 | 1 | 1 |
Formative Assessment and Course Activities
------------------------------------------
### Ongoing performance assessment
#### Section 1

| Activity Type | Content | Is Graded? |
| --- | --- | --- |
| Question | Is it true that in simple linear regression, $R^{2}$ and the squared correlation between X and Y are identical? | 1 |
| Question | What are the two assumptions that the linear regression model makes about the error terms? | 1 |
| Question | Fit a regression model to a given data problem, and support your choice of the model. | 1 |
| Question | In a list of given tasks, choose which are regression and which are classification tasks. | 1 |
| Question | In a given graphical model of binary random variables, how many parameters are needed to define the Conditional Probability Distributions for this Bayes Net? | 1 |
| Question | Write the mathematical form of the minimization objective of Rosenblatt's perceptron learning algorithm for a two-dimensional case. | 1 |
| Question | What is the perceptron learning algorithm? | 1 |
| Question | Write the mathematical form of its minimization objective for a two-dimensional case. | 1 |
| Question | What is a max-margin classifier? | 1 |
| Question | Explain the role of slack variables in SVM. | 1 |
| Question | Implement various regression models to solve different regression problems. | 0 |
| Question | Describe the differences between different types of regression models, their pros and cons, etc. | 0 |
| Question | Implement various classification models to solve different classification problems. | 0 |
| Question | Describe the difference between Logistic Regression and Naive Bayes. | 0 |
| Question | Implement the perceptron learning algorithm, SVMs, and their variants to solve different classification problems. | 0 |
| Question | Solve a given optimization problem using the Lagrange multiplier method. | 0 |
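For reference on the perceptron questions above, one common way to write the minimization objective of Rosenblatt's perceptron learning algorithm in the two-dimensional case (the standard perceptron criterion, shown here only as an illustration) is, with $\mathcal{M}$ the set of currently misclassified points and labels $y_i \in \{-1, +1\}$:

```latex
\min_{w_1,\, w_2,\, b} \; E(w_1, w_2, b) \;=\; - \sum_{i \in \mathcal{M}} y_i \left( w_1 x_{i1} + w_2 x_{i2} + b \right)
```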
#### Section 2

| Activity Type | Content | Is Graded? |
| --- | --- | --- |
| Question | What are the pros and cons of decision trees over other classification models? | 1 |
| Question | Explain how tree pruning works. | 1 |
| Question | What is the purpose of ensemble learning? | 1 |
| Question | What is a bootstrap, and what is its role in ensemble learning? | 1 |
| Question | Explain the role of slack variables in SVM. | 1 |
| Question | Implement different variants of decision trees to solve different classification problems. | 0 |
| Question | Solve a given classification problem using an ensemble classifier. | 0 |
| Question | Implement AdaBoost for a given problem. | 0 |
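As a reminder of the standard quantities behind the boosting questions above (shown for orientation only): with $\epsilon_m$ the weighted error of the $m$-th weak learner $h_m$, AdaBoost uses

```latex
\alpha_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m},
\qquad
w_i \leftarrow \frac{w_i \exp\bigl(-\alpha_m \, y_i \, h_m(x_i)\bigr)}{Z_m},
```

where $Z_m$ normalizes the example weights $w_i$ so that they sum to one.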
#### Section 3

| Activity Type | Content | Is Graded? |
| --- | --- | --- |
| Question | Which implicit or explicit objective function does K-means implement? | 1 |
| Question | Explain the difference between k-means and k-means++. | 1 |
| Question | What is single-linkage, and what are its pros and cons? | 1 |
| Question | Explain how DBSCAN works. | 1 |
| Question | Implement different clustering algorithms to solve different clustering problems. | 0 |
| Question | Implement Mean-shift for video tracking. | 0 |
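For the first question above: the objective that K-means implicitly minimizes is the within-cluster sum of squared distances, where $\mu_k$ is the centroid of cluster $C_k$:

```latex
J(C_1, \dots, C_K) \;=\; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^{2}
```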
#### Section 4

| Activity Type | Content | Is Graded? |
| --- | --- | --- |
| Question | What is a fully connected feed-forward ANN? | 1 |
| Question | Explain different hyperparameters of CNNs. | 1 |
| Question | Calculate the KL-divergence between two probability distributions. | 1 |
| Question | What is a generative model, and how is it different from a discriminative model? | 1 |
| Question | Implement different types of ANNs to solve different classification problems. | 0 |
| Question | Calculate the KL-divergence between two probability distributions. | 0 |
| Question | Implement different generative models for different problems. | 0 |
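A minimal sketch of the KL-divergence computation referenced above, for two discrete distributions on the same support (the probability vectors below are arbitrary illustrative values):

```python
import numpy as np

# D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)) for discrete P, Q on the same support
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl_pq = np.sum(p * np.log(p / q))   # note: KL-divergence is not symmetric
kl_qp = np.sum(q * np.log(q / p))

print(f"D_KL(P||Q) = {kl_pq:.4f}, D_KL(Q||P) = {kl_qp:.4f}")
```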
### Final assessment
**Section 1**
1. What does it mean for the standard least squares coefficient estimates of linear regression to be scale equivariant?
2. Given a regression model fitted to a dataset, interpret its coefficients.
3. Explain which regression model would be a better fit to model the relationship between response and predictor in a given dataset.
4. If the number of training examples goes to infinity, how will it affect the bias and variance of a classification model?
5. Given a two-dimensional classification problem, determine whether or not a linear boundary can be estimated using Logistic Regression and regularization.
6. Explain which classification model would be a better fit for a given classification problem.
7. Consider the leave-one-out cross-validation error of the standard two-class SVM. Argue that, under a given value of the slack variable, a given mathematical statement is either correct or incorrect.
8. How does the choice of the slack variable affect the bias-variance tradeoff in SVM?
9. Explain which kernel would be a better fit to be used in SVM for given data.
**Section 2**
1. When a decision tree is grown to full depth, how does it affect the tree's bias and variance, and its response to noisy data?
2. Argue whether or not an ensemble model would be a better choice for a given classification problem.
3. Given a particular iteration of boosting and other important information, calculate the weights of the AdaBoost classifier.
**Section 3**
1. K-means does not explicitly use a fitness function. What are the characteristics of the solutions that K-means finds? Which fitness function does it implicitly minimize?
2. Suppose we clustered a set of N data points using two different specified clustering algorithms. In both cases we obtained 5 clusters, and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in one method be assigned to the same cluster in the other method?
3. What are the characteristics of noise points in DBSCAN?
**Section 4**
1. Explain what ReLU is, what its different variants are, and what their pros and cons are.
2. Calculate the number of parameters to be learned during training in a CNN, given all important information.
3. Explain how a VAE can be used as a generative model.
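As an illustration of the parameter-counting arithmetic in question 2 above (hypothetical numbers, not an exam answer): a convolutional layer with 32 filters of size 3 × 3 applied to a 3-channel input has

```latex
32 \times (3 \times 3 \times 3) + 32 = 896
```

learnable parameters, counting the weights of each filter plus one bias per filter.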
### The retake exam
**Section 1**
**Section 2**
**Section 3**
**Section 4**