30 Days, 30 Machine Learning Projects

I have been reading a lot about Machine Learning and AI recently and have finished a number of tutorials on Coursera, YouTube, and Google. I understand the basics, but tutorials alone get boring quickly.

I believe learning by doing is more fun, especially for an experienced Web Developer like me. So I have challenged myself to complete 30 small projects in 30 days.

What Projects should I work on?

I want to learn gradually. I don't want to pick a complex project at the start and risk getting stuck, so the plan is to ramp up the complexity over time.

I decided to ask ChatGPT the same question, and this is the plan I am going to follow.

| Week | Day | Project |
|------|-----|---------|
| 1 | 1 | Predict house prices using Simple Linear Regression |
| 1 | 2 | Classify Iris flowers into species using Logistic Regression |
| 1 | 3 | Recognize handwritten digits with k-Nearest Neighbors on MNIST |
| 1 | 4 | Diagnose breast cancer as malignant or benign using a Decision Tree |
| 1 | 5 | Filter spam from a collection of emails using Naive Bayes |
| 1 | 6 | Predict wine quality from physicochemical properties using SVM |
| 1 | 7 | Determine credit card defaults using a Random Forest classifier |
| 2 | 8 | Detect fake news with a Passive Aggressive classifier and TfidfVectorizer |
| 2 | 9 | Forecast weather with Simple Linear Regression on time-series data |
| 2 | 10 | Build a recommender system using Collaborative Filtering on a user-item ratings matrix |
| 2 | 11 | Detect anomalies in network traffic with Isolation Forest |
| 2 | 12 | Predict airline passenger satisfaction with a Gradient Boosting Machine (GBM) |
| 2 | 13 | Build a music genre classifier using audio feature extraction |
| 2 | 14 | Cluster grocery store customers based on purchase history with K-Means |
| 3 | 15 | Predict house prices with XGBoost |
| 3 | 16 | Detect faces in real time in a webcam feed using OpenCV |
| 3 | 17 | Predict diabetes onset using Decision Trees and Random Forests |
| 3 | 18 | Forecast stock prices with an ARIMA model (time series) |
| 3 | 19 | Predict customer churn with XGBoost |
| 3 | 20 | Create a topic model using Latent Dirichlet Allocation (LDA) |
| 4 | 21 | Deploy a machine learning model using FastAPI and Heroku for real-time predictions |
| 4 | 22 | Build a recommender system with Matrix Factorization |
| 4 | 23 | Detect fraud in financial transactions using Logistic Regression and Random Forest |
| 4 | 24 | Segment customers based on behavior with K-Means clustering |
| 4 | 25 | Analyze sentiment of customer reviews using traditional NLP techniques |
| 4 | 26 | Forecast electricity consumption using LSTM (deep learning intro) |
| 4 | 27 | Classify images with a small CNN on the CIFAR-10 dataset |
| 4 | 28 | Build a simple chatbot using traditional NLP techniques |
| 5 | 29 | Predict credit risk with Logistic Regression and SVM |
| 5 | 30 | Capstone Project: predict loan approvals using ensemble learning (Random Forest, XGBoost) |

The Logic Behind Choosing These Problems

The plan is structured to build a solid foundation, gradually move towards advanced machine learning topics, and then introduce deep learning concepts in a way that feels natural. Here's why I selected these specific problems:

Week 1: Core Supervised Learning Concepts

Days 1-7: These are foundational tasks designed to help me understand the basic principles of regression and classification. I explore key algorithms (Linear Regression, Logistic Regression, k-NN, Decision Trees, Naive Bayes, SVM, Random Forest) through hands-on tasks:

  1. Linear Regression: A simple start with predicting house prices, helping me understand the essence of regression (a sketch follows this list).
  2. Logistic Regression: Moving to classification with the three-class Iris dataset, introducing me to probability-based classification.
  3. k-NN & MNIST: Here, I dive into distance-based learning, setting the stage for image classification tasks later on.
  4. Decision Tree & Naive Bayes: Two methods that offer contrasting perspectives on handling structured classification tasks.
  5. SVM for Wine Quality: A more abstract but powerful introduction to hyperplanes and margins in classification.
  6. Random Forest: Brings in ensemble learning, introducing the idea of combining weak learners for stronger predictions.
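
To make Day 1 concrete, here is a minimal sketch of the kind of code I expect to start with, assuming scikit-learn and its built-in California housing data (the actual project may use a different dataset):

```python
# Day 1 sketch: predicting house prices with simple linear regression.
# Assumes scikit-learn's built-in California housing data; the actual
# project dataset may differ.
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

data = fetch_california_housing()
X = data.data[:, [0]]  # single feature: median income ("simple" regression)
y = data.target        # target: median house value
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
print(f"MAE: {mean_absolute_error(y_test, model.predict(X_test)):.3f}")
```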

Week 2: Applying Machine Learning to Real-World Problems

Days 8-14: Now that I’ve grasped the basics, I’ll apply machine learning to real-world datasets, which makes the learning more relevant:

  1. Fake News Detection: A dive into NLP, using TfidfVectorizer to detect fake news, a highly practical application (a sketch follows this list).
  2. Weather Forecasting: Time-series forecasting deepens my understanding of regression and pattern recognition.
  3. Recommender Systems: By exploring collaborative filtering, I’m learning how recommendation engines work in real-world applications.
  4. Anomaly Detection: Isolation Forest introduces unsupervised learning, focusing on identifying anomalies in data.
  5. Gradient Boosting (GBM): I take ensemble learning a step further with GBM to boost prediction accuracy.
  6. Music Genre Classification: A fun shift to working with audio features, transitioning away from structured and text data.
  7. Clustering: A practical use of clustering for segmenting customers, setting me up for more unsupervised tasks later on.
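
As a taste of Day 8, the fake-news pipeline could look roughly like this; `news.csv` with `text` and `label` columns is a hypothetical input file, not a dataset I have chosen yet:

```python
# Day 8 sketch: fake news detection with TF-IDF features.
# "news.csv" with "text" and "label" columns is a hypothetical input file.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("news.csv")  # columns: text, label ("FAKE" / "REAL")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

# Turn raw articles into TF-IDF vectors, dropping very common words.
vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = vectorizer.fit_transform(X_train)
tfidf_test = vectorizer.transform(X_test)

clf = PassiveAggressiveClassifier(max_iter=50)
clf.fit(tfidf_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, clf.predict(tfidf_test)):.3f}")
```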

Week 3: Introducing More Advanced Techniques

Days 15-21: Time to dive into more complex algorithms and techniques that build on my previous learning:

  1. XGBoost for House Prices: Revisiting house prices to see how XGBoost builds on GBM and Random Forest, with stronger regularization against overfitting and finer control over training.
  2. Real-Time Face Detection: Using OpenCV introduces me to computer vision, but I’ll hold off on deep learning for now.
  3. Diabetes Prediction: I revisit classification, applying decision trees and random forests for more practical, medical predictions.
  4. Stock Price Forecasting with ARIMA: This prepares me for time-series forecasting with recurrent neural networks (RNNs) in the future.
  5. Customer Churn Prediction: A crucial business task, I’ll learn how to predict customer retention using XGBoost.
  6. Topic Modeling with LDA: I dive deeper into NLP by understanding how unsupervised learning can extract hidden patterns from text.
  7. Model Deployment: I bring it all together by deploying a model with FastAPI and Heroku, learning how to make it live and usable (a sketch follows this list).
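
For the deployment day, the serving layer will probably be a tiny FastAPI app along these lines; `model.joblib` and the feature names are placeholders for whatever model I actually train:

```python
# Day 21 sketch: serving a trained model with FastAPI.
# "model.joblib" and the feature names are placeholders, not real artifacts.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # a fitted scikit-learn estimator

class HouseFeatures(BaseModel):
    median_income: float
    house_age: float
    average_rooms: float

@app.post("/predict")
def predict(features: HouseFeatures):
    # Assemble a single-row feature matrix in the order the model expects.
    row = [[features.median_income, features.house_age, features.average_rooms]]
    return {"predicted_price": float(model.predict(row)[0])}
```

Locally this would run with `uvicorn main:app --reload`; on Heroku, a `Procfile` would point at the same uvicorn command.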

Weeks 4 & 5: Prepping for Deep Learning and Advanced Topics

Days 22-30: By now, I'm ready to either dive into deep learning or explore more advanced models. This final stretch will help me transition to the next level:

  1. Matrix Factorization for Recommendations: I’ll build a more advanced recommender system using matrix factorization.
  2. Financial Fraud Detection: Applying ensemble learning to a real-world problem with significant business impact.
  3. Customer Segmentation with K-Means: I deepen my clustering skills, working on a real-world marketing problem.
  4. Sentiment Analysis with traditional NLP: I revisit NLP to solidify my text analysis techniques before diving into deep learning.
  5. LSTM for Time Series Forecasting: Finally, I step into deep learning with LSTM for time series, opening up the world of RNNs.
  6. Image Classification with CNN: My first attempt at building a Convolutional Neural Network (CNN) using CIFAR-10, a major milestone in deep learning for image data (a sketch follows this list).
  7. Building a Chatbot: This practical NLP problem helps me understand how businesses are utilizing machine learning for interactive tasks.
  8. Credit Risk Prediction with Logistic Regression and SVM: An advanced take on combining two powerful classification models on the same problem.
  9. Capstone Project: I'll wrap up with an advanced project that brings multiple concepts together (like ensemble learning) to predict loan approvals, acting as a final showcase of everything I've learned.
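
And for the CIFAR-10 day, the model will likely be a small Keras CNN along these lines; the layer sizes and epoch count are placeholder choices, not a tuned architecture:

```python
# Day 27 sketch: a small CNN on CIFAR-10 with Keras.
# Layer sizes and epochs are placeholder choices, not a tuned setup.
from tensorflow import keras
from tensorflow.keras import layers

# CIFAR-10: 60,000 32x32 colour images across 10 classes.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```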

Why This Structure Works

  • Gradual Introduction of Complexity: I start the challenge with easier problems and models, and each week adds a layer of complexity. By week 3, I'm ready to take on more advanced algorithms and tasks.

  • Unsupervised Learning: I’m introducing techniques like clustering in week 2, but I’ll return to them with more advanced datasets and models in weeks 3 and 4.
  • Deep Learning Gradual Introduction: Instead of diving straight into deep learning, I’m starting with traditional methods (week 3) and only moving into deep learning models like LSTM and CNN toward the end of the challenge.
  • Real-World Application: Every task I’m working on is designed to solve a real-world problem, whether it’s in finance, marketing, image recognition, or NLP, ensuring it’s always relevant.

With this structure, I'll ease into more advanced topics like deep learning, so I won't get overwhelmed the way I did before!

Plan

I am a morning person, so I am planning to dedicate two hours in the early morning to the project and write up a progress post in the evening, after my work hours.

The challenge starts on the 11th of September 2024 and ends on the 10th of October 2024.

Git repository for the codebase: [HERE](https://github.com/saxenaakansha30/30-days-ml-challenge)

Why am I sharing this?

I have caught myself procrastinating a lot lately, so posting my updates here will keep me accountable. Working on a problem every day, I may not end up with the perfect solution for each project, but the iterative, learning-by-doing process is the key to picking up any new skill.

Come and join me if you feel the same!! Or even if you don't ;)