End to End: Heart Disease Prediction

Aug 24, 2024

A few years ago, before I had GitHub and before I had any real system for organizing my work, I was learning machine learning by building projects in Jupyter notebooks. Python was the first language I learned, and at the time AI and ML were everywhere, so it felt like a good transition and something I naturally became curious about.

I wanted to understand what was actually happening under the hood. What even is machine learning? What is AI?

I learned that before the rise of generative AI, machine learning was already everywhere. Things like email spam detection, housing price predictions, fraud detection, and recommendation systems were already using it behind the scenes. As I learned more, I started realizing how much AI had already been woven into everyday life long before ChatGPT and image generators became popular.

Alongside this project, I was also reading Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido, which helped me understand more of the technical side of things.

The project was pretty simple: given clinical data about a patient such as age, cholesterol, chest pain type, resting blood pressure, and about 10 other attributes, can a model predict whether they have heart disease? Just a yes or no answer.

It's a supervised learning problem, which means the model learns from labeled examples. I thought of it like studying with an answer key. You show it hundreds of patient records where you already know the outcome, it finds patterns, and then you test whether it can apply those patterns to patients it's never seen before. That same basic idea shows up in spam filters, fraud detection, and movie recommendations.

I tested three different algorithms: Logistic Regression, K-Nearest Neighbors, and Random Forest. Out of the box, Logistic Regression performed best at around 88.5% accuracy, KNN struggled at about 69%, and Random Forest landed somewhere in the middle.
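For anyone who wants to see the shape of that comparison, here's a minimal sketch. It is not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification` (the 300 rows and 14 features are assumptions, picked only to loosely resemble a small clinical table), so the printed accuracies will not match the numbers above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the patient table (assumption: not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=14, n_informative=6, random_state=42
)

# Hold out 20% of the rows so each model is scored on patients it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The three algorithms from the post, all with default-ish settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each model on the training split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Keeping the models in a dictionary makes it easy to add or swap an algorithm without touching the loop.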

But raw accuracy from the first run wasn't the most interesting part to me. What I found more interesting was learning what comes after: hyperparameter tuning.

Tuning is basically adjusting settings on your model to squeeze out better performance. If KNN asks, “how many nearby neighbors should I look at before making a prediction?” tuning helps find the answer that works best for your specific dataset.
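One common way to answer that KNN question is a cross-validated grid search over `n_neighbors`. The sketch below assumes scikit-learn's `GridSearchCV` and the same kind of synthetic stand-in data; the post doesn't say which search method the notebook used, so treat the range of 1 to 20 neighbors and the 5-fold setting as illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: not the real heart disease dataset).
X, y = make_classification(
    n_samples=300, n_features=14, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Candidate values for "how many nearby neighbors should I look at?"
param_grid = {"n_neighbors": list(range(1, 21))}

# Try every candidate with 5-fold cross-validation on the training split only.
search = GridSearchCV(
    KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

print("best n_neighbors:", search.best_params_["n_neighbors"])
print("cross-validated accuracy:", round(search.best_score_, 3))
# The test split is only touched once, after the setting has been chosen.
print("test accuracy:", round(search.score(X_test, y_test), 3))
```

The search only ever sees the training split, so the final test score still reflects patients the tuned model has never seen.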

After tuning, the final model landed at roughly 84.5% accuracy with a recall of 92%. Recall is the share of patients with heart disease that the model actually catches, which matters a lot in a medical context: you'd rather incorrectly flag someone than miss someone who actually has the disease.
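To make the difference between accuracy and recall concrete, here is a small plain-Python sketch that counts the four confusion matrix cells by hand. The ten labels are made up for illustration (1 means heart disease, 0 means no heart disease) and are not taken from the notebook.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # sick patient, correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1  # healthy patient, incorrectly flagged
        elif truth == 1 and pred == 0:
            fn += 1  # sick patient, missed
        else:
            tn += 1  # healthy patient, correctly cleared
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of actual positives that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Made-up example labels, purely illustrative.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"recall={recall(tp, fn):.2f} accuracy={accuracy(tp, fp, fn, tn):.2f}")
# prints: recall=0.80 accuracy=0.70
```

In this toy case the model flags two healthy patients by mistake, which drags accuracy down, but it misses only one sick patient, so recall stays higher. That's the trade-off a medical screening model usually wants.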

Things I learned: how to do exploratory data analysis before jumping into modeling, how and why to split training and test data, how feature importance works, how to read a confusion matrix, and how much libraries like scikit-learn and matplotlib do for you once you understand what you're actually asking them to do.

I also hit a few errors along the way that I never fully fixed. There's a broken ROC curve cell and a mismatched classification report throwing a ValueError somewhere. I left them in because it felt more honest than pretending the notebook was perfect.

Looking back, this project did exactly what I wanted it to do. I understand more now about what it actually means to train a model, why fine-tuning matters, and even high-level ideas behind things like RAG. That foundation came from sitting down and building something like this before I really knew what I was doing.

That part I don't regret at all.

Correlation heatmap: how strongly each feature relates to the others. The bottom row shows which factors were most linked to heart disease.

Feature importance: which features influenced the model's predictions most, and in which direction.
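One way to get importance with a direction is to read the signed coefficients of a logistic regression. The sketch below assumes that approach, which the post doesn't confirm, and uses synthetic data with placeholder names like `feature_0` rather than the real clinical columns.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with placeholder feature names (assumptions).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Scaling first puts every feature on the same footing,
# so coefficient sizes can be compared with each other.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

# The last pipeline step is the fitted logistic regression; one coefficient per feature.
coefs = pipe[-1].coef_[0]

# Rank by absolute size: bigger magnitude means more influence on the prediction.
ranked = sorted(zip(feature_names, coefs), key=lambda pair: abs(pair[1]), reverse=True)

for name, coef in ranked:
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name}: {coef:+.3f} ({direction} the predicted odds of the positive class)")
```

A positive coefficient pushes the prediction toward the positive class and a negative one pushes it away, which is what gives a plot like this its direction.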
