intro to machine learning with python by andreas müller & sarah guido

Aug 14, 2024

*raw notes included*

One of my favorite ways to learn is through books, especially when I can pair them with actually building something.

This was one of the books that helped me better understand machine learning with Python: Introduction to Machine Learning with Python by Andreas Müller and Sarah Guido.

Because I came from a non-technical background, I knew I couldn’t just read technical material straight through and expect it to stick. So I approached the book kind of like I did in school.

Before each chapter, I would do a mini pre-assessment or “prime” myself with information. Basically asking: What do I already know? What do I think this topic means? What gaps do I need to fill?

After reading, I would create mind maps and visual notes to connect ideas back to previous chapters. That part mattered a lot to me because machine learning started making more sense once I could see the bigger picture instead of isolated concepts.

I also noticed I learn technical topics better when I break things into smaller pieces and create analogies for myself. If I couldn’t explain it simply, I probably didn’t understand it yet.

About the book:

Starting with the "Why"

One thing I liked about this book is that it starts with the why first, which is honestly my favorite question.

Why Python?

Why machine learning?

Why tools like Jupyter Notebook, NumPy, Matplotlib, and scikit-learn?

Instead of jumping straight into math and code, the book first explains what kinds of problems machine learning can actually solve.

One of the first examples is predicting iris species based on flower measurements. The goal is to classify flowers into one of three groups: setosa, versicolor, or virginica.

Since we already know the correct labels, this would be considered supervised learning.

At a very high level, this chapter helped me understand that machine learning is basically:

Define the problem → understand your data → choose a model → evaluate → adjust → repeat.

Very “rinse and repeat.”
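The loop above can be sketched in a few lines with scikit-learn's bundled iris data. This is a minimal sketch, not the book's exact code: the choice of a 1-nearest-neighbor model, the `random_state` value, and the measurements of the "new flower" are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Understand the data: 150 flowers, 4 measurements each, 3 species labels.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out part of the data (25% by default) so the model can be
# checked on flowers it has never seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Choose a model and fit it on the training portion only.
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# Evaluate: fraction of held-out flowers labeled correctly.
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")

# Predict the species of one new flower (made-up measurements in cm).
new_flower = [[5.0, 2.9, 1.0, 0.2]]
predicted_species = iris.target_names[model.predict(new_flower)[0]]
print("Predicted species:", predicted_species)
```

The "adjust" step would be changing something (the model, `n_neighbors`, the features) and running the same fit and score again.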

The Importance of Training and Testing Data

Training data is what teaches the model patterns.

Testing data is basically the pop quiz.

It’s data the model hasn’t seen before, so you can check whether it actually learned something useful or if it just memorized answers. This was also where I started learning about things like overfitting and underfitting.

Which honestly felt very human. Sometimes your model memorizes too much and performs badly on new data (overfitting). Sometimes it oversimplifies and misses patterns entirely (underfitting). Then comes the rinse and repeat phase.

Adjust. Test again. Compare.
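One way to see memorizing versus oversimplifying is to compare training accuracy with test accuracy while changing a single setting. The sketch below assumes scikit-learn's breast cancer dataset and a k-nearest-neighbors model; the three values of `n_neighbors` are arbitrary picks for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# results maps n_neighbors -> (training accuracy, test accuracy)
results = {}
for k in (1, 5, 200):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    results[k] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"n_neighbors={k:>3}  train={results[k][0]:.3f}  test={results[k][1]:.3f}")

# With n_neighbors=1 each training point is its own nearest neighbor, so the
# model effectively memorizes the training set. A large gap between train and
# test accuracy hints at overfitting; low scores on both hint at underfitting.
```

Adjust, test again, compare: the loop is the same idea in code.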

Supervised Learning

my attempt at mapping supervised learning in a way my brain understood it.

For me, supervised learning broke into two buckets:

Classification

When you're predicting a category. You have an answer key (you know your labels and categories), and your model is answering: "Which group does this belong to?"

Examples:
Spam or not spam
Disease or no disease

Regression

When you're predicting a continuous number. Your model is answering: "What number do I think this should be?"

Examples:
Housing prices
Revenue forecasting
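The difference between the two buckets shows up in what `predict` returns: a label for classification, a number for regression. A small sketch, assuming scikit-learn and entirely made-up toy data (link counts for emails, square meters and prices for houses):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: made-up feature = number of links in an email.
links = [[0], [1], [2], [8], [9], [10]]
labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]
classifier = LogisticRegression().fit(links, labels)
predicted_label = classifier.predict([[9]])[0]
print("Email with 9 links:", predicted_label)

# Regression: made-up house sizes (square meters) and prices.
sizes = [[50], [80], [100], [120]]
prices = [110_000, 170_000, 210_000, 250_000]  # price = 2000 * size + 10000
regressor = LinearRegression().fit(sizes, prices)
predicted_price = regressor.predict([[90]])[0]
print(f"House of 90 square meters: {predicted_price:.0f}")
```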

Algorithms

Then you have algorithms, which are basically different approaches for modeling your data.

Some of the algorithms I learned about:

Random Forest
Support Vector Machine (SVM)
K-Nearest Neighbors
Naive Bayes (Probabilities)
- Multinomial
- Gaussian (Normal Distribution)
- Bernoulli
Linear Models
- Logistic Regression (Classification)
- Linear Regression (Regression)
  - Ridge Regression
  - Lasso Regression
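In scikit-learn, most of the classification algorithms listed above share the same fit and score interface, so they can be swapped in and out of a loop. A hedged sketch, assuming the iris data, mostly default settings, and 5-fold cross-validation as the comparison method (none of which comes from the notes themselves):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Mean accuracy over 5 train/test splits for each model.
mean_scores = {}
for name, model in models.items():
    mean_scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:<22} {mean_scores[name]:.3f}")
```

Iris is a small, easy dataset, so the scores will likely be close together; on real data the differences tend to matter more.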

Evaluation

Evaluation is basically your scoreboard: it lets you compare models and check how far off your predictions are from the real answers. (The book discusses this more in Chapter 5: Model Evaluation and Improvement.)
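A few common scoreboard numbers can be computed by hand-feeding true and predicted values to `sklearn.metrics`. The values below are made up purely to show what each metric measures.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, mean_absolute_error

# Classification: 1 = disease, 0 = no disease (made-up labels).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)  # 6 of 8 correct -> 0.75

# Rows are the true class, columns the predicted class (order: 0, 1).
matrix = confusion_matrix(y_true, y_pred)
print(matrix)

# Regression: how far off are the predicted numbers, on average?
true_prices = [200, 310, 150]
predicted_prices = [210, 300, 165]
mae = mean_absolute_error(true_prices, predicted_prices)
print(f"Mean absolute error: {mae:.2f}")
```

Accuracy alone hides which kind of mistake was made; the confusion matrix separates false alarms from missed cases.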

Breaking it down visually.

If you're like "huh?", yeah, that was me too. If something felt confusing, I would usually draw it out. For example, Support Vector Machines finally made more sense once I visualized them as separating apples and oranges with the widest possible gap between the two groups. "Decision boundary" sounds less scary if you put it in a different context. Same thing with dimensionality reduction and NLP.
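The apples-and-oranges picture can be turned into a tiny linear SVM. This is a sketch under assumptions: the six points and the two feature axes are invented, and a linear kernel is chosen so the boundary is a straight line whose margin can be read off the fitted weights.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up 2D measurements for two kinds of fruit.
fruit = np.array([[1, 1], [1, 2], [2, 1],    # apples
                  [4, 4], [5, 4], [4, 5]])   # oranges
kind = ["apple", "apple", "apple", "orange", "orange", "orange"]

svm = SVC(kernel="linear").fit(fruit, kind)

# The support vectors are the points closest to the boundary;
# they alone determine where the line goes.
print("Support vectors:")
print(svm.support_vectors_)

# For a linear SVM the width of the gap between the classes is 2 / ||w||.
margin_width = 2 / np.linalg.norm(svm.coef_[0])
print(f"Margin width: {margin_width:.2f}")

predictions = svm.predict([[1.5, 1.5], [4.5, 4.5]])
print("Predictions:", predictions)
```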

classification algorithm: support vector machine

breaking down linear regression

my notes on unsupervised learning

Unsupervised Learning

Unsupervised learning is the other type of machine learning, except this time… you don’t know the answer. There are no labels. No “yes/no.” No correct category sitting there waiting for you.

So what are you supposed to do when you don’t even know what you're predicting?

Well... start with what you know. Learn about the data. Try to find patterns. Try to make sense of what’s there.

Sometimes it means grouping data points by similarity. That's clustering! Think: “Do any of these naturally belong together?” This is basically letting the algorithm find natural groupings in the data. That might mean grouping similar users or discovering patterns you never knew existed.
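A minimal clustering sketch, assuming scikit-learn's k-means and synthetic data from `make_blobs`. The number of clusters (3) is an assumption handed to the algorithm; k-means does not discover it on its own.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2D points. The "true" blob labels are thrown away on purpose,
# because in unsupervised learning there is no answer key.
points, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

print("Cluster assigned to the first 10 points:", cluster_ids[:10])
print("Cluster centers:")
print(kmeans.cluster_centers_)
```

The cluster numbers themselves (0, 1, 2) are arbitrary names; what matters is which points end up sharing one.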

Sometimes there are too many variables and your brain (and the model) gets overwhelmed. How can we simplify this data while keeping the important information? That's dimensionality reduction.

The way I understood it is: How do I keep the important information without keeping all the noise? Which helps with: visualization, simplifying data, and spotting relationships!
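One common dimensionality reduction technique is principal component analysis (PCA). The sketch below assumes scikit-learn and the iris data, and squeezes four measurements down to two so they could be drawn on a flat plot; scaling the features first is a typical but optional choice.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)  # labels ignored: this is unsupervised

# Put every measurement on the same scale so no single one dominates.
X_scaled = StandardScaler().fit_transform(X)

# Keep the 2 directions that capture the most variation in the data.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Shape before:", X_scaled.shape, "after:", X_2d.shape)
kept = pca.explained_variance_ratio_.sum()
print(f"Share of variance kept: {kept:.2f}")
```

The share of variance kept is a rough answer to "how much of the important information survived?"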

clustering: kmeans, dbscan, hierarchical

Working with Text

This chapter breaks down how machine learning works with text, which honestly came in handy more than I expected. Especially now with LLMs, where understanding things like tokens matters a lot more. At first, I remember wondering… how does a machine even understand text? Computers don’t actually read language the way we do, so text has to be transformed into something numerical before a model can work with it.

This chapter walks through some of the foundations of working with text, things like bag of words, tokenization, stop words, tf-idf, stemming, and lemmatization.

The way I understood it was that before a model can do anything useful with text, it first has to break language down into smaller pieces and turn it into something measurable.

For example, tokenization is basically breaking text into smaller chunks (words or pieces of words), while bag of words focuses on word frequency. TF-IDF was interesting because it helps figure out which words are actually important instead of just common.

Looking back, I didn’t realize at the time how helpful this foundation would be. But now, especially with LLMs and embeddings everywhere, understanding how machines process language feels way less magical and a lot more understandable.