Rate this book

The StatQuest Illustrated Guide To Machine Learning

Name: The StatQuest Illustrated Guide To Machine Learning
Rating: 4.71 (31 reviews)

Josh Starmer

Rate this book

Machine Learning is awesome and powerful, but it can also appear incredibly complicated. That’s where The StatQuest Illustrated Guide to Machine Learning comes in. This book takes the machine learning algorithms, no matter how complicated, and breaks them down into small, bite-sized pieces that are easy to understand. Each concept is clearly illustrated to provide you, the reader, with an intuition about how the methods work that goes beyond the equations alone. The StatQuest Illustrated Guide does not dumb down the concepts. Instead, it builds you up so that you are smarter and have a deeper understanding of Machine Learning.

The StatQuest Illustrated Guide to Machine Learning starts with the basics, showing you what machine learning is and what are its goals, and builds on those, one picture at a time, until you have mastered the concepts behind self driving cars and facial recognition.

GenresArtificial IntelligenceTechnologyNonfictionComputer ScienceScienceMathematicsSoftware

304 pages, Kindle Edition

Published May 2, 2022

118 people are currently reading

833 people want to read

About the author

Josh Starmer

4 books16 followers

What do you think?

Rate this book

Friends & Following

Create a free account to discover what your friends think of this book!

Community Reviews

5 stars

147 (76%)

4 stars

39 (20%)

3 stars

4 (2%)

2 stars

3 (1%)

1 star

0 (0%)

Displaying 1 - 30 of 31 reviews

mausam

13 reviews

October 18, 2023

I really enjoyed reading this book & learning more about machine learning. This is definitely a book that I will return to and re-read so I can properly understand and learn everything. Fingers crossed I can work on a ML model at work 🤞🏻

Khan

163 reviews53 followers

December 13, 2023

I was surprised by this one. I purchased this book simply because I liked the author's youtube channel and I wanted to support him. I watched some of his videos years ago and I thought they were helpful. This book is geared towards beginners but more precisely people who want to do machine learning but are terrified at the math. This is not a mathy book, the author is trying to cast a wide audience and in doing so, there is a sacrifice in level of complexity, depth and rigor. However, there is a bit of nuance in some of these concepts that is told in a way that is easily digestible. For that reason, I thought this book was a fun read but only for beginners.

As for people who're interested in getting into machine learning, I would warn you to stay away from influencer types. They often are students who're still in school or have a few years of experience and confidently tell their audiences that you don't need a high degree of mathematical knowledge to succeed in the field, although there are outliers in the field where this advice lines up. Trying to do machine learning while avoiding math is like trying to become a marathon runner while avoiding running. You cannot separate the two from one another, the intuition behind the concepts are far more important than simply being mechanical and plugging in formulas.

Since data science has exploded, it has created a lot of statisticians who learn a few concepts and apply them based on plugging in formulas, without understanding the boundary of these formulas. There's a quote from Taleb I love "If your mathematics is mechanical rather than mystical, you will not go far". Trust me, this applies to me as well, plugging in formulas you don't fully understand is not going to get you the knowledge you need to succeed and I genuinely hate when I don't understand a concept but use it. This is important to think about when studying mathematics, more precisely machine learning. With that being said, this book was a fun read for me. If you're on the outside of the field looking in, I would recommend this book. Here are some of the topics:

- Naive Bayes
- Logistic Regression
- Support Vector Machines
- Neural Networks
- Backpropagation
- Linear regression

To fully understand these concepts, I would recommend Calc 1-3, a lot of people would probably disagree with me, however understanding the intuition behind what derivatives are, going from calculus 1 (single variable) to calculus 3 (Multi-variable) is really helpful in understanding how rates of change are calculated going from 1 dimension to many. Concepts like the chain rule, power rule, quotient rule and multiple types of integration are also vital. Going from one dimension to many. They will really help you understand how things work, would recommend having this knowledge to fully get the most out of this book but I am sure most would disagree with me. It just depends how deep you want your understanding to be?

4 stars for me.

Lucille Nguyen

413 reviews13 followers

August 23, 2022

Good review and explanation of key machine learning concepts.

An Te

386 reviews26 followers

November 23, 2022

This is a unique point. It’s selling point is that it introduces all of the key machine learning algorithms in an accessible and fun way. Without a doubt, this is not an easy task but Josh does it with ease and wit. It isn’t there to answer all ML questions and topics but it’s there to provide an accessible account for a rather scary topic (if you don’t think it’s challenging, then you just be a genius or something)!

I’ll be sure to read this book again very soon to consolidate these concepts further.

I’d be on the lookout for any more of Josh’s publications in future. 😊

Daeus

388 reviews3 followers

October 28, 2023

Excellent, illustrated step by step explanations of the math and design of common ML models with digestible, simple examples - including evaluation and common parameter tuning techniques. The author makes things so, so clear. I especially appreciated the decision tree visuals. The silly little convos and jokes make the subject matter much less intimidating.

Quotes
- "A probability distribution is a type of model that approximates a histogram with an infinite amount of data."
- Residuals are the differences between observed and predicted values of a model (observed - predicted), we square them to not cancel each other out (positive and negative) and get the sum of square residuals (ssr) to get the total differences squared. We can then take the average via the meam squares error (mse) to account for different amounts of data.
- "R^2 then gives us the a percentage of how much the predictions improved by using the model were interest in instead of just the mean."
- R^2 is equal to the ssr of the mean, minus ssr of the fitted line, divided by the ssr of the mean (ssr(mean)-ssr(fitted line))/ssr(mean). Ie the R^2 value tells us the percentage the residuals around the mean shrank when we used the fitted line. If equal, R^2 is 0 since it isn't better than just using the mean. When ssr of the fitted line is 0, it means the R^2 is 1 and the fitted line fits the data perfectly. Any 2 data points have an R^2 = 0, which is why small amounts of data can have a high R^2. Note: a linear regression minimizes the ssr.
- p-value of 0.05 = 'if there was no difference between groups, and if we did the exact same experiment a bunch of times, then only 5% of those experiments would result in the wrong decision.'
- 'a small p-value (alone) does not imply that the effect size is large.'
- "When there's no analytical solution, Gradient Descent can save the day! Gradient Descent is an iterative solution that incrementally steps towards an optimal solution and is used in a certain wide variety of situations. Gradient descent starts with an initial guess and then improves the guess, one step at a time, until it finds an optimal solution or reaches a maximum number of steps." ... " the Learning Rate prevents us from taking steps that are too big and skipping last the lowest point in the curve. Typically for Gradient Descent, the Learning Curve is determined automatically: it starts relatively large and gets smaller with every step taken." ... "Unfortunately, Gradient Descent does not always find the best parameter values.. it's possible that we might get stuck at the bottom... local minimum instead of finding out way to the bottom and the global minimum." .... "However, there are a few things we can do about it we can: 1) try again using different random numbers to initialized the parameters that we want to optimize.. [which] may avoid a local minimum, 2) fiddle around with step size... 3) Use Stochastic Gradient Descent because extra randomness helps avoid getting trapped in a local minimum." ... " one major consideration [to determine the size of a mini-batch for stochastic gradient descent] is how much high-speed memory [computer hardware] we have access to."
- "The term loss function and cost function refers to anything we want to optimize when we fit a model to data." Eg SSR or MSE for regression.
- "Likelihoods are the y-axis coordinates for specific points on the curve... whereas probabilities are the area under the curve between two points."
- "When we use linear regression, we fit a line to the data by minimizing the sum of the squared residuals. In contrast, logistics regression swaps out residuals for likelihoods (y-axis coordinates) and fits a squiggle thst represents the maximum likelihood." Since logistic regression is a classification algorithm, we can the probability of one category for likelihood, and 1- probability of the other category for likelihood. We multiplie these together to get the total likelihoods. This gets really small, so often we log it then add it up and its more useful (log-likelihood). "One of the limitations of logistic regression is it assumes that we can fit an s-shaped squiggle to the data. If that's not a valid assumption then we need a decision tree, or a support vector machine, or a neural network or some other method thst can handle more complicated relationships among the data."
- True positive rate = sensitivity = recall. "We have 3 names for the exact same thing: the percentage of actual positives that were correctly classified."
- "ROC stands for Receiver Operating Characteristic, and the name comes from the graphs drawn during World War II that summarized how well radar operators correctly identified airplane in radar signals." AUC (area under the curve) is the area under the ROC curve. "A precision recall graph simply replaces the false positive rate on the x-axis with precision and renames the y-axis recall since recall is the same thing as the true positive rate."
- "Regularization reduces how sensitive the model is to the training data [to prevent overfitting].... Regularization increases bias a little but, but in return, we get a big decrease in variance."
- "Ridge Regularization, also called Square or L2 Regularization" adds a ridge penalty that is a lambda x slope^2. So if lambda is 0, a linear regression is just the best fit line for the training data, "as we continue to increase lambda, the slope gets closer and closer to 0 and the y-axis intercept becomes the average [y-axis variable] in the training dataset." Basically ridge regression and a penalty on steep slopes, the higher the lambda the greater the penalty. "When we only have one slope to optimize one way to find the line that minimizes SSR + the ridge penalty is to use gradient descent."
- "Lasso regularization, also called absolute value or L1 regularization, replaces the square that we use in the ridge penalty with the absolute value."
- "The big difference between ridge and lasso regularization is that ridge regularization can only shrink the parameters to he asymptomatically close to 0. In contrast, lasso regularization can shrink parameters all the way to 0.".... "Thus, lasso regularization can exclude useless variables from the model.... in contrast, ridge regularization tends to perform better when most of the variables are useful."
- "One thing that's weird about decision trees is that they're upside down! The roots [root node] are on the top, and the leaves [leaf nodes] are on the bottom!"
- Classification trees: "Leaves that contain mixtures of classifications are called impure."..."[calculate] the Gini impurity to quantify the impurity in the leaves."..."Gini impurity = 1 - (probability of yes)^2 - (probability of no)^2"..."Total Gina impurity = weighted average of Gini impurities for the leaves."...."the lowest Gini impurity, we'll put it at the top of the tree.".... "Now, because everyone in this node does not love troll 2 [classifier], it becomes a leaf, because there's no point in sitting the people up into smaller groups."..."[if] the leaf would be impure... we would also have a better sense of the accuracy of our prediction"..."Even though this leaf is impute it still needs an output value....
We selected output values for each leaf by picking the categories with the highest counts." [Visuals helped here].
- Regression trees are helpful when a fitted line or s-curve (ie logistic regression) doesn't match the data (eg a quadratic function). I like the example of dosage of a drug, where too little and too much isn't effective. So we use thresholds that mimimize the ssr (sum of squared residuals), to get our root node threshold value.
- "...making a prediction based on a single measurement suggests thst the tree is overfit to the training data and may not perform well in the future. The simplest way to prevent this issue is to only split measurements when there are more than some minimum number, which is often 20."
- "Just like for classification trees, regression trees can use any type of variable to make a prediction. However, with regression trees, we always try to predict a continuous value."
- "I like how easy [decision trees] are to interpret and how you can build them from any type of data."
- "Although they sound super intimidating, all neural networks do is fit fancy squiggles or bent shapes to data [for classification]. And like decision trees and svms, neural network do fine with any relationship among variables."
- "the layers of notes between the input and output layers are called hidden layers. Part of the art of neural networks is deciding how many hidden layers to use and how many nodes should be in each one. Generally speaking, the more layers and nodes, the more complicated the shape that can be fit to the data."
- "Just like for linear regression, we can use gradient descent (or stochastic gradient descent) to find the optimal parameter values.... however, we don't call it gradient descent.... Instead, because of how the derivatives are found for each parameter in the neural network (from the back to the front), we call it backpropagation."
- "The neural network [for self driving cars] probably fits the training data really well, but there's no telling what its doing between the points, and that means it will be hard to predict what a self-driving car will do in new situations."
- "Neural networks are cool, but deciding how many hidden layers to use and how many nodes to put in each hidden layer and even picking the best activation function is a bit of an art form. In contrast, creating a model with logistic regression is a science, and there's no guesswork involved.... [neutral networks] might require a lot of tweaking before it performs well.... furthermore, when we use a lot of variables to make predictions, it can be much easier to interpret a logistic regression model than a neural network. In other words, it's easy to know how logistic regression makes prediction. In contrast, it's much more difficult to understand how a neural network makes predictions."
- "Probability was invented in order to figure out how to win games of chance in the 16th century."

data-science-career

Avik

33 reviews

March 14, 2025

Josh Starmer's illustrated guide to Machine Learning is a unique book that will be valuable for both beginners diving into the world of ML and veterans looking to refresh their concepts (or perhaps looking at things from a different perspective). In the author's own words, it's a book that is drawn rather than written! A book's version of a video, if you will, one that you can play at exactly the speed that suits you.
Most importantly, he succeeds in making learning fun, with an endearing approach involving dollops of bright levity.

Joe

137 reviews4 followers

January 30, 2024

This really isn't a book you read cover-to-cover, it's more of a reference to consult when you want to see and learn more about particular concepts related to machine learning, AI, statistics, linear algebra, algorithms, and associated topics. It's an amazing book. They say a picture is worth a thousand words and this book is a goldmine of knowledge. As a data scientist and AI/ML researcher, I readily consult this visual guide to help understand what's normally provided elsewhere in Python source code, Jupyter notebooks, data visualizations, formulas, and flow charts. Dr. Starmer is a gifted educator who teaches complex topics using excellent illustrations. Highly recommended.

Giulio Ciacchini

370 reviews12 followers

September 6, 2023

Outstanding textbook, that everybody interested in statistics and machine learning should read.
Explains difficult concepts step by step, super clear.
even though it is illustrated this takes nothing away from rigorous explanations: precise, concise and straight to the point.
it compasses basic statistical concepts (cross validation, normal and binomial distributions, probability, MSE, p-value) up to the most famous machine learning algorithms.
The author takes nothing for granted, for instance the classification tree is explained starting from how to choose the root.

coding non-fiction

Archisman

27 reviews

June 26, 2023

Easy and concise to understand and refer back fast

Quinn

70 reviews33 followers

May 14, 2024

This book achieves its goal spectacularly. Concepts that had been introduced to me in graduate school courses from a heavily math-oriented perspective always remained rather abstract in my thoughts. As I began working on machine learning projects more recently, I needed a rapid refresher on concepts I now hadn't thought about in almost two years. This book not only brushed up my knowledge, but gave me a clearer understanding of the topics at a level where I can really talk about them both with people who are very knowledgeable in ML and those who are not. (The latter group is perhaps the most meaningful, in my field it is often easier to discuss ML with other people who are doing ML, but it takes a genuine conceptual understanding to explain something to someone new!)

I saw in another review someone mentioned that this book does not get into the math that ML is based on, and though that is somewhat true it lays out the initial groundwork in a beautiful way. Many intro to machine learning websites expect people to have a working vocabulary in machine learning, or to be able to pick up a concept after one sentence of explanation. This book is the vital precursor that ML-spaces have needed for a while. Though the much of the math related to ML and many advanced concepts are not covered here, many of the existing resources that cover them would be much more accessible to a beginner once they have read this book. I would highly recommend this book to anyone trying to learn about machine learning for the first time, anyone who may be trying to refresh their understanding of ML topics they haven't used in awhile, or instructors who are planning on teaching an intro level course.

Mike

20 reviews14 followers

July 7, 2024

A legitimately incredible book that explains the most important main ideas of machine learning in an approachable way. Someone said once that if you can’t explain it simply, you don’t really understand it. From linear regression to neural networks, Josh Starmer is the king of simplifying complex issues with easy to understand visual explanations.

This book does a better job explaining the main elements of machine learning than most of my graduate level data science professors and required coursework reading materials.

Data scientists sometimes have a reputation for not being able to explain their findings, what their findings mean, how their findings apply to businesses/orgs, or how they achieved them. Reading this book will not only help you, but also help you explain important elements of how ML works to your org/stakeholders/etc.

Whether you are a beginner or in the middle of learning python or R to build ML models and want to really understand machine learning, get this book.

Leo Vaulin

13 reviews1 follower

December 17, 2022

Did StatSquatch eat the Table of Contents?

I love watching the StatQuest videos, and this book is a great way to remind yourself of the key concepts without rewatching. My only complaint- the ToC simply says “Page 1”, “Page 2”,… even though there are clearly-marked chapters, as well as a generous appendix with even more Key Concepts.

So what’s up with that?

HINT: I paged through the book and stuck a bookmark at the start of every chapter. StatSquatch won’t win! TRIPLE BAM!!!

Dian Dimitrov

6 reviews

March 10, 2023

The author's clear visual style provides a comprehensive look at the basis of machine learning. This book is a good starting point for machine learning beginners, as it contains lots of concrete, easy-to-follow examples with corresponding explanation videos. It is an approachable, practical, and broad introduction to machine learning concepts, and the most beautifully illustrated machine learning book on the market I have ever found (prove me wrong).

Jeremy Morrison

2 reviews

December 29, 2024

Good for developing intuition around ML algos. Not enough by itself, it's lacking in mathematical rigor, but that's really not the point of the book.

I used it to refresh my memory on classic ML algos that I learned in school and for that it served its purpose perfectly.

Olivier Chabot

47 reviews11 followers

June 25, 2022

Great intro to big ideas. Far from being a how to guide for serious programmers. I've learned how to communicate educational content.

Lizhiyi

1 review

December 2, 2022

Good introduction book with nice pictures and clear explanation. It would be better if more algorithms are covered.

Sabah Shams

6 reviews1 follower

July 7, 2023

Thoroughly enjoyed the book, just found the same examples of "Loves troll" and "Loves troll 2" to be repetitive and unrealistic.

69 reviews

February 29, 2024

Exceptionally visually depicted, and pretty concise. So glad my data science mentor recommended this as a starting point.

professional

David Scott

23 reviews

June 24, 2024

A beautiful way to learn more about technical concepts mainly through well orchestrated visuals.

Simon

7 reviews

August 16, 2024

Fun short book, that gives a great overview of common machine learning topics.

Ben Wooding

40 reviews

January 16, 2025

Easy reading, perfect for beginners or those looking for a refresh. Perhaps too simplistic with those understanding some machine learning already and wanting to deepen it.

Maria Alvarez Olmos

51 reviews4 followers

January 31, 2025

Absolutely recommend it. ML so accessible and fun!

Palaniappan RM

14 reviews1 follower

March 1, 2025

I struggled to start with ML with various courses and YouTube videos. But I think this made me to understand the basic algorithms and have a kickstarter to this world. Excellent beginner book.

Joaquin Menendez

2 reviews

March 3, 2025

A piece of art blended with the most intuitive lessons. Reccommended for any undergrad or ML newbie.

Xevi

3 reviews

April 6, 2025

Positively surprised, highly enjoyable read - would definitely recommend it to everyone interested in this field

4 reviews

What a guy.

11 reviews

Greatly presented basics for those who are new to the field and need to learn the basic concepts, build basic intuition and not feel inferior and overwhelmed. The author published the book after many years on his YouTube channel, where he regularly posts the same high quality content.

It's short, to the point, funny and really well illustrated.

I wonder how many people in total this person has helped. :))) He helped me for sure.

Of course, it's not aimed at those who have been encountering these concepts since kindergarten and are thinking of possible next neural network constructions while waiting for the tram... ;D

educative machine-learning

Kajal Nellira

1 review3 followers

June 1, 2024

One of the best books to understand statistics for machine learning

Deepak Kandalam

25 reviews

May 12, 2024

Great book for ML beginners, author has a supporting YouTube channel where he elects and explain clearly all the terms and mathematical notions.

Alessandro

16 reviews26 followers

May 27, 2023

Accessible and entertaining introduction to most common machine learning techniques. I really enjoyed the way the author guided me through the book by using images, repeating, and summarising the key aspects of each technique.

Displaying 1 - 30 of 31 reviews

More reviews and ratings