The StatQuest Illustrated Guide To Machine Learning

Machine Learning is awesome and powerful, but it can also appear incredibly complicated. That's where The StatQuest Illustrated Guide to Machine Learning comes in. This book takes machine learning algorithms, no matter how complicated, and breaks them down into small, bite-sized pieces that are easy to understand. Each concept is clearly illustrated to give you, the reader, an intuition for how the methods work that goes beyond the equations alone. The StatQuest Illustrated Guide does not dumb down the concepts. Instead, it builds you up so that you are smarter and have a deeper understanding of Machine Learning.

The StatQuest Illustrated Guide to Machine Learning starts with the basics, showing you what machine learning is and what its goals are, and builds on them, one picture at a time, until you have mastered the concepts behind self-driving cars and facial recognition.

304 pages, Kindle Edition

Published May 2, 2022

118 people are currently reading
833 people want to read

About the author

Josh Starmer

4 books · 16 followers

Ratings & Reviews



Community Reviews

5 stars: 147 (76%)
4 stars: 39 (20%)
3 stars: 4 (2%)
2 stars: 3 (1%)
1 star: 0 (0%)
mausam
13 reviews
October 18, 2023
I really enjoyed reading this book & learning more about machine learning. This is definitely a book that I will return to and re-read so I can properly understand and learn everything. Fingers crossed I can work on an ML model at work 🤞🏻
Khan
163 reviews · 53 followers
December 13, 2023
I was surprised by this one. I purchased this book simply because I liked the author's YouTube channel and I wanted to support him. I watched some of his videos years ago and thought they were helpful. This book is geared toward beginners, more precisely people who want to do machine learning but are terrified of the math. This is not a mathy book; the author is trying to reach a wide audience, and in doing so, there is a sacrifice in complexity, depth, and rigor. However, a fair bit of the nuance in these concepts is conveyed in a way that is easily digestible. For that reason, I thought this book was a fun read, but only for beginners.

As for people who're interested in getting into machine learning, I would warn you to stay away from influencer types. They are often students who're still in school or have a few years of experience, and they confidently tell their audiences that you don't need a high degree of mathematical knowledge to succeed in the field (although there are outliers in the field for whom this advice holds). Trying to do machine learning while avoiding math is like trying to become a marathon runner while avoiding running. You cannot separate the two from one another; the intuition behind the concepts is far more important than simply being mechanical and plugging in formulas.

Since data science has exploded, it has created a lot of statisticians who learn a few concepts and apply them by plugging in formulas, without understanding the boundaries of those formulas. There's a quote from Taleb I love: "If your mathematics is mechanical rather than mystical, you will not go far." Trust me, this applies to me as well; plugging in formulas you don't fully understand is not going to get you the knowledge you need to succeed, and I genuinely hate using a concept I don't understand. This is important to think about when studying mathematics, and more precisely machine learning. With that being said, this book was a fun read for me. If you're on the outside of the field looking in, I would recommend this book. Here are some of the topics:

- Naive Bayes
- Logistic Regression
- Support Vector Machines
- Neural Networks
- Backpropagation
- Linear regression

To fully understand these concepts, I would recommend Calc 1-3. A lot of people would probably disagree with me, but understanding the intuition behind derivatives, going from Calculus 1 (single variable) to Calculus 3 (multivariable), is really helpful for understanding how rates of change are calculated as you go from 1 dimension to many. Concepts like the chain rule, power rule, quotient rule, and multiple types of integration are also vital. They will really help you understand how things work, and I would recommend having this knowledge to get the most out of this book, though I am sure most would disagree with me. It just depends on how deep you want your understanding to be. (A small numeric illustration of this single- to multivariable step follows.)
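A quick sketch of that single-variable to multivariable step (my own illustration, not from the book; the example functions are made up): the same central-difference idea estimates a derivative in one dimension and a gradient, one partial derivative per coordinate, in many.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x (single variable, Calc 1)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient(f, xs, h=1e-6):
    """One partial derivative per coordinate (multivariable, Calc 3)."""
    grads = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += h        # bump coordinate i up...
        down[i] -= h      # ...and down, holding the others fixed
        grads.append((f(up) - f(down)) / (2 * h))
    return grads

print(derivative(lambda x: x ** 2, 3.0))                     # ~6.0, since d/dx x^2 = 2x
print(gradient(lambda v: v[0] ** 2 + 3 * v[1], [3.0, 1.0]))  # ~[6.0, 3.0]
```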

4 stars for me.
An Te
386 reviews · 26 followers
November 23, 2022
This is a unique book. Its selling point is that it introduces all of the key machine learning algorithms in an accessible and fun way. Without a doubt, this is not an easy task, but Josh does it with ease and wit. It isn't there to answer all ML questions and topics; it's there to provide an accessible account of a rather scary topic (if you don't think it's challenging, then you must just be a genius or something)!

I’ll be sure to read this book again very soon to consolidate these concepts further.

I’d be on the lookout for any more of Josh’s publications in future. 😊
Daeus
388 reviews · 3 followers
October 28, 2023
Excellent, illustrated, step-by-step explanations of the math and design of common ML models, with digestible, simple examples, including evaluation and common parameter-tuning techniques. The author makes things so, so clear. I especially appreciated the decision tree visuals. The silly little convos and jokes make the subject matter much less intimidating.

Quotes
- "A probability distribution is a type of model that approximates a histogram with an infinite amount of data."
- Residuals are the differences between the observed and predicted values of a model (observed - predicted). We square them so that positive and negative residuals don't cancel each other out, and add them up to get the sum of squared residuals (SSR), the total squared difference. We can then take the average via the mean squared error (MSE) to account for different amounts of data. (Both are sketched in code after these notes.)
- "R^2 then gives us the a percentage of how much the predictions improved by using the model were interest in instead of just the mean."
- R^2 is equal to the SSR of the mean, minus the SSR of the fitted line, divided by the SSR of the mean: (SSR(mean) - SSR(fitted line)) / SSR(mean). I.e., the R^2 value tells us the percentage by which the residuals around the mean shrank when we used the fitted line. If the two are equal, R^2 is 0, since the fitted line isn't better than just using the mean. When the SSR of the fitted line is 0, R^2 is 1 and the fitted line fits the data perfectly. Any 2 data points have an R^2 = 1, which is why small amounts of data can have a high R^2. Note: a linear regression minimizes the SSR.
- p-value of 0.05 = 'if there was no difference between groups, and if we did the exact same experiment a bunch of times, then only 5% of those experiments would result in the wrong decision.'
- 'a small p-value (alone) does not imply that the effect size is large.'
- "When there's no analytical solution, Gradient Descent can save the day! Gradient Descent is an iterative solution that incrementally steps towards an optimal solution and is used in a certain wide variety of situations. Gradient descent starts with an initial guess and then improves the guess, one step at a time, until it finds an optimal solution or reaches a maximum number of steps." ... " the Learning Rate prevents us from taking steps that are too big and skipping last the lowest point in the curve. Typically  for Gradient Descent, the Learning Curve is determined automatically: it starts relatively large and gets smaller with every step taken." ... "Unfortunately, Gradient Descent does not always find the best parameter values.. it's possible that we might get stuck at the bottom... local minimum instead of finding out way to the bottom and the global minimum." .... "However, there are a few things we can do about it  we can: 1) try again using different random numbers to initialized the parameters that we want to optimize.. [which] may avoid a local minimum, 2) fiddle around with step size... 3) Use Stochastic Gradient Descent because extra randomness helps avoid getting trapped in a local minimum." ...  " one major consideration [to determine the size of a mini-batch for stochastic gradient descent] is how much high-speed memory [computer hardware] we have access to."
- "The term loss function and cost function refers to anything we want to optimize when we fit a model to data." Eg SSR or MSE for regression.
- "Likelihoods are the y-axis coordinates for specific points on the curve... whereas probabilities are the area under the curve between two points."
- "When we use linear regression, we fit a line to the data by minimizing the sum of the squared residuals. In contrast, logistics regression swaps out residuals for likelihoods (y-axis coordinates) and fits a squiggle thst represents the maximum likelihood." Since logistic regression is a classification algorithm, we can the probability of one category for likelihood, and 1- probability of the other category for likelihood. We multiplie these together to get the total likelihoods. This gets really small, so often we log it then add it up and its more useful (log-likelihood). "One of the limitations of logistic regression is it assumes that we can fit an s-shaped squiggle to the data. If that's not a valid assumption  then we need a decision tree, or a support vector machine, or a neural network or some other method thst can handle more complicated relationships among the data."
- True positive rate = sensitivity = recall. "We have 3 names for the exact same thing: the percentage of actual positives that were correctly classified."
- "ROC stands for Receiver Operating Characteristic, and the name comes from the graphs drawn during World War II that summarized how well radar operators correctly identified airplane in radar signals." AUC (area under the curve) is the area under the ROC curve. "A precision recall graph simply replaces the false positive rate on the x-axis with precision and renames the y-axis recall since recall is the same thing as the true positive rate."
- "Regularization reduces how sensitive the model is to the training data [to prevent overfitting].... Regularization increases bias a little but, but in return, we get a big decrease in variance."
- "Ridge Regularization, also called Square or L2 Regularization" adds a ridge penalty that is a lambda x slope^2. So if lambda is 0, a linear regression is just the best fit line for the training data, "as we continue to increase lambda, the slope gets closer and closer to 0 and the y-axis intercept becomes the average [y-axis variable] in the training dataset." Basically ridge regression and a penalty on steep slopes, the higher the lambda the greater the penalty. "When we only have one slope to optimize  one way to find the line that minimizes SSR + the ridge penalty is to use gradient descent."
- "Lasso regularization, also called absolute value or L1 regularization, replaces the square that we use in the ridge penalty with the absolute value."
- "The big difference between ridge and lasso regularization is that ridge regularization can only shrink the parameters to he asymptomatically close to 0. In contrast, lasso regularization can shrink parameters all the way to 0.".... "Thus, lasso regularization can exclude useless variables from the model.... in contrast, ridge regularization tends to perform better when most of the variables are useful."
- "One thing that's weird about decision trees is that they're upside down! The roots [root node] are on the top, and the leaves [leaf nodes] are on the bottom!"
- Classification trees: "Leaves that contain mixtures of classifications are called impure." ... "[calculate] the Gini impurity to quantify the impurity in the leaves." ... "Gini impurity = 1 - (probability of yes)^2 - (probability of no)^2" ... "Total Gini impurity = weighted average of the Gini impurities for the leaves." ... "[the candidate split with] the lowest Gini impurity, we'll put it at the top of the tree." ... "Now, because everyone in this node does not love Troll 2 [the classifier], it becomes a leaf, because there's no point in splitting the people up into smaller groups." ... "[if] the leaf would be impure... we would also have a better sense of the accuracy of our prediction" ... "Even though this leaf is impure, it still needs an output value.... We selected output values for each leaf by picking the categories with the highest counts." [Visuals helped here.] (Gini impurity is sketched in code after these notes.)
- Regression trees are helpful when a fitted line or s-curve (i.e., logistic regression) doesn't match the data (e.g., a quadratic function). I like the example of the dosage of a drug, where too little or too much isn't effective. So we use the threshold that minimizes the SSR (sum of squared residuals) to get our root node's threshold value (sketched in code after these notes).
- "...making a prediction based on a single measurement suggests thst the tree is overfit to the training data and may not perform well in the future. The simplest way to prevent this issue is to only split measurements when there are more than some minimum number, which is often 20."
- "Just like for classification trees, regression trees can use any type of variable to make a prediction. However, with regression trees, we always try to predict a continuous value."
- "I like how easy [decision trees] are to interpret and how you can build them from any type of data."
- "Although they sound super intimidating, all neural networks do is fit fancy squiggles or bent shapes to data [for classification]. And like decision trees and svms, neural network do fine with any relationship among variables."
- "the layers of notes between the input and output layers are called hidden layers. Part of the art of neural networks is deciding how many hidden layers to use and how many nodes should be in each one. Generally speaking, the more layers and nodes, the more complicated the shape that can be fit to the data."
- "Just like for linear regression, we can use gradient descent (or stochastic gradient descent) to find the optimal parameter values.... however, we don't call it gradient descent.... Instead, because of how the derivatives are found for each parameter in the neural network (from the back to the front), we call it backpropagation."
- "The neural network [for self driving cars] probably fits the training data really well, but there's no telling what its doing between the points, and that means it will be hard to predict what a self-driving car will do in new situations."
- "Neural networks are cool, but deciding how many hidden layers to use and how many nodes to put in each hidden layer and even picking the best activation function is a bit of an art form. In contrast, creating a model with logistic regression is a science, and there's no guesswork involved.... [neutral networks] might require a lot of tweaking before it performs well.... furthermore, when we use a lot of variables to make predictions, it can be much easier to interpret a logistic regression model than a neural network. In other words, it's easy to know how logistic regression makes prediction. In contrast, it's much more difficult to understand how a neural network makes predictions."
- "Probability was invented in order to figure out how to win games of chance in the 16th century."
Avik
33 reviews
March 14, 2025
Josh Starmer's illustrated guide to Machine Learning is a unique book that will be valuable for both beginners diving into the world of ML and veterans looking to refresh their concepts (or perhaps looking at things from a different perspective). In the author's own words, it's a book that is drawn rather than written! A book's version of a video, if you will, one that you can play at exactly the speed that suits you.
Most importantly, he succeeds in making learning fun, with an endearing approach involving dollops of bright levity.
Joe
137 reviews · 4 followers
January 30, 2024
This really isn't a book you read cover to cover; it's more of a reference to consult when you want to see and learn more about particular concepts related to machine learning, AI, statistics, linear algebra, algorithms, and associated topics. It's an amazing book. They say a picture is worth a thousand words, and this book is a goldmine of knowledge. As a data scientist and AI/ML researcher, I readily consult this visual guide to help understand what's normally provided elsewhere in Python source code, Jupyter notebooks, data visualizations, formulas, and flow charts. Dr. Starmer is a gifted educator who teaches complex topics using excellent illustrations. Highly recommended.
Giulio Ciacchini
370 reviews · 12 followers
September 6, 2023
Outstanding textbook that everybody interested in statistics and machine learning should read.
Explains difficult concepts step by step, super clearly.
Even though it is illustrated, this takes nothing away from the rigor of the explanations: precise, concise, and straight to the point.
It encompasses basic statistical concepts (cross-validation, normal and binomial distributions, probability, MSE, p-values) up to the most famous machine learning algorithms.
The author takes nothing for granted; for instance, the classification tree is explained starting from how to choose the root.
Archisman
27 reviews
June 26, 2023
Easy and concise to understand, and quick to refer back to.
Quinn
70 reviews · 33 followers
May 14, 2024
This book achieves its goal spectacularly. Concepts that had been introduced to me in graduate school courses from a heavily math-oriented perspective always remained rather abstract in my thoughts. As I began working on machine learning projects more recently, I needed a rapid refresher on concepts I hadn't thought about in almost two years. This book not only brushed up my knowledge but gave me a clearer understanding of the topics, at a level where I can really talk about them both with people who are very knowledgeable in ML and with those who are not. (The latter group is perhaps the most meaningful; in my field it is often easier to discuss ML with other people who are doing ML, but it takes a genuine conceptual understanding to explain something to someone new!)

I saw in another review that someone mentioned this book does not get into the math that ML is based on, and though that is somewhat true, it lays out the initial groundwork in a beautiful way. Many intro-to-machine-learning websites expect people to have a working vocabulary in machine learning, or to be able to pick up a concept after one sentence of explanation. This book is the vital precursor that ML spaces have needed for a while. Though much of the math related to ML and many advanced concepts are not covered here, many of the existing resources that cover them would be much more accessible to a beginner who has read this book. I would highly recommend this book to anyone trying to learn about machine learning for the first time, anyone trying to refresh their understanding of ML topics they haven't used in a while, or instructors who are planning to teach an intro-level course.
20 reviews · 14 followers
July 7, 2024
A legitimately incredible book that explains the most important main ideas of machine learning in an approachable way. Someone once said that if you can't explain it simply, you don't really understand it. From linear regression to neural networks, Josh Starmer is the king of simplifying complex topics with easy-to-understand visual explanations.

This book does a better job explaining the main elements of machine learning than most of my graduate level data science professors and required coursework reading materials.

Data scientists sometimes have a reputation for not being able to explain their findings, what their findings mean, how their findings apply to businesses/orgs, or how they achieved them. Reading this book will not only help you, but also help you explain important elements of how ML works to your org/stakeholders/etc.

Whether you are a beginner or in the middle of learning Python or R to build ML models and want to really understand machine learning, get this book.

Leo Vaulin
13 reviews · 1 follower
December 17, 2022
Did StatSquatch eat the Table of Contents?

I love watching the StatQuest videos, and this book is a great way to remind yourself of the key concepts without rewatching. My only complaint: the ToC simply says "Page 1", "Page 2", … even though there are clearly marked chapters, as well as a generous appendix with even more Key Concepts.

So what’s up with that?

HINT: I paged through the book and stuck a bookmark at the start of every chapter. StatSquatch won’t win! TRIPLE BAM!!!
6 reviews
March 10, 2023
The author's clear visual style provides a comprehensive look at the basics of machine learning. This book is a good starting point for machine learning beginners, as it contains lots of concrete, easy-to-follow examples with corresponding explanation videos. It is an approachable, practical, and broad introduction to machine learning concepts, and the most beautifully illustrated machine learning book I have ever found on the market (prove me wrong).
2 reviews
December 29, 2024
Good for developing intuition around ML algos. Not enough by itself, as it's lacking in mathematical rigor, but that's really not the point of the book.

I used it to refresh my memory on classic ML algos that I learned in school and for that it served its purpose perfectly.
Olivier Chabot
47 reviews · 11 followers
June 25, 2022
Great intro to big ideas. Far from being a how-to guide for serious programmers. I've learned how to communicate educational content.
1 review
December 2, 2022
Good introductory book with nice pictures and clear explanations. It would be better if more algorithms were covered.
Sabah Shams
6 reviews · 1 follower
July 7, 2023
Thoroughly enjoyed the book; I just found the same examples of "Loves Troll" and "Loves Troll 2" to be repetitive and unrealistic.
Mo
69 reviews
February 29, 2024
Exceptionally visually depicted, and pretty concise. So glad my data science mentor recommended this as a starting point.
23 reviews
June 24, 2024
A beautiful way to learn more about technical concepts, mainly through well-orchestrated visuals.
7 reviews
August 16, 2024
Fun, short book that gives a great overview of common machine learning topics.
Ben Wooding
40 reviews
January 16, 2025
Easy reading, perfect for beginners or those looking for a refresher. Perhaps too simplistic for those who already understand some machine learning and want to deepen it.
14 reviews · 1 follower
March 1, 2025
I struggled to get started with ML through various courses and YouTube videos, but I think this book made me understand the basic algorithms and gave me a kickstart into this world. Excellent beginner book.
Joaquin Menendez
2 reviews
March 3, 2025
A piece of art blended with the most intuitive lessons. Recommended for any undergrad or ML newbie.
Xevi
3 reviews
April 6, 2025
Positively surprised, highly enjoyable read - would definitely recommend it to everyone interested in this field
11 reviews
September 15, 2022
Greatly presented basics for those who are new to the field and need to learn the basic concepts, build basic intuition, and not feel inferior and overwhelmed. The author published the book after many years of running his YouTube channel, where he regularly posts the same high-quality content.

It's short, to the point, funny and really well illustrated.

I wonder how many people in total this person has helped. :))) He helped me for sure.

Of course, it's not aimed at those who have been encountering these concepts since kindergarten and are thinking of possible next neural network constructions while waiting for the tram... ;D
1 review · 3 followers
June 1, 2024
One of the best books to understand statistics for machine learning
25 reviews
May 12, 2024
Great book for ML beginners; the author has a supporting YouTube channel where he clearly explains all the terms and mathematical notions.
Alessandro
16 reviews · 26 followers
May 27, 2023
Accessible and entertaining introduction to most common machine learning techniques. I really enjoyed the way the author guided me through the book by using images, repeating, and summarising the key aspects of each technique.
