
Practical Statistics for Data Scientists: 50 Essential Concepts

Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you'll learn:


Why exploratory data analysis is a key preliminary step in data science
How random sampling can reduce bias and yield a higher quality dataset, even with big data
How the principles of experimental design yield definitive answers to questions
How to use regression to estimate outcomes and detect anomalies
Key classification techniques for predicting which categories a record belongs to
Statistical machine learning methods that "learn" from data
Unsupervised learning methods for extracting meaning from unlabeled data

315 pages, Paperback

Published June 27, 2017

473 people are currently reading
2390 people want to read

About the author

Peter Bruce

46 books, 6 followers

Ratings & Reviews


Community Reviews

5 stars: 185 (34%)
4 stars: 207 (39%)
3 stars: 111 (20%)
2 stars: 18 (3%)
1 star: 8 (1%)
Displaying 1 - 30 of 46 reviews
Profile Image for Alissa.
9 reviews, 2 followers
August 24, 2017
It's surprisingly short (and that's not a bad thing), objective and didactic. I would recommend this to anyone working with data science at any level.
Profile Image for Abhi.
163 reviews
April 16, 2022
I bought this book along with 14 other books in an online "bundle" sale, and this is the one I picked to read first.

This is a good statistics and ML handbook. You won't find full-length proofs or detailed intuitions/explanations. And that is hardly to be expected; the official documentation for many of the algorithms presented here would each cover half a book this size.

Why I think this is a useful book is that it structures and packs everything in this vast domain of knowledge into bite-sized pieces for you to consume when you need a memory-refresher/quick-reference. The way it achieves this is by covering key terms, basic ideas of concepts, important implementation considerations, along with R code snippets (I was told that there is a second edition out now, with Python code).

It would've been even better if the book had a "Part 2" dealing with neural networks and deep learning.

P.S. If you're already a data science "expert", I don't think there's a lot of new things this book could teach you.
Profile Image for Foxtrot.
46 reviews
February 4, 2019
"A Quick Introduction to Data Science Illustrated in R" might have been a better title for this book.

I was misled by the title "Practical Statistics for Data Scientists". I do not think this book is suited for data scientists who are very likely to be already very familiar with all the notions covered in the book. This book is more like a short introduction to data science. If you already have some applied mathematics / statistics background skip this book. Otherwise, you will find yourself skimming through the book.

I will not return to this book for further reference, since the topics are neither especially well explained nor tackled in depth. The term "practical" in the title may suggest it can be used as a reference manual, but to me it does not provide enough content to fulfill this mission.
Profile Image for Ossian Hempel.
58 reviews
January 28, 2024
A good introduction to common statistical methods used in Data Science/ML. Found the small 'nugget'-sections in between the main content more useful/original than the main content itself.
Profile Image for Terran M.
78 reviews, 103 followers
April 27, 2018
This book is well written and packs a substantial amount of information into a small number of pages. It is best used to get a survey and overview of many of the facets of the domain of data science. This book will not teach you anything in enough depth to actually execute it well — it will teach you just enough to be dangerous and not realize when you've gone off the rails. I recommend it for managers who may never go into technical depth, for people considering whether or not they are interested in data science, or as a preview book to create a framework from which to hang more detailed understanding. Although this is an introductory book, it assumes you can already program in R. If you can't, either accept that you won't be able to follow the specifics of the examples, or read The Art of R Programming and/or R for Data Science.

I dislike that the authors make a number of categorical statements of the form "Data Scientists do this" or "Data Scientists don't need that". I disagree with many of these assertions and I think they have taken a definition of "data science" which is narrower than the prevailing consensus in the industry.

This book has some errors (see, for example, the confusion matrix on page 196) but overall the accuracy is acceptable relative to recent norms.
Profile Image for Eddie Chen.
16 reviews, 4 followers
January 1, 2021
Very good reference book (especially for those data scientists who picked up most stats concepts on the job and didn't study stats formally in college or grad school).
28 reviews, 3 followers
December 21, 2022
This book is not fit for someone who is just starting in data science.

It describes a large number of statistics concepts, which are nicely split into ML categories.

I found most of the book to be overwhelmingly technical, with the examples being quite vague, as I needed to use other resources to even understand some of the principles.

I might use it again in the future.
Profile Image for Dave Voyles.
55 reviews, 11 followers
September 3, 2019
I've been going through this book as a refresher for statistics and ML terms for the last couple of days, and found it to be great for that, or for someone looking to begin with statistics and data modeling.
Profile Image for Wei Cui.
18 reviews
November 28, 2019
This book explains concepts in statistics clearly with great examples. Personally, I was able to grasp a few algorithms fully for the first time (for example, multi-arm bandits, permutation tests, and the chi-square test). It is a short book, but it contains a comprehensive overview of key algorithms useful for data scientists, including fairly advanced ones. Another plus is that it has examples in R code so that you can also get hands-on experience by trying out the example scripts.

My recommendation is to read 3 concepts (overall there are 50 essential concepts in the book) each time, reflect on how you can apply them to your area, and once you finish all concepts, keep this book as a reference book.
Profile Image for Yuriy Zubarev.
24 reviews
December 1, 2017
This is quite an indispensable reference in today's torrent of books and articles on data science.

The authors chose foundational concepts, introduced the vocabulary for each concept, briefly explained the math and context, connected data science concepts to classical statistics concepts, and provided a list of resources for further follow-up.

I will be coming back to this book more often than doing Google searches on the topic of data science.
Profile Image for Ray.
45 reviews, 5 followers
March 17, 2018
This was a somewhat useful high-level overview, but to me, the brevity and breadth did not provide enough value that I would recommend this over one or more texts containing more detail. The most useful section for me was that on the bootstrap and bagging.
Profile Image for Ilia.
33 reviews, 1 follower
March 6, 2019
A nice brief overview or recap of statistics and machine learning

This is an excellent book for either repeating key concepts of statistics and machine learning or for closing some gaps one might have. There are plenty of references to other sources to dig further.
16 reviews, 1 follower
October 10, 2020
Clear explanations in separate pieces. Does however return very often to: this math stuff is not necessary for data science.
Profile Image for Mike Martos.
134 reviews
September 13, 2018
Really liked the book: easy to read, with good examples. Perhaps the only complaint is that it has only R examples, which you don't get from the title; I would've liked to see some Python too.
Profile Image for Andrés.
54 reviews, 16 followers
September 28, 2018
Excellent book until Chapter 5. From that chapter on, explanations are too hasty for my taste, very dense and less didactic overall, especially compared to the first part of the book.
20 reviews, 2 followers
January 23, 2019
Great, simple explanations of concepts. Good reference for students.
Profile Image for Mlv Prasad.
19 reviews
February 4, 2020
An overall statistical science overview
Profile Image for Ralph Quirequire.
18 reviews, 1 follower
December 28, 2019
I read this book on and off as a refresher to what I learned from grad school and my self-exploration of data science. Although most of the topics are a bit familiar to me already, the book still brings some fresh perspectives and insights--especially on helping gain a solid (step-by-step) grasp of common algorithms and models in the data science toolkit. Also, while the book exclusively focuses on code examples/applications in R, it does such a great job of explaining the underlying concepts (and mentions handy resources for more in-depth study) that doing the computations/modeling in Python won't be a problem.

I'm now in a whole different field with my Python development career. Still, I think this book will serve as a good reference (along with others) whenever I find myself working on a data science project.
Profile Image for Paula.
155 reviews, 5 followers
February 17, 2022
I have a degree in statistics so wanted to see if there were any concepts that I was not familiar with that were relevant for data science. It covers the basics of statistics so for me it wasn't interesting as I already was familiar with the concepts. I would think this is more suited to people with minimal exposure to statistics. Topics include:
• Sampling methods, Selection Bias.
• Significance Testing such as t-Tests, F-Statistic, Chi-Square Test, Fisher's Exact Test.
• Classification algorithms such as Naive Bayes, Logistic Regression, and Discriminant Analysis.
• Regression and Prediction, Confounding Variables, Outliers, and Correlation.
• Unsupervised Learning such as K-Means Clustering, Hierarchical Clustering.
• Statistical Machine Learning such as K-Nearest Neighbor, Tree models, Bagging, and Boosting.
Profile Image for Anirudh Jain.
132 reviews, 2 followers
August 19, 2020
Though the book says it is just a reference and not a complete source for statistics, the amount of information given, along with the practical real-life data science scenarios, makes it more than enough for beginners. The language is very lucid; though the examples are in R and I am learning Python, the graphical representation makes up for it. The machine learning part, which is in the second half of the book, is also very useful for building intuition for the algorithms.

All in all a great comprehensive book for statistics and machine learning.
Profile Image for Alex.
1 review
January 17, 2024
A really good book for explaining a collection of basic stats ideas behind data science and machine learning methods. I especially like their easy-to-understand explanations of statistical test scores. In general though, once you get past the statistics-focused chapters, their coverage of ML is kinda weak and superficial, and not very in-depth or insightful compared to other classic books that focus only on ML. It’s okay since this is intended to be a stats and data science book. Overall a good book, giving 4 stars considering their ML coverage.
85 reviews, 17 followers
July 20, 2018
This book is an excellent introduction to basic statistical methods used for data science. As some other reviewers have mentioned, I found the R code to be of comparatively little interest and the unsupervised learning chapter was a little lean for my tastes. All in all, a lot of good material here, though.
Profile Image for John.
25 reviews, 11 followers
August 18, 2018
Pretty good overview of stats in the context of data science. As someone whose study of stats predates much of modern data science, seeing these concepts again through this lens is helpful, as some concepts are now more or less useful than they were back in the day, and knowing which is which is good.
3 reviews
January 14, 2020
Very succinct view of stats, often simply ignores the underlying math. While avoiding the pitfalls of overly-complex notation, it sometimes loses the critical intuition about the actual techniques and models. It does provide good coverage of a variety of techniques. I would recommend this for an absolute beginner in data science as it is quite easy to read.
Profile Image for Bárbara.
65 reviews, 1 follower
January 2, 2025
Understood a lot of the basics of data science. Keep in mind this is just an introduction; it will not go deep into the subjects. I would say this is the basic knowledge a person aspiring to be a data scientist should have. Also, the book guides you to code in R, but I used ChatGPT to adapt the code and investigate further. It added a lot to my knowledge; I recommend it.
Profile Image for Hadiana Sliwa.
67 reviews, 8 followers
September 11, 2023
This book was my first data-related book and it opened my eyes to the world of data. It's very clear and easy to read. I googled some subjects from the book, but overall it was easy to follow and it helped me a lot.
Profile Image for Shane Simon.
12 reviews, 1 follower
May 18, 2024
I genuinely appreciated the practical anecdotes about not letting statistical significance get in the way of achieving your data science goals. I’m not the biggest fan of R, but this book would be the perfect transition point for anyone switching from R to Python.
