Practical Statistics for Data Scientists: 50 Essential Concepts

Name: Practical Statistics for Data Scientists: 50 Essential Concepts
Rating: 4.03 (46 reviews)
ISBN: 9781491952962

Peter Bruce, Andrew Bruce

Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you'll learn:

Why exploratory data analysis is a key preliminary step in data science
How random sampling can reduce bias and yield a higher quality dataset, even with big data
How the principles of experimental design yield definitive answers to questions
How to use regression to estimate outcomes and detect anomalies
Key classification techniques for predicting which categories a record belongs to
Statistical machine learning methods that "learn" from data
Unsupervised learning methods for extracting meaning from unlabeled data

Genres: Computer Science, Mathematics, Nonfiction, Programming, Science, Reference, Technology

315 pages, Paperback

Published June 27, 2017

473 people are currently reading

2390 people want to read

About the author

Peter Bruce

46 books, 6 followers

Community Reviews

5 stars: 185 (34%)
4 stars: 207 (39%)
3 stars: 111 (20%)
2 stars: 18 (3%)
1 star: 8 (1%)

Alissa

9 reviews, 2 followers

August 24, 2017

It's surprisingly short (and that's not a bad thing), objective and didactic. I would recommend this to anyone working with Data Science at any level.

work

Abhi

163 reviews

April 16, 2022

I bought this book along with 14 other books in an online "bundle" sale, and this is the one I picked to read first.

This is a good statistics and ML handbook. You won't find full-length proofs or detailed intuitions/explanations. And that is hardly to be expected; the official documentation for many of the algorithms presented here would each cover half a book this size.

Why I think this is a useful book is that it structures and packs everything in this vast domain of knowledge into bite-sized pieces for you to consume when you need a memory-refresher/quick-reference. The way it achieves this is by covering key terms, basic ideas of concepts, important implementation considerations, along with R code snippets (I was told that there is a second edition out now, with Python code).

It would've been even better if the book had a "Part 2" dealing with neural networks and deep learning.

P.S. If you're already a data science "expert", I don't think there's a lot of new things this book could teach you.

Foxtrot

46 reviews

February 4, 2019

"A quick introduction to Data Science illustrated in R" may have been a better title for this book.

I was misled by the title "Practical Statistics for Data Scientists". I do not think this book is suited for data scientists, who are very likely to be already familiar with all the notions covered in the book. This book is more like a short introduction to data science. If you already have some applied mathematics / statistics background, skip this book. Otherwise, you will find yourself skimming through it.

I will not return to this book for further reference, since the topics are neither especially well explained nor tackled in depth. The term "practical" in the title may suggest it can be used as a reference manual, but to me it does not provide enough content to fulfill this mission.

technical-readings

Ossian Hempel

58 reviews

January 28, 2024

A good introduction to common statistical methods used in Data Science/ML. Found the small 'nugget'-sections in between the main content more useful/original than the main content itself.

Terran M

78 reviews, 103 followers

April 27, 2018

This book is well written and packs a substantial amount of information into a small number of pages. It is best used to get a survey and overview of many of the facets of the domain of data science. This book will not teach you anything in enough depth to actually execute it well — it will teach you just enough to be dangerous and not realize when you've gone off the rails. I recommend it for managers who may never go into technical depth, for people considering whether or not they are interested in data science, or as a preview book to create a framework from which to hang more detailed understanding. Although this is an introductory book, it assumes you can already program in R. If you can't, either accept that you won't be able to follow the specifics of the examples, or read The Art of R Programming and/or R for Data Science.

I dislike that the authors make a number of categorical statements of the form "Data Scientists do this" or "Data Scientists don't need that". I disagree with many of these assertions and I think they have taken a definition of "data science" which is narrower than the prevailing consensus in the industry.

This book has some errors (see, for example, the confusion matrix on page 196) but overall the accuracy is acceptable relative to recent norms.

Eddie Chen

16 reviews, 4 followers

January 1, 2021

Very good reference book (especially for those data scientists who picked up most stats concepts on the job and didn't study stats formally in college/post grad).

Bogdan

28 reviews, 3 followers

December 21, 2022

This book is not fit for someone who is just starting in data science.

It describes a large number of statistics concepts, which are nicely split into ML categories.

I found most of the book to be overwhelmingly technical, with the examples being quite vague, as I needed to use other resources to even understand some of the principles.

I might use it again in the future.

Dave Voyles

55 reviews, 11 followers

September 3, 2019

I've been going through this book as a refresher for statistics and ML terms for the last couple of days, and found it to be great for that, or for someone looking to begin with statistics and data modeling.

Wei Cui

18 reviews

November 28, 2019

This book explains concepts in statistics clearly with great examples. Personally, I was able to grasp a few algorithms fully for the first time (for example, multi-arm bandit, permutation tests, chi-square test). It is a short book, but it contains a comprehensive overview of key algorithms useful for data scientists, including fairly advanced ones. Another plus is that it has examples in R code so that you can also get hands-on experience by trying out the example scripts.

My recommendation is to read 3 concepts (overall there are 50 essential concepts in the book) each time, reflect on how you can apply them to your area, and once you finish all concepts, keep this book as a reference book.

Yuriy Zubarev

24 reviews

December 1, 2017

This is quite an indispensable reference in today's torrent of books and articles on data science.

The authors chose foundational concepts, introduced the vocabulary for each concept, briefly explained the math and context, connected data science concepts to classical statistics concepts, and provided a list of resources for further follow-up.

I will be coming back to this book more often than doing Google searches on the topic of data science.

Ray

45 reviews, 5 followers

March 17, 2018

This was a somewhat useful high-level overview, but to me, the brevity and breadth did not provide enough value that I would recommend this over one or more texts containing more detail. The most useful section for me was that on the bootstrap and bagging.

Ilia

33 reviews, 1 follower

March 6, 2019

A nice brief overview or recap of statistics and machine learning

This is an excellent book for either repeating key concepts of statistics and machine learning or for closing some gaps one might have. There are plenty of references to other sources to dig further.

Robin Ver

16 reviews, 1 follower

October 10, 2020

Clear explanations in separate pieces. Does however return very often to: this math stuff is not necessary for data science.

in-kast studie

Sivathanu

5 reviews

September 3, 2018

Simple practical guide.

Mike Martos

134 reviews

September 13, 2018

Really liked the book: easy to read, with good examples. Perhaps the only complaint is that it has only R examples, and you don't get that from the title; would've liked to see some Python too.

data-science non-fiction

Andrés

54 reviews, 16 followers

September 28, 2018

Excellent book until Chapter 5. From that chapter on, explanations are too hasty for my taste, very dense and less didactic overall, especially compared to the first part of the book.

Kelly

20 reviews, 2 followers

January 23, 2019

Great, simple explanations of concepts. Good reference for students.

Mlv Prasad

19 reviews

February 4, 2020

An overall statistical science overview

Ralph Quirequire

18 reviews, 1 follower

December 28, 2019

I read this book on and off as a refresher to what I learned from grad school and my self-exploration of data science. Although most of the topics are a bit familiar to me already, the book still brings some fresh perspectives and insights--especially on helping gain a solid (step-by-step) grasp of common algorithms and models in the data science toolkit. Also, while the book exclusively focuses on code examples/applications in R, it does such a great job of explaining the underlying concepts (and mentions handy resources for more in-depth study) that doing the computations/modeling in Python won't be a problem.

I'm now in a whole new field with my Python development career. Still, I think this book will serve as a good reference (along with others) whenever I find myself working on a data science project.

Paula

155 reviews, 5 followers

February 17, 2022

I have a degree in statistics so wanted to see if there were any concepts that I was not familiar with that were relevant for data science. It covers the basics of statistics so for me it wasn't interesting as I already was familiar with the concepts. I would think this is more suited to people with minimal exposure to statistics. Topics include:
• Sampling methods, Selection Bias.
• Significance Testing such as t-Tests, F-Statistic, Chi-Square Test, Fisher's Exact Test.
• Classification algorithms such as Naive Bayes, Logistic Regression, and Discriminant Analysis.
• Regression and Prediction, Confounding Variables, Outliers, and Correlation.
• Unsupervised Learning such as K-Means Clustering, Hierarchical Clustering.
• Statistical Machine Learning such as K-Nearest Neighbor, Tree models, Bagging, and Boosting.

data-science statistics

Anirudh Jain

132 reviews, 2 followers

August 19, 2020

Though the book says it is just a reference and not a complete source for statistics, the amount of information given, along with the practical real-life data science scenarios, makes it more than enough for beginners. The language is very lucid; though the examples are in R and I am learning Python, the graphical representation makes up for it. The machine learning part, which is in the second half of the book, is also very useful for building intuition for algorithms.

All in all a great comprehensive book for statistics and machine learning.

Alex

1 review

January 17, 2024

A really good book for explaining a collection of basic stats ideas behind data science and machine learning methods. I especially like their easy-to-understand explanations of statistical test scores. In general though, once you get past the statistics-focused chapters, their coverage of ML is kinda weak and superficial and not very in-depth / insightful compared to other classic books that only focus on ML. It’s okay since this is intended to be a stats and data science book. Overall a good book, giving 4 stars considering their ML coverage.

Michael

85 reviews, 17 followers

July 20, 2018

This book is an excellent introduction to basic statistical methods used for data science. As some other reviewers have mentioned, I found the R code to be of comparatively little interest and the unsupervised learning chapter was a little lean for my tastes. All in all, a lot of good material here, though.

John

25 reviews, 11 followers

August 18, 2018

Pretty good overview of stats in the context of data science. As someone whose study of stats predates much of modern data science, seeing these concepts again through this lens is helpful, as some concepts are now more/less useful than they were back in the day and knowing which is which is good.

Saganaut

3 reviews

January 14, 2020

Very succinct view of stats, often simply ignores the underlying math. While avoiding the pitfalls of overly-complex notation, it sometimes loses the critical intuition about the actual techniques and models. It does provide good coverage of a variety of techniques. I would recommend this for an absolute beginner in data science as it is quite easy to read.

Bárbara

65 reviews, 1 follower

January 2, 2025

Understood a lot of the basics of data science. Keep in mind this is just an introduction; it will not go deep into the subjects. I would say this is the basic knowledge a person aspiring to be a data scientist should have. Also, the book guides you to code in R, but I used ChatGPT to adapt the code and investigate further. It added a lot to my knowledge; I recommend it.

Muhammed Buyukkinaci

74 reviews

December 15, 2019

Taught me some concepts of statistics. I made a Python implementation of some topics in the book that I found useful.
https://github.com/MuhammedBuyukkinac...

Hadiana Sliwa

67 reviews, 8 followers

September 11, 2023

This book was my first data-related book and it opened my eyes to the world of data. It's very clear and easy to read. I googled some subjects from the book, but overall it was easy to follow and it helped me a lot.

Shane Simon

12 reviews, 1 follower

May 18, 2024

I genuinely appreciated the practical anecdotes about not permitting statistical significance to get in the way of achieving your data science goals. I’m not the biggest fan of R, but this book would be the perfect transition point for anyone switching from R to Python.