Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.
Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.
With this book, you'll learn:
• Why exploratory data analysis is a key preliminary step in data science
• How random sampling can reduce bias and yield a higher quality dataset, even with big data
• How the principles of experimental design yield definitive answers to questions
• How to use regression to estimate outcomes and detect anomalies
• Key classification techniques for predicting which categories a record belongs to
• Statistical machine learning methods that "learn" from data
• Unsupervised learning methods for extracting meaning from unlabeled data
I bought this book along with 14 other books in an online "bundle" sale, and this is the one I picked to read first.
This is a good statistics and ML handbook. You won't find full-length proofs or detailed intuitions/explanations. And that is hardly to be expected; the official documentation for many of the algorithms presented here would each cover half a book this size.
Why I think this is a useful book is that it structures and packs everything in this vast domain of knowledge into bite-sized pieces for you to consume when you need a memory-refresher/quick-reference. The way it achieves this is by covering key terms, basic ideas of concepts, important implementation considerations, along with R code snippets (I was told that there is a second edition out now, with Python code).
It would've been even better if the book had a "Part 2" dealing with neural networks and deep learning.
P.S. If you're already a data science "expert", I don't think there's a lot of new things this book could teach you.
"A quick introduction to Data Science illustrated in R" may have been a better title for this book.
I was misled by the title "Practical Statistics for Data Scientists". I do not think this book is suited for data scientists who are very likely to be already very familiar with all the notions covered in the book. This book is more like a short introduction to data science. If you already have some applied mathematics / statistics background skip this book. Otherwise, you will find yourself skimming through the book.
I will not go back to this book for further reference, since the topics are neither especially well explained nor tackled in depth. The term "practical" in the title may suggest it can be used as a reference manual, but to me it does not provide enough content to fulfill this mission.
A good introduction to common statistical methods used in Data Science/ML. Found the small 'nugget'-sections in between the main content more useful/original than the main content itself.
This book is well written and packs a substantial amount of information into a small number of pages. It is best used to get a survey and overview of many of the facets of the domain of data science. This book will not teach you anything in enough depth to actually execute it well — it will teach you just enough to be dangerous and not realize when you've gone off the rails. I recommend it for managers who may never go into technical depth, for people considering whether or not they are interested in data science, or as a preview book to create a framework from which to hang more detailed understanding. Although this is an introductory book, it assumes you can already program in R. If you can't, either accept that you won't be able to follow the specifics of the examples, or read The Art of R Programming and/or R for Data Science.
I dislike that the authors make a number of categorical statements of the form "Data Scientists do this" or "Data Scientists don't need that". I disagree with many of these assertions and I think they have taken a definition of "data science" which is narrower than the prevailing consensus in the industry.
This book has some errors (see, for example, the confusion matrix on page 196) but overall the accuracy is acceptable relative to recent norms.
Very good reference book (especially for those data scientists who picked up most stats concepts on the job and didn't study stats formally in college/post grad).
This book is not fit for someone who is just starting in data science.
It describes a large number of statistics concepts, which are nicely split into ML categories.
I found most of the book to be overwhelmingly technical, with the examples being quite vague, as I needed to use other resources to even understand some of the principles.
I've been going through this book as a refresher for statistics and ML terms for the last couple of days, and found it to be great for that, or for someone looking to begin with statistics and data modeling.
This book explains concepts in statistics clearly with great examples. Personally, I was able to grasp a few algorithms fully for the first time (for example, multi-arm bandit, permutation tests, chi-square test). It is a short book, but it contains a comprehensive overview of key algorithms useful for data scientists, including fairly advanced ones. Another plus is that it has examples in R code so that you can also get hands-on experience by trying out the example scripts.
My recommendation is to read 3 concepts (overall there are 50 essential concepts in the book) each time, reflect on how you can apply them to your area, and once you finish all concepts, keep this book as a reference book.
This is quite an indispensable reference in today's torrent of books and articles on data science.
The authors chose foundational concepts, introduced the vocabulary for each concept, briefly explained the math and context, connected data science concept to classical statistics concepts, and provided a list of resources for further follow-up.
I will be coming back to this book more often than doing Google searches on the topic of data science.
This was a somewhat useful high-level overview, but to me, the brevity and breadth did not provide enough value that I would recommend this over one or more texts containing more detail. The most useful section for me was that on the bootstrap and bagging.
A nice brief overview or recap of statistics and machine learning.
This is an excellent book for either repeating key concepts of statistics and machine learning or for closing some gaps one might have. There are plenty of references to other sources to dig further.
Really liked the book: easy to read, with good examples. Perhaps the only complaint is that it has only R examples, and you don't get that from the title. Would've liked to see some Python too.
Excellent book until Chapter 5. From that chapter on, explanations are too hasty for my taste, very dense and less didactic overall, especially compared to the first part of the book.
I read this book on and off as a refresher on what I learned in grad school and through my self-exploration of data science. Although most of the topics are already somewhat familiar to me, the book still brings some fresh perspectives and insights, especially in helping to gain a solid (step-by-step) grasp of common algorithms and models in the data science toolkit. Also, while the book exclusively focuses on code examples/applications in R, it does such a great job of explaining the underlying concepts (and mentions handy resources for more in-depth study) that doing the computations/modeling in Python won't be a problem.
I'm now in a whole new field with my Python development career. Still, I think this book will serve as a good reference (along with others) whenever I find myself working on a data science project.
I have a degree in statistics so wanted to see if there were any concepts that I was not familiar with that were relevant for data science. It covers the basics of statistics so for me it wasn't interesting as I already was familiar with the concepts. I would think this is more suited to people with minimal exposure to statistics. Topics include:
• Sampling methods, Selection Bias.
• Significance Testing such as t-Tests, F-Statistic, Chi-Square Test, Fisher's Exact Test.
• Classification algorithms such as Naive Bayes, Logistic Regression, and Discriminant Analysis.
• Regression and Prediction, Confounding Variables, Outliers, and Correlation.
• Unsupervised Learning such as K-Means Clustering, Hierarchical Clustering.
• Statistical Machine Learning such as K-Nearest Neighbor, Tree models, Bagging, and Boosting.
Though the book says it is just a reference and not a complete source for statistics, the amount of information given, along with the practical real-life data science scenarios, makes it more than enough for beginners. The language is very lucid; though the examples are in R and I am learning Python, the graphical representation makes up for it. The machine learning part, which is in the second half of the book, is also very useful for building intuition for the algorithms.
All in all a great comprehensive book for statistics and machine learning.
A really good book for explaining a collection of basic stats ideas behind data science and machine learning methods. I especially like their easy-to-understand explanations of statistical test scores. In general though, once you get past the statistics-focused chapters, their coverage of ML is kinda weak and superficial, and not very in-depth or insightful compared to other classic books that focus only on ML. It’s okay since this is intended to be a stats and data science book. Overall a good book, giving 4 stars considering their ML coverage.
This book is an excellent introduction to basic statistical methods used for data science. As some other reviewers have mentioned, I found the R code to be of comparably little interest and the unsupervised learning chapter was a little lean for my tastes. All in all, a lot of good material here, though.
Pretty good overview of stats in the context of data science. As someone whose study of stats predates much of modern data science, seeing these concepts again through this lens is helpful: some concepts are now more or less useful than they were back in the day, and knowing which is which is good.
Very succinct view of stats, often simply ignores the underlying math. While avoiding the pitfalls of overly-complex notation, it sometimes loses the critical intuition about the actual techniques and models. It does provide good coverage of a variety of techniques. I would recommend this for an absolute beginner in data science as it is quite easy to read.
Understood a lot of the basics of data science. Keep in mind this is just an introduction; it will not go deep into the subjects. I would say this is the basic knowledge a person aspiring to be a data scientist should have. Also, the book guides you to code in R, but I used ChatGPT to adapt the code and investigate further. It added a lot to my knowledge, and I recommend it.
This book was my first data-related book and it opened my eyes to the world of data. It's very clear and easy to read. I googled some subjects from the book, but overall it was easy to follow and it helped me a lot.
I genuinely appreciated the practical anecdotes about not permitting statistical significance to get in the way of achieving your data science goals. I’m not the biggest fan of R, but this book would be the perfect transition point for anyone switching from R to Python.