Think Like a Data Scientist: Tackle the data science process step-by-step

Think Like a Data Scientist presents a step-by-step approach to data science, combining analytic, programming, and business perspectives into easy-to-digest techniques and thought processes for solving real world data-centric problems.


About the Technology

Data collected from customers, scientific measurements, IoT sensors, and so on is valuable only if you understand it. Data scientists revel in the interesting and rewarding challenge of observing, exploring, analyzing, and interpreting this data. Getting started with data science means more than mastering analytic tools and techniques, however; the real magic happens when you begin to think like a data scientist. This book will get you there.


About the Book

Think Like a Data Scientist teaches you a step-by-step approach to solving real-world data-centric problems. By breaking down carefully crafted examples, you'll learn to combine analytic, programming, and business perspectives into a repeatable process for extracting real knowledge from data. As you read, you'll discover (or remember) valuable statistical techniques and explore powerful data science software. More importantly, you'll put this knowledge together using a structured process for data science. When you've finished, you'll have a strong foundation for a lifetime of data science learning and practice.


What's Inside

The data science process, step-by-step
How to anticipate problems
Dealing with uncertainty
Best practices in software and scientific thinking


About the Reader

Readers need beginner programming skills and knowledge of basic statistics.


About the Author

Brian Godsey has worked in software, academia, finance, and defense and has launched several data-centric start-ups.

328 pages, Paperback

Published March 31, 2017

36 people are currently reading
335 people want to read

About the author

Brian Godsey

2 books, 28 followers

Ratings & Reviews

Community Reviews

5 stars: 21 (22%)
4 stars: 40 (42%)
3 stars: 27 (28%)
2 stars: 6 (6%)
1 star: 1 (1%)
Displaying 1 - 12 of 12 reviews
Jorgon
399 reviews, 5 followers
April 19, 2017
Pros: its generality.
Cons: its generality.

More seriously, this is a good introduction to the general *practices* of data science, one that does not try to teach you either statistics or programming but spends much of its time in the vaguely defined but important areas of project design, tool choice, goal specification, etc. A good complementary text to something like R for Data Science (or Data Science with R, which shows where MY preferences are).
Klaus-Michael Lux
55 reviews, 7 followers
December 5, 2020
Being still quite young, the 21st century hasn’t yet had the time to churn through a great number of hot new professions that thousands flock into. Data Science currently stands in the focus of attention, with institutions all over the world now offering various forms of educational programs that promise an avenue into the “sexiest job of the 21st century”. Being a recent graduate of one of these programs, I know that the field can sometimes seem fuzzy and ill-defined, with no clear core set of skills and ways of conceptualizing problems. This is presumably inherent in many young domains that live somehow at the intersection of existing fields, but to many students, it might still cause a yearning for more structure and order to be imposed. Fear not, because Brian Godsey’s book offers us a conceptual framework for the “lifecycle of data science projects” that I believe can be useful to many young practitioners, nicely condensing insights that would otherwise take years to accumulate in actual practice. Godsey distinguishes three phases:

The first phase begins by setting goals and inspecting what data might be available to solve them. There are some genuinely useful sections that detail for example how to best find out what a customer truly wants and how to uncover hidden assumptions one might have about the data and put them to the test. With many resources on the internet paying little attention to such background issues and usually cutting right to the application of a statistical algorithm to a poorly inspected data set, it’s reassuring to find the preparation phase given proper treatment in this book. After all, as Godsey correctly states, assumptions that go unchecked can come back and haunt you later.

In the second phase, which generally will be the longest, we actually build the data-centric product. Godsey uses this term in place of the term “data-driven” and his rationale actually makes a lot of sense: It’s the data scientist that should be in the driver’s seat, using his understanding of the goals of the project and solving them by using data. He’s surely convinced me and I’ll try to avoid the term, which I always felt was on the verge of vacuous, having been abused by marketing gurus. The building phase starts with the development of a plan, something that the author has relatively little concrete guidance on. He emphasizes the uncertainty inherent in every project and how being a data scientist is also about dealing with this uncertainty by developing backup plans and possibly even starting from scratch when assumptions about the data fall apart. Again, this contrasts nicely with the idealized world of the average Kaggle competition: In the real world, some problems need creativity and critical thinking, more than just iterative tuning of hyperparameters. In the remainder of the section, the author gives some high-level background about statistics and statistical software. Here, the book is general to the point of becoming generic, and I found myself skimming, as much of the content wasn’t new to me at all. Somebody entirely new to Data Science could potentially profit more from this section.

The final third phase deals with wrapping up, describing how to deliver insights and custom code to the customer and how to leverage the insights gathered in future projects. Though I again felt many of the recommendations to be quite obvious, it’s important not to omit this area from the process.

Throughout the book, Godsey uses a number of example projects, some fictitious and some from his own personal experience, to illustrate particular points. These span a range of questions and domains, from analyzing genetic data in bioinformatics to text analytics for a startup hoping to prevent the next Enron case by detecting anomalous behavior in staff emails. These examples are generally to the point and aid the understanding of the particular concepts. While some readers might argue that descriptions of the background of the respective projects might be overly long, I actually found these sections interesting, informing the reader on different types of work environments (e.g. startups versus university labs) and the differences in how analytical work is perceived and conducted there.

“Think like a Data Scientist” is a comprehensive, easy-to-read guide to the process of getting insights from data. While occasionally quite generic, the book nicely ties together different components into a coherent whole, making for good reference reading for people entering the field.
6 reviews
March 26, 2021
This book answers almost all the questions you would ask when starting a data science project, all the way from planning through execution to delivery and post-delivery. It's a compelling read; I recommend it to anyone starting out or currently working in a data science field.
Michal Paszkiewicz
Author of 2 books, 8 followers
July 24, 2017
A great explanation of Data Science concepts. The author shows maturity in not suggesting that modern machine learning techniques may be an answer for everything and provides good explanations for when and why to use statistical analysis, machine learning and various techniques and patterns. I don't feel I necessarily learnt a lot while reading this book, but it definitely reinforced a lot of the knowledge I've gained from previous books and it gave me a slightly different perspective on how to deal with potential data science problems.

I would highly recommend this book for people with no data science experience that are looking to hire data scientists - this book will give you a broad overview and a good idea of what you want to look for in an employee/consultancy without going into too much technical detail.
numbworks
22 reviews
March 16, 2019
It covers too many domains of knowledge and that's a con, but it contains some difficult concepts about statistics explained simply.
23 reviews
July 12, 2020
Practical, scenario-based, and generally a light read. Enjoyed reading it end to end. It isn't really a reference, so I'm not sure if I'll read it again.
Tim Verstraete
314 reviews, 4 followers
September 9, 2017
It was well written and exactly what I was looking for: knowing what a data scientist does ... although I have been doing R&D / engineering work for a long time, including project management, and thus already know most of the planning material, it was interesting to read it from a data scientist's point of view.
Lisa Dick
43 reviews, 1 follower
March 10, 2022
It's ok. If you have any experience with data science and/or project management it will likely be review. This book would be helpful for someone who is brand new to or curious about the field.
613 reviews, 6 followers
Want to read
May 2, 2023
Uncertainty is an adversary of coldly logical algorithms, and being aware of how those algorithms might break down in unusual circumstances expedites the process of fixing problems when they occur, and they will occur. A data scientist's main responsibility is to try to imagine all of the possibilities, address the ones that matter, and reevaluate them all as successes and failures happen. That is why, no matter how much code I write, awareness and familiarity with uncertainty are the most valuable things I can offer as a data scientist.