Python Data Science Handbook: Essential Tools for Working with Data

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.

Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.

With this handbook, you’ll learn how to use:
* IPython and Jupyter: provide computational environments for data scientists using Python
* NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python
* Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
* Matplotlib: includes capabilities for a flexible range of data visualizations in Python
* Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms
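As a rough illustration of the first two data structures in the list above, here is a minimal sketch, assuming NumPy and pandas are installed. The values and the column names `group` and `value` are made up for this example and are not taken from the book.

```python
import numpy as np
import pandas as pd

# ndarray: a dense, typed array that supports elementwise arithmetic without loops.
values = np.array([1.0, 2.0, 3.0, 4.0])
scaled = values * 10

# DataFrame: labeled columns on top of array data (illustrative column names).
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": scaled})

# Labels make split-apply-combine operations a one-liner.
group_means = df.groupby("group")["value"].mean()
print(group_means)
```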

541 pages, Kindle Edition

First published March 25, 2016

471 people are currently reading
2115 people want to read

About the author

Jake VanderPlas

3 books, 20 followers
Jake VanderPlas is a well-known data scientist, researcher, and educator. He is recognized for his contributions to the fields of machine learning, data science, and astronomy. VanderPlas is particularly famous for his work in Python programming for data analysis and scientific computing.
He is the author of several popular resources, including:
"Python Data Science Handbook": A highly regarded book that provides a comprehensive guide to data science with Python, covering topics such as data manipulation, visualization, and machine learning.
He has also contributed to open-source projects related to data science and scientific computing, particularly within the Python ecosystem.
In addition to his work in data science, Jake VanderPlas is also an academic with a background in astronomy. He has worked at the University of Washington and has been involved in research related to both machine learning applications and astrophysics.

Ratings & Reviews

Community Reviews

5 stars: 309 (46%)
4 stars: 256 (38%)
3 stars: 80 (12%)
2 stars: 12 (1%)
1 star: 3 (<1%)
Displaying 1 - 30 of 63 reviews
Terran M.
78 reviews, 103 followers
October 20, 2018
This book is not as good as R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, but if you are constrained or committed to using Python, it is the best available alternative as of 2018. Chapters 1 through 3 on ipython, Numpy, and Pandas are very well written, although they do suffer from using mostly small, made-up examples. Chapter 4 on Matplotlib is disappointing, but that's because Matplotlib is itself a weak and obsolete tool; the book acknowledges that fact and cannot fix it. I do not care for Chapter 5, which attempts too much and delivers too little (for example, the “in depth” treatment of linear regression is all of 2 pages). I suggest that you stop at the end of Chapter 4 and instead move on to Introduction to Machine Learning with Python: A Guide for Data Scientists.

As an alternative to this book, also consider Python for Data Analysis by Wes McKinney, which includes more verbose coverage of Pandas, at the expense of removing the ML section that you probably don't want to read anyway. The two are about equally good and share the same strengths (good writing) and weaknesses (dry references with mostly made-up data in the example, and use of Matplotlib for graphics).

Update in late 2018: I now recommend Altair as the best native-Python graphics library, or plotnine, a clone of ggplot. Either way, you should skip most of Chapter 4 on matplotlib and learn one of these other libraries instead.
Jeremy
48 reviews
January 6, 2019
The book is written as a Jupyter notebook, and is available for free on GitHub:
https://github.com/jakevdp/PythonData...

Books written as Jupyter Notebooks are simply wonderful. They should become the default medium for learning new materials related to computer science and mathematics.

Regarding the book itself, it fits more in the "practical knowledge" category, which is totally fine since it's a handbook. Being exposed to the different methods and tools is great. There is however no real theoretical explanations behind the tools themselves or details about their implementation, but the reader can freely refer to extra materials if needed.
Mikkel Hansen
9 reviews, 10 followers
August 20, 2018
I read this book after having worked as a data scientist for about a year and a half. Most of my work had focused on machine learning, so I had picked up Numpy, Pandas, and Matplotlib along the way. This approach left some glaring holes in my usage of these modules. After reading this book I can see that there have been a couple of things I have been doing wrong -- or at least very ineffectively. So reading this book was definitely a good idea.

I especially appreciated the chapters on Numpy and Pandas (~180 pages), particularly the proper usage of indexing (e.g. timestamps as indices) and multi-indexing for hierarchical structure. Both chapters also contain advice on how to speed up the code when needed.
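To make the point about timestamps as indices concrete, here is a minimal sketch, assuming pandas is installed. The daily readings are invented for illustration and do not come from the book.

```python
import pandas as pd

# Hypothetical daily readings: the integers 0..89, one per day starting 2018-01-01.
dates = pd.date_range("2018-01-01", periods=90, freq="D")
readings = pd.Series(range(90), index=dates)

# A DatetimeIndex allows partial-string selection of a whole month ...
february = readings.loc["2018-02"]

# ... and resampling to month-start bins without any manual grouping.
monthly_mean = readings.resample("MS").mean()
print(len(february), monthly_mean.tolist())
```

With a plain integer index, both operations would need explicit date arithmetic, which is presumably the kind of inefficiency the review has in mind.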

Generally, I really liked this book and will definitely add it to our library at work so I can reference it and lend it to our students and interns.
Moeen Sahraei
30 reviews, 56 followers
November 10, 2020
It’s a succinct and well-written book on data science using Python. One of its greatest weaknesses is its examples: the author didn’t relate the subjects to the examples well, and they are too hard to understand. But in a nutshell, it’s a good book for learning the basics of NumPy, pandas, Matplotlib, and a little bit of machine learning.
Ye Lin Kyaw
16 reviews, 7 followers
May 7, 2018
It is broad and deep enough for both beginners and experienced users migrating from other platforms.
James Mason
558 reviews, 20 followers
October 3, 2017
Extremely well written. Just the right level of depth. It was useful to work through bit by bit to gain a general understanding and practice, and I'm sure it will also be useful as a desktop reference. I was inspired throughout to look at my data in new ways and apply new, modern methods to the data in order to obtain more robust results and hopefully uncover things about it that I simply would not have otherwise. Most of that happened in the machine learning (final) chapter. I appreciated the attention to aesthetics in visualizations in earlier chapters, especially the one on matplotlib. And I also really appreciated the first chapter on IPython and the various ways you can write your code, though I wish it had a little more breadth in terms of the available options and justifications for why you might use, e.g., Jupyter notebooks as opposed to Atom/Ipython console. I also wish that there were more astronomy examples since that is the author's and my area of study. Despite those minor qualms, 5 stars!

Note that the goodreads subtitle is incorrect. It should be: Essential Tools for Working with Data.
Gabri
243 reviews, 4 followers
March 23, 2019
Mandatory read; did not finish (stopped at around 50%).

So I'm in my final year of Information Studies and I feel like it wasn't until I read this book that I truly understood computer programming. It covers very useful packages for Data Science (Numpy, Pandas, Matplotlib), and not only explains what the code does, but also provides many code examples that help you to understand it and use it on your own.

I would highly recommend this book to anyone who has some basic knowledge of Python but wants/needs to be able to understand and execute the process of Data Science.
27 reviews
September 25, 2022
Definitely a reference for beginners. This book is very (very) PRACTICAL.

The first chapters are about IPython, Pandas, Numpy and Matplotlib. The author explains clearly and quickly all you need to do proper data science work. The machine learning part is not mathematical at all, so it is in the same spirit as the rest of the book.

This book is not special in its level of Python programming (it's very basic) or in the depth of each topic. It is a comprehensive presentation of data science tools.

I will recommend this book to my students; it's pure gold.
Oleg Shevelyov
Author of 1 book, 12 followers
February 13, 2020
Very good book. Covers many important tools (IPython, Numpy, Pandas, Scikit-Learn) for applied Data Science in Python and breaks them down into logical chunks.
mhy
11 reviews
October 13, 2019
A good book as an introduction to using the Python programming language in data science. It explains the theory and its programming implementation, using packages or functions already developed by the developer community.
Giulio Ciacchini
370 reviews, 12 followers
July 27, 2022
Unfortunately, given the insane pace at which technology progresses, some parts of the book are a little outdated. For instance, the IPython notebook has since been replaced by Jupyter notebooks.

Aside from that, this textbook is still very much relevant; it goes from the basics of computer programming to the three main packages for handling data in Python (Pandas, Numpy and Matplotlib).

However, the machine learning section is a bit lacking and focuses on less important algorithms.

Lastly, since this is supposed to be a Python handbook, I was surprised that the fundamentals of OOP (Object-Oriented Programming) were never covered, despite the essential and extensive use of classes and functions in the text.
Matt Heavner
1,101 reviews, 14 followers
May 10, 2017
The python data science handbook is the best python tutorial I have read. It is "an overview of python if you want to be a data scientist" - the breadth and depth on specific tools (matplotlib & beyond, pandas, and sci-kit, as well as ipython & jupyter notebooks) is perfect for a data science application. This is definitely addressing the "computer skills" third of the data science Venn diagram (not much on mathematics or subject matter expertise). Recommended for learning python or having as a reference.
Hays Hutton
9 reviews, 1 follower
October 23, 2015
Liked how it goes in depth into NumPy and then Pandas. Sometimes a "little" too API based but that makes it practical in some respects.
Isen
260 reviews, 4 followers
July 10, 2023
The Python data science handbook walks the reader through the Python datastack (IPython, NumPy, Pandas, Matplotlib, Scikit). A chapter looks at some aspect such as array broadcasting or hierarchical indices, presents a few examples in code, and often ends with a more involved case study like looking at the effect of weather on Seattle cyclists. It's not a bad book, but unfortunately it just does not actually appear to be a book for anyone.

If you're interested in the inner workings of the Python datastack and want to up your coding game, this book is not for you. One of the trickier aspects of NumPy to get your head around is that sometimes an operation gives you a view of an array, through which you can change the original array, and sometimes a copy of an array, leaving the original unchanged. Knowing which you're going to get is vital to avoid introducing exotic bugs into your program, and you would hope that a book about NumPy would give you some clarification. Instead the most you get is the extremely useful assertion that the reshape method uses a view "where possible", and the topic does not arise again after that.
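The view-versus-copy distinction described above can be checked directly. The following is a minimal sketch, assuming NumPy is installed. The array and variable names are illustrative and not taken from the book.

```python
import numpy as np

a = np.arange(6)

# Basic slicing returns a view: writing through it changes `a`.
view = a[1:4]
view[0] = 100

# Fancy (integer-array) indexing returns a copy: `a` is unaffected by this write.
copy = a[[1, 2, 3]]
copy[0] = -1

# reshape returns a view when the memory layout allows it ...
reshaped = a.reshape(2, 3)
# ... but flattening a transposed (non-contiguous) array forces a copy.
reshaped_t = a.reshape(2, 3).T.reshape(6)

# np.shares_memory reports whether two arrays overlap in memory.
shares = {
    "slice": np.shares_memory(a, view),
    "fancy": np.shares_memory(a, copy),
    "reshape": np.shares_memory(a, reshaped),
    "transpose_reshape": np.shares_memory(a, reshaped_t),
}
print(a, shares)
```

When it is unclear which one an operation returns, `np.shares_memory` is one way to find out before relying on either behavior.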

If you're interested in improving as a data scientist, it's not much better. You might be able to pick up a few tricks by looking at the author's way of doing things, but since the techniques are presented ad hoc to solve a particular problem and there are no exercises, if you don't happen to be working on a very similar problem you're probably just going to say "neat" and forget it.

If you're looking for a collection of tutorials? This is it, I guess. But you could also type "How do I ___ in numpy" into Google for the same result.

So, yeah. You could read it, I guess. Or you could not. Whatever.
Sebastian
191 reviews, 9 followers
July 9, 2019
A rigorous overview of data science tools in Python, combined with an introduction to several machine learning techniques using the scikit-learn library.

As someone that has approached learning data science and programming on a project-by-project basis, it was wonderfully enlightening to see the author dive deep into the syntax, and reasoning behind libraries such as NumPy, Pandas, and Matplotlib. The chapter on machine learning is surprisingly hefty considering how much has come prior to it.

I read this book for free on the author's GitHub; however, I will be going back and purchasing it, as it truly is a handbook. I have already gone back and referred to work in this book on several projects, and I know that I'll be using it in the future to flick through to refresh my ideas, or think about how I would structure my own code.
Samuel
49 reviews, 6 followers
August 5, 2021
I'm using this book to dive into subtle topics in Pandas, such as hierarchical indexing, and it works wonders for that! The explanations are stellar, often starting with an example of how one would have handled e.g. 2-dimensional data without a MultiIndex, in order to really understand why it is needed, which really helps in understanding the rationale for the different parts. It then goes on to explain even advanced topics step by step in a very clear and to-the-point way, which works really well.
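The "without a MultiIndex first" approach described in this review can be sketched as follows, assuming pandas is installed. The region names and figures are placeholders made up for this example, not data from the book.

```python
import pandas as pd

# Hypothetical 2-dimensional data keyed by (region, year); the values are invented.
data = {
    ("north", 2000): 10.0,
    ("north", 2010): 12.0,
    ("south", 2000): 20.0,
    ("south", 2010): 25.0,
}

# Without a MultiIndex: tuple keys work, but selecting one year needs a manual loop.
pop_2010_manual = {region: v for (region, year), v in data.items() if year == 2010}

# With a MultiIndex: the same selection is a single labeled lookup.
index = pd.MultiIndex.from_tuples(list(data.keys()), names=["region", "year"])
pop = pd.Series(list(data.values()), index=index)
pop_2010 = pop.xs(2010, level="year")

# The hierarchical Series also pivots into a conventional 2-D table.
wide = pop.unstack("year")
print(wide)
```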
Nickolai
889 reviews, 8 followers
October 22, 2022
In August 2022 I read the second edition, which will only come out in December, through an O'Reilly subscription. In terms of content the book is practically unchanged; only the chapter on machine learning has disappeared (perhaps it will be added later). In addition, the matplotlib chapter is impossible to read in the electronic version on O'Reilly, because the figures do not match the material. This shortcoming will undoubtedly be fixed when the book goes to print. Otherwise, I fully stand by the review I wrote of the first edition.
12 reviews
June 28, 2024
My strongest recommendation is for the FIRST edition - best starting point for anyone who wants to get started with Python, specifically, for data science. Especially so if you have pursued statistics, econometrics, data science in other programming languages. Will save you a lot of time as it will filter out the inessentials for machine learning.

I do have a pet peeve: the 2024 second edition of this favorite book of mine is a replica of the first. There is no new content, so the latest deep-learning Python libraries such as TensorFlow or PyTorch are not covered. This makes the book outdated even as it leaves the press. I have no idea why the second edition was published.

I would stick to buying the first.
Ray
45 reviews, 5 followers
July 22, 2018
This is really an amazing technical resource. Vanderplas manages to keep his content extraordinarily practical and grounded, without being irreverent to the theory like so many lower-quality modern data science texts are. As a contributor to the Python data software libraries such as Scikit-learn, the author is eminently qualified to give a tour of their inner workings. Finally, the book is self-aware of where it lacks depth, and does an excellent job in referring readers to further resources.
Hadiana Sliwa
67 reviews, 8 followers
November 13, 2020
As a starter who is new to Python, I found the first four chapters of the book very easy to follow, and I learned a great deal from them. Chapter 5 (introduction to machine learning), however, was somewhat hard for me to follow, because the concept of machine learning was new to me and the chapter contains a lot of code that the author assumes you already know and so leaves unexplained. Someone with a bit of Python knowledge would follow it very easily.
Lukas Rubikas
23 reviews, 5 followers
May 1, 2019
I'll just say this:
If I was put into this horrible scenario where I was held at a gunpoint next to a gigantic red button and was told that I must press it and nuke *every* single book publisher in the world bar one and I absolutely must choose which one, I would save O'Reilly. And I would use *this* book as an example to justify why.
Ravi
151 reviews
September 5, 2019
Great resource with excellent examples and useful, well-written Python code. A lot of techniques are introduced here, with the unfortunate exception of neural networks/deep learning, which is beyond the scope of this book. The book is written using Jupyter notebooks and printed in black & white, so for some of the plots you'll have to refer to the online versions to better see what's going on.
George
91 reviews, 3 followers
September 2, 2019
Really good. Starts from a blank slate and goes to a good level on all the topics that it touches. The online version is more up to date and the complementary notebooks can be used to run all the examples yourself.
Alvaro Fuentes
78 reviews, 3 followers
June 29, 2020
Excellent book for anyone interested in understanding the fundamentals of scientific computing for data science in Python. I can't recommend this book enough; if you are interested in data science, read it from beginning to end.
153 reviews, 1 follower
July 4, 2021
Basic information about using Python to clean, manipulate, and analyze data frames.

I liked it. My only complaint is the black-and-white printing of the figures; some are in low resolution and I cannot read them well.
Edward
22 reviews
August 11, 2021
A relatively easy read, very suitable for beginners who already know some Python programming, and also useful for looking things up at any time, though some of its ideas are already out of date due to updates to Python.

PS: All the code in this book is open source (huge thumbs up); you can find all the related materials on the author's GitHub.
Swapnil
45 reviews, 4 followers
May 10, 2023
This book is useful for someone who is jumping into a quantitative role and needs to understand the basics of data analysis using Python. I will refer to it when I am facing a problem and it will lead me to a solution. It provides background and intuition, which I prefer to having only a solution.