Agile Data Science: Building Data Analytics Applications with Hadoop

Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.

175 pages, Paperback

First published December 22, 2012

12 people are currently reading
234 people want to read

About the author

Russell Jurney

6 books · 3 followers

Ratings & Reviews

Community Reviews

5 stars: 12 (16%)
4 stars: 16 (21%)
3 stars: 31 (42%)
2 stars: 12 (16%)
1 star: 2 (2%)
Displaying 1 - 10 of 10 reviews
Jonas
17 reviews · 2 followers
June 28, 2016
Agile Data Science sets out to explain how to apply agile methodology in the field of data science. I would have liked more information on team formation and work processes, which the book covers pretty briefly. Instead the author focuses more on the tools (some of which are pretty dated at the time of reading) and one illustrative example application. Nevertheless, I find the book worth skimming through.
Lolo
191 reviews · 1 follower
July 10, 2017
A good book for data science, but why put the "Agile" in the title if you're not gonna focus on this aspect?!

The majority of the book is about data science tools. Tedious step-by-step guides. So if you want to learn about data science tools it's a good book. If on the other hand you want to learn how to apply Agile methodologies to Data Science projects (like the title of the book implies) this is not the book for you.

The book was ok, but there are much better books (or video tutorials) about Data Science than this book.
André Gomes
Author · 5 books · 114 followers
April 2, 2014
Very nice introduction to data science with practical examples and exercises.

It makes me think about how much unexplored knowledge is hidden in all the data our applications generate every day.

The author uses many interesting tools such as Apache Pig, Apache Avro, MongoDB, Elasticsearch, Wonderdog, and Flask.

Let's go deeper into it...
54 reviews · 1 follower
February 25, 2019
In my first read I only went through the first three chapters, which contain the general principles of the book and a VERY interesting methodological framework for thinking about data science in general. Those principles (iterate, deliver intermediate products, etc.) have been extremely useful for my day-to-day work (currently at Banco de Bogotá). In particular, the one about "scaling the pyramid of data value" is the absolute best. The rest of the book is much more specific, showing how to design an application from start to finish using the whole modern Hadoop software stack. At some point I might come back to it, but for now the guiding principles are what I wanted (the specifics might not be as relevant, depending on the software one ends up using. Currently it seems that at work it is going to be Hortonworks as the Hadoop distribution, so I will probably focus on a book where the details are aimed there).
Liamarcia Bifano
8 reviews · 2 followers
December 3, 2018
It is good to get a general idea of how the technologies are used, but the book sticks with just one type of infrastructure and doesn't compare it, with pros and cons, against the others available.
Louis
226 reviews · 30 followers
May 6, 2014
One of the problems with data science is that any description of what is encountered takes on the appearance of a mythical unicorn: no one person could possibly have all of the skills required. And it gets worse when you add to the standard set of statistics, domain knowledge, and programming the ability to deploy the application into a high-speed environment. This book is not going to make a data scientist an expert in running a data center, but it is useful for giving someone who has the rest of the skills an understanding of the environment their work will be deployed into.

One of the conflicts between data scientists/analysts and information technology groups is that while the data scientist gives the data owned by the organization its value, IT is charged with storing the data and providing access to it. And in the high-velocity, high-volume environment of big data, not understanding how the architecture works can lead to the data scientist creating valid solutions that cannot be applied in the actual day-to-day working environment. That is where this book comes in. The book has associated virtual machines in a software repository, so that a data scientist who does not know anything about the infrastructure and software stack that the data and the analysis ride on can see how everything fits together.

The book title is misleading. This is not a book about data analytics. This is a book for data analysts so they know how their analytical application is deployed and applied to day-to-day use in enterprise environments. For that reason it is useful.

Disclaimer: I received a free electronic copy of this book as part of the O'Reilly Press Blogger program.
38 reviews · 3 followers
January 14, 2016
Typical techie book: not a lot of detail, lots of downloading instructions, and a general framework for how to approach things. Not bad for 170 pages, but not really clear, and it seems to be tailor-made more for programmers.

If you want to tackle this book and get the most out of it, you should read up on web development, understand basic MVC-style web architecture, know what Hadoop and MongoDB are at a high level, know how JSON works, and have done some Python programming.
Upom
229 reviews
March 20, 2014
Interesting book on how analytics applications can be developed quickly. A bit haphazardly written, but a lot of decent ideas for a budding data scientist to play around with.
230 reviews · 3 followers
March 1, 2016
It is a very short book. However, the overall idea of doing data science with an agile approach was clear. I found most of the chapters very brief.