Hadoop: The Definitive Guide

Name: Hadoop: The Definitive Guide
Rating: 3.93 (58 reviews)
ISBN: 9780596521974

Tom White

Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
Discover common pitfalls and advanced features for writing real-world MapReduce programs
Design,

Genres: Programming, Technology, Technical, Computer Science, Nonfiction, Reference, Software

528 pages, Paperback

First published May 1, 2009

369 people are currently reading

1521 people want to read

About the author

Tom White

5 books, 20 followers

Community Reviews

5 stars

277 (27%)

4 stars

463 (45%)

3 stars

215 (21%)

2 stars

40 (3%)

1 star

16 (1%)

Displaying 1 - 30 of 58 reviews

Todd N

357 reviews, 256 followers

April 24, 2012

This is the single best reference guide to Hadoop and related projects, and it's the only O'Reilly book I have read cover to cover.

Here is the way I recommend reading it: Read through the first two chapters, including the tutorial walkthrough with the weather examples, then jump ahead and read the introduction for each of the related projects: Pig (chapter 11), Hive (12), HBase (13), ZooKeeper (14), Sqoop (15). Then read the case studies in the last chapter. Then go back and read about Hadoop in detail. I read chapter 6 (How MapReduce Works) several times and found it very helpful in optimizing MR jobs.

Very highly recommended. Be sure to get the latest edition, which is the 2nd. I think a 3rd edition is coming out around summer.

You are practically guaranteed a few million dollars from a VC if you can write "big data" in the snow with your pee, so you might as well start learning about this stuff now.

big-data kindle

Ahmed Attyah

23 reviews, 5 followers

Want to read

May 28, 2012

I got really interested in Hadoop, which is why I started reading this book :). There are only 3 books about Hadoop, and from the reviews I read, this one looks like the best.

programming

Michael Koltsov

110 reviews, 69 followers

February 20, 2025

This book didn't age very well; most of the technologies described in it are already obsolete.

I needed to quickly learn Hadoop fundamentals: HDFS and HBase. As a classically trained engineer, I went for the book from a reputable publisher with the most positive reviews.

Although the chapters on Hadoop architecture, HBase, and Avro were very helpful in getting me started with all the technologies I needed, the rest of the book is unbelievably outdated.

It has multiple pages of Java code listings and guides to installing lots of applications by hand (depending on which OS you're using).

No one does that anymore, for two reasons:

Docker helped me to install Hadoop on my Windows laptop that's running Debian
Code suitable for the latest Hadoop build is available on Github

Even though I truly believe that I was able to learn substantially more from the book than I needed, I found most topics covered pretty well in a dated Udemy course. However, I spent hours on Udemy but days finishing the book.

My score 1/5

Veselin Nikolov

731 reviews, 87 followers

August 16, 2010

Explains some NoSQL concepts as well as the ideology behind Hadoop, in places going into details beyond my interests.

If I compare it with "Hadoop Pro", which was of no use to me at all, this one should get 5 stars. Even so, it has some gaps: for example, there is no information on Hive, and the coverage of HBase is limited.

For the purposes of my thesis and a first introduction to the technology, the book is more than enough, and there is no alternative yet.

Alex Ott

Author of 3 books, 207 followers

July 4, 2010

A very good book that gives a high-level overview of Hadoop, together with descriptions of Hadoop-related projects: Pig, HBase, and others.
I recommend this book to all developers who want to learn about Hadoop, its usage, and programming for it.

ir-dm-nlp-ml-search programming

Ieva

15 reviews, 4 followers

September 1, 2017

Good introductory book. Some newer information is missing.

big-data

LIUF

30 reviews, 2 followers

November 3, 2019

This book was required reading in the first two weeks after I began my new position as a big data engineer.

I finished only chapters 1-6, as I use PySpark most of the time.

programming

John

221 reviews, 12 followers

December 12, 2018

Tom White is an excellent technical writer, paying close attention to accuracy, clarity, and completeness. This tome has all the features you might expect in a "Definitive Guide". No topic related to Hadoop is considered so inconsequential as to skim over, or fall outside the scope. Probably the best way to get a deep and broad understanding of Hadoop is to read this book. You will come away with a strong understanding of the methods, philosophy, and design of all things Hadoop.

Yes, you may never install Hadoop from scratch, and you will probably never write a MapReduce job by hand. But working through these chapters will give you an understanding of Hadoop that surpasses that of the typical Big Data engineer. It's worth the read.

The only downside to this book is that it's a little dated, having been published in 2015. (I'm reading the fourth and latest edition.) Because of this, some of the "Related Projects" chapters are of little practical value, e.g., Pig and Crunch. It would do well to replace these chapters with write-ups of more modern projects such as Impala and Drill.

Personal note / full disclosure: I skipped and don't intend to read the chapters on Pig and Crunch, or the three case studies.

software

Michael

85 reviews, 17 followers

March 25, 2018

Honestly, this book should be the Hadoop manual. If you've ever downloaded stock Hadoop and glanced through the included manual, you'll have found it to be minimal. This book walks you through setting up a development environment for Hadoop, explains the basic concepts behind it and its implementation, then overviews setting up a Hadoop cluster (leaving the details to other books on Hadoop operations), overviews the Hadoop ecosystem and concludes with a few case studies.

If you are interested in Hadoop and not yet familiar with it, this book is a great place to start.

Maxim

33 reviews, 1 follower

June 4, 2020

Extremely thorough and comprehensive, but slightly outdated and non-reproducible.

It will drastically improve your understanding of Hadoop and Distributed Computing, but you should have some [Unix] development and operations skills to reproduce the installation steps and examples.

IMO, you shouldn't read it if you aren't planning to get your hands dirty and set up at least a pseudo-distributed environment, because the book is "too big and noisy" to hold your focus on key points and practical results -- you'd end up with pure theory spilling from your ears.

data-processing

Yong Lai

88 reviews

February 4, 2018

This is quite an amazing book with comprehensive content on the Hadoop ecosystem. The rich code examples that come with the book really helped me understand how MapReduce works. It also covers all the other major subsystems like Hive, HBase, Spark, etc., so you can get a feel for how they work, although you might need separate books to delve deep into these subjects. The case studies at the end of the book are also a joy to read.

Christoph Kappel

463 reviews, 9 followers

September 18, 2023

This is a rather lengthy but nevertheless interesting book that explains in detail how Hadoop works internally and also gives brief introductions to other systems built on top of it.

Although I enjoyed reading about, e.g., what Sqoop actually does, I wonder whether each of the other systems really needs so many lines written here.

2023 big-data english

Rufeng

3 reviews, 1 follower

January 28, 2018

I wish it had been written more concisely. My favourite chapters are How MapReduce Works, HBase, ZooKeeper and Case Studies (Facebook's Hive).

Senthil Kumra

3 reviews

August 5, 2018

Great book to get started with the Hadoop ecosystem. Covers most of its parts.

Zhi Han

74 reviews, 13 followers

May 26, 2022

This book offers good coverage of the Apache Hadoop ecosystem. It explores Hadoop in detail and also touches on a few higher-level software components such as Spark and Hive. A good intro to Hadoop.

programming tech

Ha Truong

61 reviews, 54 followers

January 17, 2015

The book opens the door to the Hadoop world and guides you to major places such as HDFS, MapReduce, Hive, Pig, ZooKeeper, HBase, and Sqoop. It not only gives a first impression of what Hadoop is, it also gives deeper knowledge about each component and related technologies. Thus, if you just want one book to rule them all, pick this one.

However, because the author's ambition is to put everything into one book, you might feel overwhelmed by the many under-the-hood details. It would be better to just read the introduction to each technology (what it is, how it works) rather than unraveling everything in this introductory book. Of course, you might find it fascinating in some way. Say, I was really interested in MapReduce, HDFS, and Pig and really enjoyed those parts, but got bored with the many details on Hive and Sqoop. I'd rather pick other books written exclusively for those technologies.

The ZooKeeper chapter was also a disappointment. I eagerly read this part first for my distributed computing assignment, yet it didn't help much. Nor does the book mention streaming technologies like Storm; maybe the next edition will include them.

However, the book is very well written, with many deep insights into the platform. It is definitely worth your time to read.

I give it 5 stars.

technology

Saul Cruz

10 reviews, 2 followers

June 21, 2016

Definitely a good way to start. I'd recommend the latest version, as many blocks are no longer in use. However, if you really want to understand the underlying engine, this is the book to start with. MapReduce is a complex model that you'll probably never tweak, but it is very important to understand completely how it works so that you can optimize a cluster and, if you want, perhaps come up with a new data processing technology (i.e., there are some tools that work on top of MapReduce, like Spark and Pig). The installation guide is not very straightforward, but you can definitely set up your own cluster; there are many distributions out there (CDH, Hortonworks, HDInsight), so you don't have to start from scratch. It would be nice to include some coverage of Azure, AWS, and other cloud services.

Ketan Nayak

10 reviews, 2 followers

March 10, 2017

A good introductory book, not only for anyone who wants to understand the principles of Hadoop and related systems, but also for someone who wants to go one level deeper.

The best part about the book is that it not only explains the concepts well, but also gives practical examples and code snippets to follow through. Although I personally didn't replicate all the examples, it was valuable to see how the code functioned.

If I were to dock points, it would only be because certain systems aren't really explained in understandable terms (Google is your best friend then). In some chapters, I could see the author reduced effort as chapter content was simply replicated from the open source description or documentation pages.

2017 tech

Sam

521 reviews

April 2, 2015

This is a great overview of the various tools/technologies that make up the Hadoop ecosystem. Each chapter that covers a different tool/technology gives a good overview of it. Each area is quickly gaining a slew of books of its own, but I still find this a good place to start. With a fourth edition coming soon (available in pre-release online), it's nice to see that they're trying to keep this up to date as the technology changes.

data safari tech

ᴀᴍɪᴛ

38 reviews, 5 followers

January 27, 2016

This is the best Hadoop book. It gives a brief introduction to all the related tools, e.g. Hive/Pig/HBase/ZooKeeper/Sqoop.
1. The first 10 chapters are devoted to Hadoop.
2. Writing Map/Reduce programs using the given online reference is enough; this book is mainly good for understanding the internals of these operations.
3. It is best to start by consulting the Apache Hadoop developer reference alongside a standalone Hadoop setup.
4. The book is helpful for getting deeper into the Hadoop logic.

computers

Courtney

236 reviews

October 25, 2009

The layout is confusing and non-intuitive. The writing often omits important points. And there is much space given over to specific technologies and not to general Hadoop understanding and programming.

computers non-fiction programming

Alex Ott

Author of 3 books, 207 followers

August 26, 2017

Good book on the basics of Hadoop (HDFS, MapReduce & other related technologies). This book provides all the details necessary to start working with Hadoop, programming with it, administering it, etc.

I actually read the 1st edition as well, but I found many new & useful additions in the new edition.

ir-dm-nlp-ml-search own-ebook programming

Michael Economy

197 reviews, 286 followers

August 20, 2012

Pretty good summary. Hadoop and its ecosystem are incredibly complex. I'd be terrified to deploy it without reading this book first. I guess I'm still pretty terrified, but markedly less so.

Some of the writing was a bit wonky, but overall really good.

work-related

Joe Mctee

17 reviews, 1 follower

October 7, 2013

I relied heavily on this text to prepare my Hadoop talk for the Boulder Java User's Group. The examples and explanation for the myriad parts of Hadoop were clear and concise.

Highly recommend the book if you want to get up to speed on Hadoop.

tech

Anatoly Kaverin

72 reviews, 9 followers

November 27, 2014

The best book for diving into the Hadoop world.
Of course, the Hadoop API evolves pretty fast, but with minor changes I was able to run most of the code samples.
Very handy; in particular, it provides guidance on using local/dev mode to start implementing M/R jobs immediately.

Dariusz

197 reviews

January 2, 2016

Excellent as an overview of Hadoop-related technologies, exceptionally poor as a source of code examples and use cases (because "200 ways to compute the maximum temperature" is not what I was expecting).

informatyka owned

Manzur

28 reviews, 1 follower

June 18, 2016

This book is really fantastic! It's a complete reference on the Hadoop ecosystem and should be the first point of contact for anyone playing with Hadoop. The content and writing style are really approachable; I wish other technical authors could write at the same level as Tom White does.

programming

Luis

54 reviews, 1 follower

March 9, 2020

Flawless from a technical point of view. I didn't give it more stars because I've found more inspirational material, but this one gets the job done on what it is meant to do: giving you the technical details of the Hadoop ecosystem. Not sure, however, if it will stand the test of time.

Michael David Cobb

255 reviews, 7 followers

Read

May 17, 2016

meh

bored

Paul Childs

183 reviews, 3 followers

September 6, 2011

I found this book more helpful and detailed than the Hadoop in Action book I had read earlier. It was better at explaining the setup and the purpose of the various Hadoop services and config files.

computers
