Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you: use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce; become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence; discover common pitfalls and advanced features for writing real-world MapReduce programs; design,
This is the single best reference guide to Hadoop and related projects, and it's the only O'Reilly book I have read cover to cover.
Here is how I recommend reading it: read through the first two chapters, including the tutorial walkthrough with the weather examples, then jump ahead and read the introduction to each of the related projects: Pig (chapter 11), Hive (12), HBase (13), ZooKeeper (14), and Sqoop (15). Then read the case studies in the last chapter. Then go back and read about Hadoop in detail. I read chapter 6 (How MapReduce Works) several times and found it very helpful in optimizing MR jobs.
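The weather example mentioned above computes the maximum recorded temperature for each year. Below is a minimal sketch of the map, shuffle, and reduce phases of such a job, simulated in memory in plain Python rather than written against the Hadoop Java API. The comma-separated "year,temperature" input format, the function names, and the sample values are illustrative assumptions. They are not the book's code or the real NCDC record format.

```python
from collections import defaultdict


def map_record(line):
    # Map phase: turn one input line into a (key, value) pair.
    # Assumed toy format "year,temperature" (NOT the NCDC fixed-width format).
    year, temp = line.strip().split(",")
    return year, int(temp)


def shuffle(pairs):
    # Shuffle phase: group all values emitted for the same key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_max(key, values):
    # Reduce phase: collapse each key's values to the maximum.
    return key, max(values)


def max_temperature(lines):
    mapped = (map_record(line) for line in lines)
    grouped = shuffle(mapped)
    # Reducers see keys in sorted order, so iterate over sorted keys here too.
    return dict(reduce_max(k, v) for k, v in sorted(grouped.items()))


records = ["1949,111", "1949,78", "1950,0", "1950,22", "1950,-11"]
print(max_temperature(records))  # prints {'1949': 111, '1950': 22}
```

Grouping all values by key before reducing mirrors what Hadoop's shuffle does between the map and reduce phases. On a real cluster, the mapped pairs would also be partitioned across several reducers and sorted by key.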
Very highly recommended. Be sure to get the latest edition, which is the 2nd. I think a 3rd edition is coming out around summer.
You are practically guaranteed a few million dollars from a VC if you can write "big data" in the snow with your pee, so you might as well start learning about this stuff now.
I got really interested in Hadoop, which is why I started reading this book :). There are only three books about Hadoop, and from the reviews I've read, this one looks like the best.
This book hasn't aged very well; most of the technologies described in it are already obsolete.
I needed to learn the Hadoop fundamentals quickly: HDFS and HBase. As a classically trained engineer, I went for the book from a reputable publisher with the most positive reviews.
Although the chapters on Hadoop architecture, HBase, and Avro were very helpful in getting me started with all the technologies I needed, the rest of the book is unbelievably outdated.
There are multiple pages of Java code listings and guides to installing lots of applications by hand (depending on which OS you're using).
No one does that anymore, for two reasons:
1. Docker helped me install Hadoop on my Windows laptop, running Debian.
2. Code suitable for the latest Hadoop build is available on GitHub.
Even though I truly believe I learned substantially more from the book than I needed, I found most topics covered pretty well in a dated Udemy course. And while I spent only hours on Udemy, it took me days to finish the book.
It explains some NoSQL concepts as well as the philosophy behind Hadoop, in places going into details beyond my interests.
Compared with "Hadoop Pro", which wasn't useful to me at all, this one deserves 5 stars. Still, there are some gaps: for example, there is no information on Hive, and the coverage of HBase is limited.
For the purposes of my thesis and a first introduction to the technology, the book is more than sufficient, and there is no alternative yet anyway.
A very good book that gives a high-level overview of Hadoop together with descriptions of other Hadoop-related projects such as Pig, HBase, and others. I'd recommend it to all developers who want to learn about Hadoop, its usage, and programming for it.
Tom White is an excellent technical writer, paying close attention to accuracy, clarity, and completeness. This tome has all the features you might expect in a "Definitive Guide". No topic related to Hadoop is considered so inconsequential as to be skimmed over or to fall outside the scope. Probably the best way to get a deep and broad understanding of Hadoop is to read this book. You will come away with a strong understanding of the methods, philosophy, and design of all things Hadoop.
Yes, you may never install Hadoop from scratch, and you will probably never write a MapReduce job by hand. But working through these chapters will give you an understanding of Hadoop that surpasses that of the typical Big Data engineer. It's worth the read.
The only downside to this book is that it's a little dated, having been published in 2015. (I'm reading the fourth and latest edition.) Because of this, some of the "Related Projects" chapters are of little practical value, e.g., Pig and Crunch. The book would do well to replace these chapters with write-ups of more modern projects such as Impala and Drill.
Personal note / full disclosure: I skipped and don't intend to read the chapters on Pig and Crunch, or the three case studies.
Honestly, this book should be the Hadoop manual. If you've ever downloaded stock Hadoop and glanced through the included manual, you'll have found it to be minimal. This book walks you through setting up a development environment for Hadoop and explains the basic concepts behind Hadoop and its implementation. It then gives an overview of setting up a Hadoop cluster (leaving the details to other books on Hadoop operations), surveys the Hadoop ecosystem, and concludes with a few case studies.
If you are interested in Hadoop and not yet familiar with it, this book is a great place to start.
Extremely thorough and comprehensive, but slightly outdated and non-reproducible.
It will drastically improve your understanding of Hadoop and Distributed Computing, but you should have some [Unix] development and operations skills to reproduce the installation steps and examples.
IMO, you shouldn't read it if you aren't planning to get your hands dirty and set up at least a pseudo-distributed environment, because the book is "too big and noisy" to keep your focus on key points and practical results. You'd end up with pure theory spilling from your ears.
This is quite an amazing book with comprehensive content on the Hadoop ecosystem. The rich code examples that come with the book really helped me understand how MapReduce works. It also covers the other major subsystems, like Hive, HBase, and Spark, so you can get a feel for how they work, although you might need separate books to delve deep into these subjects. The case studies at the end of the book are also a joy to read.
This is a rather lengthy but nevertheless interesting book that explains in detail how Hadoop works internally and also gives brief introductions to other systems built on top of it.
Although I enjoyed reading about, e.g., what Sqoop actually does, I wonder whether each of the other systems really needs so much coverage here.
This book offers good coverage of the Apache Hadoop ecosystem. It explores Hadoop in detail and also touches on a few higher-level software components such as Spark and Hive. A good intro to Hadoop.
The book opens the door to the Hadoop world and guides you to its major landmarks: HDFS, MapReduce, Hive, Pig, ZooKeeper, HBase, and Sqoop. It not only gives a first impression of what Hadoop is, but also gives deeper knowledge of each component and related technologies. So if you just want one book to rule them all, pick this one.
However, because the author's ambition is to put everything into one book, you might feel overwhelmed by the many under-the-hood details. An introductory book would do better to explain what each technology is and how it works, rather than unraveling everything. Of course, you might find some of it fascinating. For example, I was really interested in MapReduce, HDFS, and Pig and enjoyed those parts, but got bored by the many details on Hive and Sqoop. For those, I'd rather pick books written exclusively about that technology.
I was also disappointed by the ZooKeeper chapter. I eagerly read it first for a distributed-computing assignment, yet it didn't help much. Nor does the book mention streaming technologies like Storm; maybe the next edition will include them.
Still, the book is very well written and offers deep insight into the platform. It is definitely worth your time to read.
Definitely a good way to start. I'd recommend the latest version, as many blocks are no longer in use. If you really want to understand the underlying engine, this is the book to start with. MapReduce is a complex model that you'll probably never tweak, but it is very important to understand completely how it works so that you can optimize a cluster and, if you want, perhaps come up with a new data processing technology (e.g., there are tools that run on top of MapReduce, like Pig, or alongside it on Hadoop, like Spark). The installation guide is not very straightforward, but you can definitely set up your own cluster, and there are many distributions out there (CDH, Hortonworks, HDInsight), so you don't have to start from scratch. It'd be nice to include some coverage of Azure, AWS, and other cloud services.
A good introductory book not only for anyone who wants to understand the principles of Hadoop and related systems, but also for someone who wants to go one level deeper.
The best part about the book is that it not only explains the concepts well, but also gives practical examples and code snippets to follow. Although I personally didn't replicate all the examples, it was valuable to see how the code worked.
If I were to dock points, it would only be because certain systems aren't really explained in understandable terms (Google is your best friend then). In some chapters, I could see that the author put in less effort, as the content was simply copied from the project's open source description or documentation pages.
This is a great overview of the various tools/technologies that make up the Hadoop ecosystem. Each chapter gives a good overview of a different tool/technology. Each area is quickly gaining a slew of books of its own, but I still find this a good place to start. With a fourth edition coming soon (available in pre-release online), it's nice to see that they're trying to keep the book up to date as the technology changes.
This is the best Hadoop book, with brief introductions to all the related tools, e.g. Hive/Pig/HBase/ZooKeeper/Sqoop.
1. The first 10 chapters are devoted to Hadoop itself.
2. The online reference is enough for writing Map/Reduce programs; this book is mainly good for understanding the internals of these operations.
3. It's best to start with the Apache Hadoop developer reference along with a Hadoop standalone setup.
4. The book helps you dig deeper into Hadoop's logic.
The layout is confusing and non-intuitive. The writing often omits important points. And much space is given over to specific technologies rather than to general Hadoop understanding and programming.
A good book on the basics of Hadoop (HDFS, MapReduce, and other related technologies). It provides all the details needed to start working with Hadoop: programming with it, administering it, etc.
I actually read the 1st edition as well, but found many new and useful additions in the new edition.
Pretty good summary. Hadoop and its ecosystem are incredibly complex. I'd be terrified to deploy it without reading this book first. I guess I'm still pretty terrified, but markedly less so.
Some of the writing was a bit wonky, but overall really good.
I relied heavily on this text to prepare my Hadoop talk for the Boulder Java User's Group. The examples and explanation for the myriad parts of Hadoop were clear and concise.
Highly recommend the book if you want to get up to speed on Hadoop.
The best book for diving into the Hadoop world. Of course, the Hadoop API evolves pretty fast, but with minor changes I was able to run most of the code samples. Very handy; in particular, it explains how to use local/dev mode so you can start implementing M/R jobs right away.
Great as an overview of Hadoop-related technologies, but exceptionally poor as a source of code examples and use cases (because "200 ways to compute the maximum temperature" is not what I expected).
This book is really fantastic! It's a complete reference on the Hadoop ecosystem and should be the first point of contact for anyone playing with Hadoop. The content and writing style are really approachable; I wish other technical authors could write at the same level as Tom White does.
Flawless from a technical point of view. I didn't give more stars because I've found more inspirational material, but this one does what it is meant to do: give you the technical details of the Hadoop ecosystem. I'm not sure, however, whether it will stand the test of time.
I found this book more helpful and detailed than the Hadoop in Action book I had read earlier. It was better at explaining the setup and the purpose of the various Hadoop services and config files.