Programming Hive: Data Warehouse and Query Language for Hadoop

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect—HiveQL—to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
Customize data formats and storage options, from files to external databases
Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
Gain best practices for creating user-defined functions (UDFs)
Learn Hive patterns you should use and anti-patterns you should avoid
Integrate Hive with other data processing programs
Use storage handlers for NoSQL databases and other datastores
Learn the pros and cons of running Hive on Amazon’s Elastic MapReduce

586 pages, Kindle Edition

First published January 1, 2012

Community Reviews

5 stars: 16 (17%)
4 stars: 41 (44%)
3 stars: 28 (30%)
2 stars: 6 (6%)
1 star: 2 (2%)
December 1, 2021
This is a rushed book with a bunch of useless information and copy-paste from the Hive wiki.
Cripplingly outdated.
I'm still trying to finish it though.
Rick
January 11, 2013
This could have been a much better book had it not been for the apparent haste with which O'Reilly rushed it out the door before (really) doing a final edit. The book is riddled with typographical errors, my favorite being the "dangling" second paragraph of Chapter 17, "Storage Handlers and NoSQL", which ends with: "For example, a Hive query could be run that selects a data table that is backed by sequence files, however it could output" (no kidding).

The overall content is worthwhile, but be forewarned: it's not as well edited as other books from O'Reilly. Three stars, solely for content.
August 23, 2016
Really good book to get into Hive and dive deeper. The installation is somewhat outdated, but mind you, this book is a few years old. And I'm on a Mac, which I think is still not officially supported. Trying to build something with Hive is filled with uncertainty, as I am never 100% sure if it fails because I'm not on Linux or because my queries are wrong.
But still, great book to get into Hive. Can't wait for the second edition coming out early 2017.
Karl
October 16, 2014
Maybe 2.5 stars? Not as clear as other O'Reilly texts, and with a ton of mistakes, both in text and code snippets. Clearly a rush job. Still, it'll get you going in terms of being a *user* of Hive. If you want to be an administrator, I'd look to other sources - and make sure you have a solid Java background.