262 pages, Paperback
First published January 1, 2012
When I saw this book I was hooked. As I work in Business Intelligence on a product that weeds out bad data, I figured I was slap bang in the target market. I was right. While a lot of emphasis is put on the data scientist role, the book has clear tones of business intelligence, and some of the discussion specifically covers data warehouses, reporting and the like.
The book is really a collection of essays on a spectrum from non-technical through to technical. Earlier essays are much more practically oriented than later essays. As such, it suffers from the classic issues of an essay collection. Namely, there's no one voice in the book (which weakens its overall authority) and the whole is dragged down by its weakest elements. It's fair to say that O'Reilly have really pushed towards the R, Hadoop, big/unstructured data crowd, and that shows in the maturity of the commentary, which is sometimes a little breathless. Having said that, the book is refreshingly clear of the stodginess associated with much of the BI canon.
Some high points of the book for me were the essays "Blood, Sweat and Urine", "Bad Data Lurking in Plain Text" and "Detecting Liars and the Confused in Contradictory Online Reviews". "Blood, Sweat and Urine" in particular was both human and useful. "Bad Data Lurking in Plain Text" mirrored my pain in dealing with plain text and was a very concise overview of a troubling area. Finally, "Detecting Liars and the Confused in Contradictory Online Reviews" was an excellent experience report. An honourable mention should go to "Data Quality Analysis Demystified: Knowing When Your Data Is Good Enough", which was a good way to close the book.
However, the lows were pretty low and knocked at least one star off the review. "Social Media, Erasable Ink?" was a hyperbolic essay whose point seemed overly laboured. Spoiler: your public data stops being yours after you transmit it to the service to publicise it. Also, "Don't Let the Perfect Be the Enemy of the Good: Is Bad Data Really Bad" was vague and wishy-washy in the extreme.
Overall, check it out if you work with data; you'll spend a lot of time nodding your head at least. The practical chapters trump the governance chapters in my opinion, but that just reflects the relative maturity of the communities represented in the book. Sadly, though, this book won't be a classic of the field in its current form. I hope O'Reilly knock out a second edition without some of the poorer chapters, as it would be nice to see this book evolve.