Data Pipelines Pocket Reference: Moving and Processing Data for Analytics

Name: Data Pipelines Pocket Reference: Moving and Processing Data for Analytics
Rating: 3.77 (31 reviews)
ISBN: 9781492087830

James Densmore

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions. You'll

Genres: Technology, Programming, Computer Science, Nonfiction, Reference, Technical, Software

274 pages, Paperback

Published March 16, 2021

157 people are currently reading

350 people want to read

About the author

James Densmore

2 books, 1 follower

Community Reviews

5 stars: 50 (21%)
4 stars: 100 (43%)
3 stars: 60 (26%)
2 stars: 17 (7%)
1 star: 3 (1%)

Sebastian Gebski

1,188 reviews, 1,341 followers

March 14, 2021

I don't think that 'Pocket Reference' is the proper way to describe this book.
An example? Sample path? One of the ways to do it? A representative case?
A bit of theory, some SQL, and a basic introduction to structuring a processing pipeline: that's what you can get out of this book.
It's probably OK if you want to figure out what actually powers modern data processing pipelines under the hood, but I wouldn't say it's useful if you want to set a solid foundation for more thorough research.

Mikhail

1 review

March 10, 2021

I read this book to get up to speed with modern software data engineering. I think I achieved the goal, although I finished with an awareness of how much I do not know rather than with confidence in building the solutions myself.
James seems to take an opinionated approach by using cloud warehouse databases (Redshift and Snowflake). The use cases and computations are well suited to them, and I would need to read other resources to see how the patterns mentioned play with other technologies. The price and ops complexity of the possible stacks is not mentioned.
The chapters with SQL examples look great. I learned a bunch there.
There are also enough mentions of various technologies and books throughout the book -- I learned about Kimball modeling, dbt, Airflow, Atlas...
It would be great to extend the reasoning about production and operations, pitfalls and risks, such as schema migration, scaling, schema registry, deployment, versioning, durability risks, retention, backups, recomputing... Validation, metrics collection, and Slack notifications are presented, and I would like to hear more about visualization.
Overall it is a good book, and I only wish every chapter were bigger. Oh wait, there is "Pocket" in the name. Never mind, then.

Liz

23 reviews, 2 followers

March 1, 2021

Some technologies are already a bit outdated, but the book serves as a good overview of the whole ETL/ELT processes.

Ossian Hempel

58 reviews

July 14, 2024

Great overview of the ETL process, with examples. Doesn’t touch streaming but outside of that I have no complaints. Gets to the point and doesn’t delve TOO deep into the details.

Sean

63 reviews, 18 followers

October 12, 2022

I skimmed this and found a good introduction to a number of issues and perspectives, though as a "pocket reference" it's quite brief. Beginners like me will find some helpful ideas and opinions; those with more background aren't likely to find much. It's good for what it is, but don't expect a lot more.

Scott Haines

20 reviews, 3 followers

March 17, 2021

This is a good book for data engineers looking to work with CDC, ETL, and EtLT data-moving systems. It also covers a little orchestration management with Airflow. A nice book for the desk of anyone working in the data industry.

Adél

1 review

December 11, 2024

Data Pipelines Pocket Reference by James Densmore is a handy guide for data engineers who already have some foundational knowledge and are looking for quick insights or inspiration. It’s not a beginner’s guide or a deep dive, but it does a good job of covering the essentials of designing and managing data pipelines in a concise format.

The examples included are practical, but they’re fairly basic, so if you’re looking for more in-depth explanations or complex use cases, this might not fully meet your needs. That said, it’s a great resource for brushing up on concepts or getting ideas for tackling specific pipeline-related challenges. Overall, a solid reference book, just as the title suggests.

Florent

19 reviews, 2 followers

June 30, 2021

The data engineering industry suffers from a lack of good books. This one is very practical and ELT-focused. It complements theoretical books like Kimball's and Designing Data-Intensive Applications well.
It is still far from perfect, though. Some parts are already outdated. The focus on ELT, without extensive discussion of its tradeoffs, is highly questionable.

Casper Weiss Bang

44 reviews, 2 followers

November 14, 2022

Good read. Not great. For me it was a bit too code-specific (I know how to write code, and I don't use Airflow, which the book does); luckily I could skip those parts. It was a good quick read, covering key terms and principles. It could probably have been shorter and maybe more theoretical, but it was worth it for the parts that were good. I can recommend it as an introduction to data pipelines. I'm not sure it's really a good reference piece, though; I'd probably just use a search engine plus the relevant documentation.

An Te

386 reviews, 26 followers

May 5, 2022

A very good primer on data processing and pipelines with code to consider the key elements involved in data pipelines for validation and iteration.

I agree with other reviewers that it is not a manual but more a survey of the field. It’s certainly helpful for a data scientist to know the concepts. Much more detail and depth is needed if you’re looking for a standalone data engineering book 📖

Lee

15 reviews, 2 followers

August 4, 2023

Not a good book as an introduction to Data Engineering but rather a reference to do specific Data Engineering tasks or workflows. Did learn some things but began skimming about halfway through.

Other complaints:
- Instructions aren't clear, and I had difficulty setting up the environment to even complete the exercises
- No conclusion highlighting the main takeaways from book

May try Fundamentals of Data Engineering next to see if that is any better and more suited to what I am looking for.

Jaivarsan B

28 reviews, 1 follower

March 30, 2023

Hands-on, practical content for beginners. I got some good basics and prototype-ready material out of it and into production.

At the same time, it limits itself very much to these basics; there could have been a chapter on distributed computing, or on using async patterns to cover more of the volume and variety of DE in real life.

Lucille Nguyen

417 reviews, 12 followers

August 28, 2025

It's a quick intro and reference on how to create and maintain data pipelines. A lot of the examples in here are pretty dated. It does a good job of introducing the basics. Look, it's an O'Reilly pocket reference; what else can you expect? It could probably have done more to speak generally so that it doesn't just seem like a few articles on dbt, Airflow, and Hadoop, but hey, it's good for what it is.

Mi Lia

39 reviews, 6 followers

November 27, 2021

Very well written small book. The only reason I gave it 4/5 and not 5/5 was that, for some weird reason, I hadn't realized the book would be so short, so it was a small disappointment. But the content of the book was certainly worthwhile, if only there were more...

George Touros

152 reviews, 10 followers

February 27, 2022

I found this book practical, concise and to-the-point. It's just a starting point really, but I'd like to know more from the people here saying that it's already outdated. From my point of view, the book was worth it.

Wojciech Pierwoła

26 reviews

October 17, 2022

Around 6+/10. A few interesting subsections, but even for a book of this size the approach is extremely narrow, and about 1/3 of it is code snippets you can find in any tutorial, documentation page, or Stack Overflow thread.

Siddharth Yadav

4 reviews

February 11, 2023

My favourite book for data engineers

Easy to read with great code examples. Really liked the less verbose and more practical approach. Highly recommended for anyone looking to pursue data engineering.

Suhas Krishna

1 review

March 14, 2024

Great intro book

This book provides a great introduction to the various steps associated with building data pipelines. If you’re new to data engineering or want to understand the high-level steps associated with building data pipelines, this book acts as a great reference point.

Yi Zheng

1 review

March 14, 2021

Practical

Good book, very practical. It teaches you how to set up and manage a pipeline end to end.

Diaa Kasem

2 reviews

January 7, 2022

Nice introduction to many data pipeline technologies, and to the point... A good book to get started as a data engineer, provided you are already a senior developer.

Reza

45 reviews, 17 followers

February 18, 2022

Good for starters.

Azis Adi Kuncoro

13 reviews, 6 followers

March 31, 2022

Good for beginners in data engineering. Well-summarized concepts and terminology.

Harry

15 reviews

December 29, 2022

Good for people that want to grasp the basic concepts of data engineering before trying to jump into it as a profession. Entry level book.

Norbert

35 reviews

December 4, 2023

I was really disappointed. Very detailed and specialized knowledge, even with code examples. Only a few conceptual ideas…

Marco Gemaque

67 reviews

January 24, 2025

Need to read it again

Kris

54 reviews, 2 followers

June 17, 2025

Great overview of the whole data engineering framework and best practices.

Denis

19 reviews

September 5, 2025

This is probably the best introductory book on data engineering. It provides a lot of intuition behind the concepts.

Jonathan

1 review

July 10, 2023

Detailed examples given, primarily focused on Apache Airflow and Python. Clear explanations of each step. Would have benefited from a few examples of use cases across different industries. Overall great practical introduction to data pipelines.

Evan Oman

31 reviews, 2 followers

December 6, 2022

Nice, quick overview of ELT pipelines focusing on SQL and Airflow. It covers a pretty narrow slice of the data engineering world but was still a useful read.