Data Pipelines Pocket Reference: Moving and Processing Data for Analytics

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions. You'll

274 pages, Paperback

Published March 16, 2021

157 people are currently reading
350 people want to read

About the author

James Densmore

2 books, 1 follower

Ratings & Reviews

Community Reviews

5 stars: 50 (21%)
4 stars: 100 (43%)
3 stars: 60 (26%)
2 stars: 17 (7%)
1 star: 3 (1%)
Displaying 1 - 30 of 31 reviews
Sebastian Gebski
1,188 reviews, 1,341 followers
March 14, 2021
I don't think that 'Pocket Reference' is the proper way to describe this book.
An example? Sample path? One of the ways to do it? A representative case?
A bit of theory, some SQL, a basic introduction to how to structure a processing pipeline - that's what you can get out of this book.
It's probably OK if you want to figure out what actually powers (under the hood) modern data processing pipelines, but I wouldn't say it's useful if you want to set a solid foundation for more thorough research.
1 review
March 10, 2021
I read this book to get up to speed with modern software data engineering. I think I achieved the goal, although I finished with a sense of how much I do not know rather than with confidence in building the solutions myself.
James seems to take an opinionated approach by using cloud warehouse databases (Redshift and Snowflake). The use cases and computations are well suited to them, and I would need to read other resources to see how the patterns mentioned play with other technologies. The price/ops complexity of possible stacks is not mentioned.
The chapters with SQL examples look great. I learned a bunch there.
There are also enough mentions of various technologies and books throughout the book -- I learned about Kimball modeling, dbt, Airflow, Atlas...
It would be great to extend the reasoning about production and operations, pitfalls and risks -- such as schema migration, scaling, schema registry, deployment, versioning, durability risks, retention, backups, recomputing... Validation, metrics collection, and Slack notifications are presented, and I would like to hear more about visualization.
Overall it is a good book and I only wish every chapter of it were bigger. Oh wait, there is "Pocket" in the name. Never mind, then.
Liz
23 reviews, 2 followers
March 1, 2021
Some technologies are already a bit outdated, but the book serves as a good overview of the whole ETL/ELT processes.
Ossian Hempel
58 reviews
July 14, 2024
Great overview of the ETL process, with examples. Doesn’t touch streaming but outside of that I have no complaints. Gets to the point and doesn’t delve TOO deep into the details.
Sean
63 reviews, 18 followers
October 12, 2022
I skimmed this and found a good introduction to a number of issues and perspectives, though as a "pocket reference" it's quite brief. Beginners like me will find some helpful ideas and opinions: those with more background aren't likely to find much. It's good for what it is, but don't expect a lot more.
Scott Haines
20 reviews, 3 followers
March 17, 2021
This is a good book for data engineers looking to work with CDC ETL & EtLT data moving systems. It covers a little orchestration management with Airflow too. Nice book for the desk of anyone working in the data industry.
1 review
December 11, 2024
Data Pipelines Pocket Reference by James Densmore is a handy guide for data engineers who already have some foundational knowledge and are looking for quick insights or inspiration. It’s not a beginner’s guide or a deep dive, but it does a good job of covering the essentials of designing and managing data pipelines in a concise format.

The examples included are practical, but they’re fairly basic, so if you’re looking for more in-depth explanations or complex use cases, this might not fully meet your needs. That said, it’s a great resource for brushing up on concepts or getting ideas for tackling specific pipeline-related challenges. Overall, a solid reference book, just as the title suggests.
Florent
19 reviews, 2 followers
June 30, 2021
The data engineering industry suffers from a lack of good books. This one is very practical and ELT-focused. It complements theoretical books like Kimball's and Designing Data-Intensive Applications well.
It is still far from perfect, though. Some parts are already outdated. The focus on ELT, without extensive discussion of its tradeoffs, is highly questionable.
Casper Weiss Bang
44 reviews, 2 followers
November 14, 2022
Good read. Not great. It was a bit too code-specific for me (I know how to write code, and I don't use Airflow, which the book does); luckily I could skip those parts. It was a good quick read-through, covering key terms and principles. It could probably have been shorter and maybe more theoretical, but it was worth it for the parts that were good. Can recommend as an introduction to data pipelines. Not sure if it's a good reference piece really. I'd probably just use a search engine + relevant documentation.
An Te
386 reviews, 26 followers
May 5, 2022
A very good primer on data processing and pipelines with code to consider the key elements involved in data pipelines for validation and iteration.

I agree with other reviewers that it is not a manual but more a survey of the field. It’s certainly helpful for a data scientist to know these concepts. Much more detail and depth is needed if you’re looking for a standalone data engineering book 📖

Lee
15 reviews, 2 followers
August 4, 2023
Not a good book as an introduction to Data Engineering but rather a reference to do specific Data Engineering tasks or workflows. Did learn some things but began skimming about halfway through.

Other complaints:
- The instructions aren't clear, and I had difficulty setting up the environment to even complete the exercises
- No conclusion highlighting the main takeaways from the book

May try Fundamentals of Data Engineering next to see if that is any better and more suited to what I am looking for.
Jaivarsan B.
28 reviews, 1 follower
March 30, 2023
Hands-on practical content for beginners. Got some good basics/prototype-ready stuff out of it in production.

At the same time, it limits itself very much to these basics; there could have been a chapter on distributed computing, or on using async patterns to cover more of the volume and variants of DE in real life.
Lucille Nguyen
417 reviews, 12 followers
August 28, 2025
It's a quick intro and reference on how to create and maintain data pipelines. A lot of examples in here are pretty dated. Does a good job at introducing the basics. Look, it's an O'Reilly pocket reference, what else can you expect? Could probably have spoken more generally so that it doesn't just seem like a few articles on dbt, Airflow, Hadoop, but hey, it's good for what it is.
Mi Lia
39 reviews, 6 followers
November 27, 2021
Very well written small book. The only reason I've put 4/5 and not 5/5 was that... for some weird reason I hadn't realized that the book would be so short. So it was a small disappointment. But surely the content of the book was worthwhile, if only there was more...
George Touros
152 reviews, 10 followers
February 27, 2022
I found this book practical, concise and to-the-point. It's just a starting point really, but I'd like to know more from the people here saying that it's already outdated. From my point of view, the book was worth it.
26 reviews
October 17, 2022
About a 6+/10. A few interesting subsections, but even for a book of this size the approach is very narrow, and roughly a third of it is code snippets that you can find in any tutorial, documentation page, or Stack Overflow thread.
4 reviews
February 11, 2023
My favourite book for data engineers

Easy to read with great code examples. Really liked the less verbose and more practical approach. Highly recommended for anyone looking to pursue data engineering.
1 review
March 14, 2024
Great intro book

This book provides a great introduction to the various steps associated with building data pipelines. If you’re new to data engineering or want to understand the high-level steps associated with building data pipelines, this book acts as a great reference point.
1 review
March 14, 2021
Practical

Good book, very practical. Teaches you how to set up and manage a pipeline in its entirety. * * * * * * *
2 reviews
January 7, 2022
Nice introduction to many data pipeline technologies, and to the point... Good book to get started as a data engineer, provided you are already a senior developer.
Harry
15 reviews
December 29, 2022
Good for people that want to grasp the basic concepts of data engineering before trying to jump into it as a profession. Entry level book.
35 reviews
December 4, 2023
I was really disappointed. Very detailed and specialized knowledge, even with code examples. Just a few conceptual ideas…
Kris
54 reviews, 2 followers
June 17, 2025
Great overview of the whole data engineering framework and best practices
Denis
19 reviews
September 5, 2025
This is probably the best introductory book on data engineering. It provides a lot of intuition behind the concepts.
Jonathan
1 review
July 10, 2023
Detailed examples given, primarily focused on Apache Airflow and Python. Clear explanations of each step. Would have benefited from a few examples of use cases across different industries. Overall great practical introduction to data pipelines.
Evan Oman
31 reviews, 2 followers
December 6, 2022
Nice, quick overview of ELT pipelines focusing on SQL and Airflow. It covers a pretty narrow slice of the data engineering world but was still a useful read.