Database Internals: A deep-dive into how distributed data systems work

Rating: 4.25 (65 reviews)
ISBN: 9781492040347

Alex Petrov

When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.

Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed.

This book examines:

Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable log structured storage engines, with differences and use-cases for each
Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns, from UDP to reliable consensus protocols
Database clusters: Discover how to achieve consistent models for replicated data

Genres: Technology, Programming, Computer Science, Software, Technical, Engineering, Nonfiction

376 pages, Paperback

Published November 4, 2019

662 people are currently reading

3637 people want to read

About the author

Alex Petrov

1 book, 56 followers

Community Reviews

5 stars: 250 (47%)
4 stars: 181 (34%)
3 stars: 77 (14%)
2 stars: 14 (2%)
1 star: 4 (<1%)

Displaying 1 - 30 of 65 reviews

Sebastian Gebski

1,186 reviews, 1,335 followers

February 9, 2020

One of the best tech books I've read in the last 12 months.
It consists of 2 parts: DB internals & DB distribution internals.

The 1st part is pure gold - one can learn about B*-trees, LSM-trees, differences between locks and latches, memory vs. disk optimizations, rebalancing, concurrency models for transactions and much, much more. I can't recall any single book that covers as much deep-level knowledge on these topics.

The 2nd part is less unique - there are other good resources on distributed systems. What I liked was a solid description of the Paxos algorithm (including the Multi- and Fast- variants). The chapter about anti-entropy was very solid as well (it's a pity I didn't have any comparable resource on the topic before I started working with Cassandra).

No point in extending this review further - it's a great book, just grab it (if you're keen on the topic) - you won't find anything better.

Bugzmanov

231 reviews, 97 followers

December 13, 2019

I liked this one a lot.

It complements nicely "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems" by Martin Kleppmann.
While Kleppmann's book provides a pretty solid overview of the data processing landscape, this book goes deeper into implementation details, data structures and algorithms.
It's a bit drier and more technical, but still a relatively easy read.

Emre Sevinç

175 reviews, 430 followers

May 29, 2020

“Database Internals: A Deep Dive Into How Distributed Data Systems Work” by Alex Petrov belongs to a very special category of O’Reilly books, such as “Designing Data-Intensive Applications” and “Cassandra: The Definitive Guide”, in the sense that it is a serious deep dive into the most fundamental and challenging aspects of the big and distributed data systems that we rely on every day.

Today there’s an unprecedented proliferation of distributed database technologies, combined with an ever-growing multitude of cloud computing services offering them, as well as rapid advances in physical storage systems such as NVMe SSDs that force us to consider different trade-offs and algorithms. A working software developer, system engineer, solution architect, or CTO can easily be overwhelmed by so many distributed NoSQL, NewSQL, time-series, graph, document, key-value, and embedded databases, in addition to the typical, traditional, enterprise RDBMS variants. Luckily, all of these fancy distributed database technologies are built on a limited number of concepts, techniques, and algorithms, which are concisely introduced and surveyed in the “Database Internals” book.

Adrian

155 reviews, 29 followers

September 18, 2020

Unfortunately, I read this book after Martin Kleppmann's Designing Data-Intensive Applications, and this is probably the reason I rated it poorly.

The book starts really well by describing the internal subsystems found in any DBMS (connection listener layer, query parser + optimizer layer, execution layer and, of course, the storage layer).

I wanted the book to explore this subject more: how the components are designed, how they deal with concurrency, etc.

The book then took a deep dive into tree-based data structures... as in really deep.
I found myself at times wondering why I was reading this. It was way too terse.

After this part, the book became interesting once again when it started tackling distributed transactions, consensus, replication, Byzantine faults, and Paxos algorithms.

I was already familiar with all of those, which in my humble opinion were tackled slightly better in the first book I mentioned.

All in all a good book, but I wouldn't classify it as complementing Martin Kleppmann's.

Mikhail Filatov

363 reviews, 17 followers

April 2, 2020

The book is a strange mix: the second part really is about distributed data systems and it's OK, though "Designing Data-Intensive Applications" is better in this part.
The first part contains a lot of descriptions of different implementations of B*-Trees (replace * with any other symbol(s)) - most of them unreadable.

Bilal

113 reviews, 9 followers

April 28, 2020

The book is divided into two parts: the first part deals with storage on hard disks and solid-state drives, but in the context of a single system, while the second part deals with distributed systems. In this sense it differs from most other books on distributed storage, which typically do not discuss the topics in the first part of this book.

I found the book informative, but not very effective in building a solid understanding of concepts. I felt the author jumps from idea to (related) idea too frequently, in short paragraphs, and in so doing doesn't see an idea through to the end in enough detail for it to be learned properly. Perhaps the first part was better presented; the second was not.

dantelk

206 reviews, 20 followers

October 6, 2024

This book deserves five stars, with its deeper sea-exploration of distributed system design patterns and explanation of database tree structures. I should confess that I couldn't help thinking "how's this going to help me in real life", especially with file systems etc., but overall, great deep-level concepts are explained in an understandable way. Some concepts, such as Paxos, were above my technical knowledge, and I had to supplement the context with YouTube videos. Still, I'll give it 5 stars: the reading list alone is absolutely great, and a lot of effort has gone into this book.

Vishwanath

45 reviews, 7 followers

May 3, 2020

Informative, but I would have preferred more examples with practical scenarios. There is no code, and it is all mostly conceptual. Some good references to papers for subsequent reading. The first part of the book deals primarily with storage and covers an in-depth discussion of B-trees and their types. The second half is focused on distributed systems and has useful sections on consensus protocols. Concepts like "2-phase commits" are explained well with figures. However, the lack of practical examples/code and the overall dry subject matter made this a laborious read. A good book to reference theoretical concepts.
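Since this review singles out two-phase commit, here is a minimal Python sketch of its two rounds. It assumes in-memory participants with a hypothetical `can_commit` flag standing in for each node's local decision, and it deliberately omits the coordinator's durable log, timeouts, and recovery, which is where real 2PC gets hard (a coordinator crash after the prepare round can leave participants blocked).

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory participant; `can_commit` simulates its local decision."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> Vote:
        # Phase 1: a real participant would durably log its vote before replying.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list) -> str:
    # Phase 1 (prepare): ask every participant to vote.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only if the vote was unanimous.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


print(two_phase_commit([Participant("a"), Participant("b")]))  # committed
print(two_phase_commit([Participant("a"), Participant("b", can_commit=False)]))  # aborted
```

A single NO vote is enough to abort the whole transaction on every participant, which is what makes the protocol atomic.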

Povilas Balzaravičius

28 reviews

November 5, 2020

Good and interesting content. But some chapters feel scattered, with missing transitions between topics or algorithms. Other parts are developed well. Diagrams are missing where I expected a better explanation, or are present for obvious things. The writing is 3/5, but the content is 5/5. So it is 4/5. I'm glad I've read the book and will definitely come back to some chapters to refresh some details.

Eric

4 reviews

February 1, 2021

This book really feels like two incomplete books in one. The first half of the book focuses on database internals, file formats, caching strategies etc. The second half of the book switches gears and dives into the components (algorithms and strategies) used by distributed systems.

The problem is that there is nothing to tie the first and second parts of the book together. You could be reading entirely different books. The second issue is that even within each part, you are presented with a lot of great information, but there is no guidance (imo) on how you may want to logically put the pieces together to build a complete system.

True to the book's title, it does a great job of exploring the internals of database systems. If you are looking for how a specific component is built (i.e. you want to learn more about Raft consensus), this is a great resource. If you are looking for a book that ties these concepts together, however, I would suggest looking elsewhere.

Łukasz Słonina

124 reviews, 25 followers

April 7, 2020

I like this book for its content: if you would like to know more about databases and distributed systems, plus get a long list of further reading, then go for this book. What I don't like is that this material does not read like a book (e.g. DDIA); it's more like a compendium of algorithms, data structures and theories. Some of the algorithms could have been better presented (more diagrams).

Szymon Kulec

212 reviews, 117 followers

October 21, 2023

If you're looking for a book that answers the question "How do all these DBs work?", this is the book. The broadness of this question may also trick you into doubting that one book can cover it all. This book tries its best. How does it do it?

The book is split into two parts: the first covers local storage and the second covers the distributed aspects. I must admit that for such a short book, the author does an amazing job of squeezing a lot of knowledge into it. You'll find information about B+ trees, logs, ARIES, and even recent developments like RAMP transactions. Paxos, Raft? It's got you covered.

All the descriptions are supported with drawings that are coherent and show exactly what they need to. Whether it's a log in Raft or messages sent in a 2PC or 3PC, they are clear and easy to follow (not in a sense that Paxos is easy to follow:P but rather that they do their job).

Why not 5 stars then? I think it should either be two books or be longer. I don't think a reader who is not familiar with any of these topics can chew through it in a reasonable amount of time and grasp these concepts. At the same time, if you have heard about Raft, or about what kind of B+ tree CosmosDb uses, and want to understand what you heard, this book is definitely for you.

Marcin Golenia

39 reviews, 7 followers

April 25, 2021

The book is organized into 2 parts, so let me review it in two parts.

1. Storage Engines (5/5)
I didn't expect that we would get so deep into internals. Hats off! The author's knowledge is extensive and nicely presented through the text and illustrations. Everything you need to know: hardware (HDD, SSD) and its relation to data storage algorithms, transactions, slotted pages, B-tree variants, LSM. Great stuff.

2. Distributed Systems (4/5)
There is a gap between the complexity level of the topic introductions and the actual "meat" that is described. It starts nice and easy, then boom! "Percolator transaction execution", "Calvin" and "Spanner". Hard stuff, and this part may keep you reading a page a few times before you understand it. I failed to do so in a few places, but I am fine with this: the author provides many references, so you can learn the intermediate knowledge from there.

I know I will forget some of the advanced things that Alex tried to explain to me, but I will know where to look for them ;) All in all, the book will help you build a nice end-to-end understanding of how databases really work in both places: your computer and a big cluster.

Ahmad hosseini

320 reviews, 73 followers

February 20, 2021

In part 1, the book explains the internal structure of a database in detail and examines its parts, such as the storage engine, very well.
“The storage engine (or database engine) is a software component of a database management system responsible for storing, retrieving, and managing data in memory and on disk, designed to capture a persistent, long-term memory of each node.”
Part 2 explains the characteristics of distributed systems in general and examines some specific topics related to distributed databases.
The book also introduces good sources for further reading.

Bartosz Sypytkowski

45 reviews, 12 followers

August 9, 2020

This book goes nicely together with "Designing Data-Intensive Applications" by Martin Kleppmann: they both focus on the core, fundamental concepts of persistent, distributed systems, providing a wide variety of known algorithms and protocols for common problems in that area, including the rationale behind each one, which helps to build intuition about their trade-offs. It's also full of references for anyone who wants to continue a more in-depth exploration of a given topic.

Leo

323 reviews, 25 followers

April 5, 2022

I've found some similarities between this one and my favourite "wild boar book" ("Designing Data-Intensive Applications" by Martin Kleppmann), though the first part, dedicated to DB internals, goes much more in-depth, and there's tons of awesome stuff there. The second part, dedicated to distributed systems, is a bit less unique, but still very good and worth reading.
So, I wholeheartedly recommend it to anyone working with DBs, or just with distributed systems in general.

Elijah Oyekunle

196 reviews, 26 followers

Love it!

16 reviews

This is a mandatory book for the data engineering, database, and architecture worlds.
It has (literally) two parts, covered in great detail:
1. Storage Engines
2. Distributed Systems

For these two matters, this book is outstanding.

Some other database topics, such as query optimizers, were left out intentionally.

Michał Romanowski

2 reviews

February 16, 2025

Nothing groundbreaking. It organizes the concepts well.

Rafał Łasocha

4 reviews, 2 followers

September 6, 2023

The book was very uneven in the level of detail it describes, and its scope is way too big: the author should have focused on databases instead of pulling distributed systems into it.

In the first part, I really liked how the author went deep, down to the level of the bit masks of specific data structures, when explaining how data is stored, how pages work, how variable-sized data is moved, and how DBMSs approach storing different versions of the same data records for the sake of transactions.
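The page mechanics this review praises can be sketched roughly as a slotted page. This is a simplified Python model under stated assumptions: a `bytearray` stands in for a disk page and each slot entry is a hypothetical 4 bytes; real engines use their own header formats. The key idea is the indirection: records are addressed by slot id, so compaction can move variable-sized cells without changing those ids.

```python
class SlottedPage:
    """Simplified slotted page: a slot directory grows from the front of the
    page while variable-sized cells are packed from the back."""

    SLOT_BYTES = 4  # hypothetical size of one slot entry (offset + length)

    def __init__(self, size: int = 4096):
        self.size = size
        self.data = bytearray(size)
        self.slots = []          # slot id -> (offset, length), or None if deleted
        self.free_end = size     # start of the cell area: cells live in data[free_end:size]

    def free_space(self) -> int:
        # Gap between the end of the slot directory and the cell area.
        return self.free_end - len(self.slots) * self.SLOT_BYTES

    def insert(self, cell: bytes) -> int:
        if len(cell) + self.SLOT_BYTES > self.free_space():
            raise ValueError("page full")
        self.free_end -= len(cell)
        self.data[self.free_end:self.free_end + len(cell)] = cell
        self.slots.append((self.free_end, len(cell)))
        return len(self.slots) - 1

    def get(self, slot_id: int) -> bytes:
        offset, length = self.slots[slot_id]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_id: int) -> None:
        # Only the slot is cleared; the cell's bytes become garbage until compaction.
        self.slots[slot_id] = None

    def compact(self) -> None:
        # Rewrite live cells contiguously at the back of the page. Slot ids stay
        # the same; only the offsets stored in the slots change.
        live = [(i, self.get(i)) for i, s in enumerate(self.slots) if s is not None]
        self.free_end = self.size
        for i, cell in live:
            self.free_end -= len(cell)
            self.data[self.free_end:self.free_end + len(cell)] = cell
            self.slots[i] = (self.free_end, len(cell))
```

For example, after inserting three cells into a 64-byte page, deleting the middle one and calling `compact()`, the remaining cells are still reachable through their original slot ids even though their bytes have moved.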

What I did not like, however, is the variety of B-Tree optimizations described. I have no idea why I read these chapters; I am sure I have completely forgotten them by now. They may be useful when you're actually, right now, implementing a database and want some options on the table, but otherwise it's better to skim these chapters (except for the practical implementations like B*-Trees, of course). I found it confusing that the author explains some B-Tree optimizations in such detail, but then, in the subchapter about data recovery after a database crash, there is just a short, four-step, incredibly high-level description of the recovery process with a note that the details can be read in some paper. That's how it is: B-Tree optimizations are described in great detail, while many more interesting things get only an overview.

As for the second part, for the majority of the content it's just better to read Martin Kleppmann's book. If someone has already read Martin's book, many chapters will be boring; they may be useful for someone who hasn't. When it comes to consensus algorithms, I had flashbacks to the beginning of the book, because now all the different varieties of the Paxos algorithm are described. I'd forgotten these variations by the next day. Raft is nicely explained, but Raft is an elegant algorithm :) In retrospect, I have no idea why the author added the majority of the second part to the book. Although databases are of course inherently distributed systems, if you want to "know more about databases", you probably don't mean "I want to know a lot about different consensus algorithms".

Not a bad book at all if someone doesn't know much about either databases or distributed systems, though I imagine it may be incredibly hard to grasp all of the definitions described in the book.

Lauro Caetano

8 reviews, 6 followers

April 2, 2020

Excellent book! It goes a bit in the direction of Designing Data-Intensive Applications when it talks about distributed systems, distributed transactions and so on.
But this book goes some steps further: it explains how the DB represents data internally, and also explains distributed systems algorithms.

Excellent read!

Ivan

223 reviews, 10 followers

December 30, 2020

A detailed, though not overly in-depth, description of the structures and algorithms behind modern systems (this is made up for by the large number of references and recommendations for further study).

Verisimilitude

16 reviews, 1 follower

December 31, 2021

Great, content-wise, although it appeared to lose the flow/transitions when describing concepts.
I had to continuously take notes and cross-reference them to keep myself from losing the flow.

Giulio Ciacchini

370 reviews, 12 followers

May 21, 2025

This is a proper technical book.
It’s not a tutorial or cookbook — it's aimed at engineers, architects, or data infrastructure enthusiasts who want to understand how things work under the hood, from the disk level to distributed consensus.

The book is divided into two major parts:
1. Storage Engines
This section dives into how data is stored, indexed, and retrieved.
Data Structures: It explains the pros and cons of using B-Trees (used in databases like MySQL, PostgreSQL) vs LSM Trees (used in Cassandra, LevelDB, RocksDB). You learn how each structure affects performance for reads, writes, and compaction.
Indexes: You get an overview of primary/secondary indexes, clustered vs non-clustered, and how different choices affect system performance.
Log-Structured Storage: The book covers Write-Ahead Logging (WAL) and log-structured storage engines, which are essential for durability and crash recovery.
MVCC (Multi-Version Concurrency Control): It outlines how databases handle multiple simultaneous reads/writes using versioning instead of locks.
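To make the B-Tree vs LSM Tree contrast concrete, here is a toy sketch of the LSM write and read paths in Python. It assumes an in-memory memtable with a hypothetical `memtable_limit`, flushes it into immutable sorted runs, and resolves reads newest-first; the write-ahead log, Bloom filters, and compaction that real LSM engines rely on are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs (newest last)."""

    TOMBSTONE = object()  # marker written by deletes

    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}
        self.runs = []  # list of (sorted_keys, values) pairs
        self.memtable_limit = memtable_limit

    def put(self, key, value) -> None:
        # Writes never modify existing runs; they only go to the memtable.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key) -> None:
        # A delete is just another write: a tombstone that shadows older values.
        self.put(key, self.TOMBSTONE)

    def _flush(self) -> None:
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
        else:
            value = None
            # Search runs from newest to oldest; the first hit wins.
            for keys, values in reversed(self.runs):
                i = bisect.bisect_left(keys, key)
                if i < len(keys) and keys[i] == key:
                    value = values[i]
                    break
        return None if value is self.TOMBSTONE else value
```

The trade-off shows up directly: writes are cheap appends to memory, while a read may have to probe several runs, which is what compaction and Bloom filters exist to mitigate.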

2. Distributed Systems
This half focuses on the architecture behind distributed databases.
Replication: You’ll learn about leader-based vs leaderless replication, synchronous vs asynchronous, and consistency tradeoffs.
Sharding/Partitioning: The book explains horizontal partitioning, how consistent hashing works, and how data is evenly distributed across nodes.
Consensus Algorithms: One of the most technical parts of the book explains Paxos and Raft, the backbone of ensuring agreement in a distributed system. It's dense but foundational.
Quorum and Consistency: Petrov walks through consistency models (eventual vs strong consistency), CAP theorem, and how different systems make tradeoffs.
Failure Recovery: Includes strategies for anti-entropy, hinted handoff, read repair — essential concepts for building resilient systems.
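Consistent hashing, mentioned above, can be sketched with a sorted ring of tokens. This Python version is an illustration rather than any particular database's scheme: it assumes SHA-256 for token placement and a hypothetical `vnodes` count of virtual nodes per server to even out the distribution. The property worth noticing is that adding a node only moves the keys that the new node takes over.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        # Each physical node owns `vnodes` points on the ring to smooth out load.
        self.ring = sorted((token(f"{node}#{i}"), node)
                           for node in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first token after the key's hash, wrapping at the end.
        i = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        return self.ring[i][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["n1", "n2", "n3"])
after = HashRing(["n1", "n2", "n3", "n4"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

Every key in `moved` ends up on `n4`, and with four nodes roughly a quarter of the keys move; a naive `hash(key) % number_of_nodes` scheme would instead reshuffle most keys.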

However, for analysts like myself, these are the most important concepts that are not too technical:
Storage Engines & Data Layout
Even as an analyst, knowing how data is stored can explain why some queries are fast and others slow.
Row‑Oriented vs. Column‑Oriented: Column stores (e.g., Amazon Redshift, Google BigQuery) read only the needed columns for analytics, vastly improving performance on large scans.
Indexes: Clustered (Primary) Indexes physically order data on disk, speeding up range queries but slowing inserts; Secondary Indexes maintain pointers to rows, accelerating lookups on non‑key columns at the cost of extra storage and write overhead.
Partitioning & Sharding: Splitting large tables by date (partitioning) or by hash/range (sharding) lets queries scan only relevant subsets of data, reducing I/O.
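The row versus column distinction can be illustrated by counting how many stored values a single-column aggregate has to touch in each layout. This is a deliberately simplified Python model with hypothetical sample data; real column stores also win through compression and vectorized execution, and the cost that actually matters is pages read from disk, not Python values.

```python
NAMES = ("id", "day", "region", "amount")

# Hypothetical sample table, stored row by row.
rows = [
    (1, "2024-01-01", "EU", 120.0),
    (2, "2024-01-01", "US", 80.0),
    (3, "2024-01-02", "EU", 45.5),
]


def to_columns(rows, names):
    # Same data, stored column by column.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def sum_row_store(rows, col_index):
    touched, total = 0, 0.0
    for row in rows:
        touched += len(row)  # a row store brings whole rows into memory
        total += row[col_index]
    return total, touched


def sum_column_store(columns, name):
    values = columns[name]  # a column store reads only the requested column
    return sum(values), len(values)


columns = to_columns(rows, NAMES)
print(sum_row_store(rows, NAMES.index("amount")))  # (245.5, 12)
print(sum_column_store(columns, "amount"))         # (245.5, 3)
```

Both layouts return the same total, but the row layout touches every field of every row, and that gap grows with the number of columns in the table.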

Transactions & Consistency
For ad hoc analysis you may not use transactions heavily, but understanding them helps guard against reading “dirty” or inconsistent data:
ACID Properties
Atomicity: All-or-nothing operations prevent partial updates.
Consistency: Database moves from one valid state to another (e.g., foreign‑key enforcement).
Isolation Levels: Controls visibility of concurrent transactions; higher isolation (Repeatable Read, Serializable) prevents anomalies but can reduce concurrency.
Durability: Once committed, data survives crashes via Write-Ahead Logging (WAL).
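Atomicity and durability via a write-ahead log can be sketched as a redo-only log in Python. Assumptions: the log is a list of JSON strings standing in for an append-only file that a real engine would fsync, and changes are applied only after the commit record is logged, so no undo pass is needed. Full recovery algorithms such as ARIES also handle undo for changes that reached disk before commit.

```python
import json


class WALStore:
    """Toy key-value store with a redo-only write-ahead log.

    `log` is a list of JSON strings standing in for an append-only file that a
    real engine would fsync before acknowledging the commit.
    """

    def __init__(self, log=None):
        self.log = log if log is not None else []
        self.data = {}

    def commit(self, txid: int, writes: dict) -> None:
        # 1. Log every change, then a COMMIT record (the durability point).
        for key, value in writes.items():
            self.log.append(json.dumps({"tx": txid, "key": key, "value": value}))
        self.log.append(json.dumps({"tx": txid, "commit": True}))
        # 2. Only after logging, apply the changes to the data itself.
        self.data.update(writes)

    @classmethod
    def recover(cls, log: list) -> "WALStore":
        # Redo only transactions whose COMMIT record reached the log; a crash
        # mid-transaction leaves no COMMIT, so its partial writes are skipped.
        records = [json.loads(line) for line in log]
        committed = {r["tx"] for r in records if r.get("commit")}
        store = cls(log)
        for r in records:
            if "key" in r and r["tx"] in committed:
                store.data[r["key"]] = r["value"]
        return store


store = WALStore()
store.commit(1, {"a": 1, "b": 2})
# Simulate a crash after tx 2 logged a write but before its COMMIT record.
crashed_log = store.log + [json.dumps({"tx": 2, "key": "a", "value": 99})]
print(WALStore.recover(crashed_log).data)  # {'a': 1, 'b': 2}
```

Transaction 2 is all-or-nothing: its partial write is present in the log but is never replayed, which is the atomicity guarantee described in the list.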

Scalability & Distributed Systems
As data grows, systems often move to distributed architectures—analysts benefit from knowing the basics:
Replication Models: Leader‑Follower: Strong consistency for reads/writes to leader, eventual consistency on followers; Leaderless (Gossip/Quorum): Higher availability (e.g., Cassandra’s quorum reads/writes) at the cost of possible staleness.
Eventual vs. Strong Consistency: Analytics on eventually consistent data can yield stale results; choose data sources or time windows accordingly.
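The quorum rule behind leaderless replication can be checked directly: with N replicas, reads from R and writes to W are guaranteed to overlap when R + W > N, so a read sees at least one copy of the latest acknowledged write. A minimal Python sketch, assuming integer versions to pick the newest value (real systems use timestamps or version vectors):

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every read quorum of size r share at least one
    replica with every write quorum of size w, out of n replicas?"""
    replicas = range(n)
    return all(set(rq) & set(wq)
               for rq in combinations(replicas, r)
               for wq in combinations(replicas, w))


def quorum_read(responses):
    # responses: (version, value) pairs from R replicas; the newest version wins.
    return max(responses, key=lambda pair: pair[0])[1]


print(quorums_overlap(3, 2, 2))  # True:  2 + 2 > 3
print(quorums_overlap(3, 1, 2))  # False: 1 + 2 = 3, a read can miss the write
print(quorum_read([(2, "old"), (3, "new")]))  # new
```

When R + W does not exceed N, a read can hit only replicas that missed the write, which is where the staleness mentioned above comes from.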

Yuchen

5 reviews

July 14, 2025

I like this one a lot, even more than "Designing Data-Intensive Applications".

This book is divided into two parts:

The first part is about B-Trees and LSM trees. The B-Tree part is really deep, and it peaks in Chapter 5, where the author lays down a lot of details about how to implement transactions on local disk and discusses the concurrency control of transactions at length. At the same time, there is a small caveat about the ordering of the B-Tree and LSM-Tree material: for example, all of Chapter 5 would apply to LSM trees too. I had to re-read Chapter 5 after finishing the later chapters to grasp all the details that were omitted in Chapter 6. Not a big deal.

The second part is about distributed databases. A huge caveat is that the author, although he has a lot of experience working in this field, does not dive as deep as in the first part. I can understand that some of the distributed systems concepts are relatively new compared to local databases. I really like the way these chapters are laid out: starting with what can go wrong in a distributed system (network, clock), building on top of that to talk about the classic distributed system model (FLP), and finally reaching advanced topics such as failure detection, leader election, replication and consensus.

For the second part, I enjoyed a lot of the external links in the book. Some of them link directly to Apache Cassandra's bug history, where there are links to the code and also the history of the open source contributors' discussions about the issues.

I feel like the author lacks a detailed explanation of sharding; the book just briefly talks about hash sharding. Also, some of the complex algorithms are hard to follow, for example bitmap version vectors and the Generalized Paxos algorithm. On the flip side, if you follow the papers, they give a lot more detail about these problems; they are very interesting and can serve as a big complement to the book.

Overall really solid book.

Orkhan Huseynli

9 reviews

April 28, 2025

The book does a great job in Part I, explaining the internals of databases (as the name suggests). You are going to learn a lot about why B-Trees are important, and why you can't just use an AVL or red-black tree instead of a B-Tree. It talks about the architecture of databases and storage engines, and how some databases use existing storage engines so they don't have to start from scratch. It also talks about how database pages are managed and evicted, as well as page layout on disk, and it finally answered my question about OS pages versus database pages. Moreover, thanks to this book I finally know what a latch is. At the end of Part I, you will learn about LSM trees as well.
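One way to see why B-Trees beat AVL or red-black trees on disk is to count tree levels, since each level visited can cost a page read. A rough Python sketch, assuming a hypothetical fanout of 500 children per B-Tree page (the real figure depends on page and key size) and computing only the minimum possible height:

```python
def min_levels(n_keys: int, fanout: int) -> int:
    """Minimum height of a balanced search tree whose nodes have up to `fanout`
    children (and so up to fanout - 1 keys each)."""
    # A full tree of height h holds fanout**h - 1 keys.
    height, capacity = 0, 1
    while capacity < n_keys + 1:
        capacity *= fanout
        height += 1
    return height


n = 100_000_000
print(min_levels(n, 2))    # 27: binary trees such as AVL or red-black, at best
print(min_levels(n, 500))  # 3: B-Tree with a hypothetical fanout of 500
```

For 100 million keys, a binary tree needs at least 27 levels versus 3 for the wide B-Tree, and in practice the upper B-Tree levels often stay cached in memory, so a lookup may touch only one or two pages on disk.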

However, I found the LSM tree explanations a bit confusing; for me, the DDIA book did a better job of explaining LSM trees than this one. Overall, the author talks about B-Trees in most of the chapters, so he could have named the book B-Tree Internals as well.

What I did not like about the book is Part II. Part II is about distributed systems, and somehow it turned out to be DDIA in a nutshell, but more confusing. I would expect this book to talk about real applications of replication, leader election and consensus, but even the chapter called "Replication and Consistency" does not talk about replication itself (e.g. it could have covered the types of replication as the DDIA book did). Also, in the chapter on "Distributed Transactions", the author feels remorse and decides to write about database partitioning in a small paragraph. I would expect this book to cover partitioning in a dedicated chapter and show real-life examples of how MongoDB, Postgres or MySQL do it.

So if you don't want to get confused about the distributed systems, I would recommend reading DDIA and just pick some topics from this book to learn more about them where DDIA does not deep-dive (e.g. Paxos and Raft).

Phil Eaton

119 reviews, 296 followers

January 30, 2024

A solid guide told from a unique perspective. It is clearly told from the perspective of Cassandra which is interesting since most of what I find online is from the perspective of OLTP databases. Of course this perspective makes sense since the author is a Cassandra contributor. This perspective, however, does not mean that the book does not do a good job covering aspects that are not specific, or not related at all, to Cassandra.

The second half was stronger than the first. My primary critique is that parts of the book could have been better edited for clarity. Some ideas were discussed in an order that was difficult to follow until reaching the end of a section. I can imagine the difficulty though of finding editors who can help make sense of such complicated topics.

All in all, a solid introduction to BTrees, LSM trees, replication, partitioning, and consensus. Partnered with a good editor, a future edition could be 5 stars.

Peter Aronson

396 reviews, 18 followers

February 19, 2025

Four-and-a-half stars. What's in the book is solid, mostly well explained, and useful. But it's a bit of a miscellanea -- a bit of this and a bit of that -- all organized around the subject of databases. The first half of the book, Storage Engines, seems more complete. The second half, Distributed Systems, seems more like a survey with references. Which is odd, given that the first sentence of the Preface is "Distributed database systems are an integral part of most businesses and the vast majority of software applications". (Rather an extreme overstatement, in my opinion.) With that opening, I would expect more complete coverage of Distributed Systems, even if it is a big topic that would have required a bigger book. But this is still a useful book on both topics.

Joe Michalak

144 reviews, 5 followers

December 26, 2024

This book is an excellent treatment of how database system internals work. The first section of the book covers practical items like file systems and other local-related challenges - there is a wealth of information to dive into here, and I already have a large collection of books to dig into just from the references here. Then, in the second half, you dive into concurrency and distributed systems. While I've absorbed the info like a sponge, I am already looking forward to coming back to this after spending some additional time with some other distributed systems readings.

A truly excellent book that is truly a wealth of knowledge.

Andreea Olaru

30 reviews

May 13, 2025

A somewhat comprehensive summary of distributed systems concepts (consistency, availability, consensus), although I find it lacks practicality, being way too theoretical.

Loose explanations: the ones for copy and witness replicas, for example, are not well explained:
"in the case of write timeouts or copy replica failures, witness replicas can be upgraded to temporarily store the record in place of failed or timed out replicas" - what does that even mean?

As context, I've been a backend engineer for 7 years now, and I found it quite difficult to understand and keep my focus; maybe these sorts of concepts are the kind you need to implement while reading rather than just read about.
