"There's an incredible amount of depth and thinking in the practices described here, and it's impressive to see it all in one place." --Win Treese, coauthor of Designing Systems for Internet Commerce The Practice of Cloud System Administration, Volume 2, focuses on "distributed" or "cloud" computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach. Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial Designing and building modern web and distributed systemsFundamentals of large system design Understand the new software engineering implications of cloud administration Make systems that are resilient to failure and grow and scale dynamically Implement DevOps principles and cultural changes IaaS/PaaS/SaaS and virtual platform selection Operating and running systems using the latest DevOps/SRE strategiesUpgrade production systems with zero down-time What and how to automate; how to decide what not to automate On-call best practices that improve uptime Why distributed systems require fundamentally different system administration techniques Identify and resolve resiliency problems before they surprise you Assessing and evaluating your team's operational effectivenessManage the scientific process of continuous improvement A forty-page, pain-free assessment system you can start using today
Simply the best book for system administrators and their managers. Packed with great stuff from the first page to the last. If you read only one chapter, make it Appendix A :)
(3.5) Some very actionable stuff, some disappointing lack of precision, an underlying assumption that ops and devs are separate populations, skip all of Part I
From the volume of notes I took here (most of which are of the "good idea" nature), this was a helpful read. It was way too long for what I took from it, however, and I also felt compelled to disagree with or object to the imprecision in many places. So it's far from perfect, and a condensed version could be far more helpful. In addition, there's a lot of time devoted to topics that are only relevant if there is a separate role for operations vs development. If your org doesn't do this, that stuff is skippable (and I'd prefer a little more discussion of the relative merits of the two approaches: there's a little, but the authors quickly take the perspective that it's optimal to have distinct roles, then spend a lot of time creating subtle mechanisms to align developers with operational pain). So I can't say I recommend this to every engineer.
See my notes for what's worth extracting. Do check out Appendix A as well in particular (how to objectively assess health of service/operations).
For distributed system design, see https://www.goodreads.com/book/show/3... instead (the topic is covered here, but the treatment is not nearly as good, and some of the recommendations may actually be misleading). Skip those chapters if you read this book (pretty much all of Part I).
Call me pedantic, but these precision failures were disappointing and make you concerned about other areas:
* saying an SSD's failure rate is /caused/ by how many times the manufacturer says each block can be written.
* mixing up 'positive feedback loop' and 'negative feedback loop'
* talking about exponential growth (correctly) and then saying it's O(n^m) rather than O(m^n)
* some horrible atrocity of probability about mean time to repair / fail that I don't want to look back at again and relive the pain in order to capture it accurately here... but tied to the poor assumption that a mean completely defines a probability distribution.
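For anyone wondering why the O(n^m) vs O(m^n) slip matters, here is a minimal sketch of the difference. The constant m = 2 and the sample values of n are my own arbitrary choices for illustration, not numbers from the book. With the variable in the base (n^m) the growth is polynomial; with the variable in the exponent (m^n) it is exponential.

```python
# Illustrative only: M = 2 is an arbitrary constant picked for this sketch.
M = 2


def polynomial(n: int, m: int = M) -> int:
    """n^m: the variable n is the base, the constant m is the exponent."""
    return n ** m


def exponential(n: int, m: int = M) -> int:
    """m^n: the constant m is the base, the variable n is the exponent."""
    return m ** n


if __name__ == "__main__":
    # Print both quantities side by side for a few doubling values of n.
    for n in (10, 20, 40):
        print(f"n={n:>2}  n^m={polynomial(n):>5}  m^n={exponential(n)}")
```

Doubling n multiplies n^2 by a fixed factor of 4, while it squares 2^n, which is why the two notations describe very different scaling behavior.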
A great high-level view of architecting distributed systems. This book reads like a textbook and gives a great summary of all aspects of a distributed system without digging into the weeds. Some chapters are really basic (as the book assumes you know nothing about the topic), but it has enough interesting information to help people who are new or have intermediate-level experience in the space.
In general, it's good, full of proper advice and useful information.
But:
- It feels really dated. Unlike TPSNA, I can see a lot of things that have moved on and are done differently now. It can definitely benefit from a new edition;
- It's too preachy and "this is perfect" when it talks about DevOps and some methodologies. These are not one-size-fits-all, and too much is taken for granted;
- Not everyone is a hyperscaler, and that should be more prominent in the book;
- Some things are borderline weird, like compiled languages being strongly-typed (almost nobody would agree that C/C++ are strongly-typed in today's terms), and that you should not use scripting languages like Python for large projects (which seems to be in disagreement with reality at the moment).
Awesome book and insight into modern ops work and DevOps concepts. As someone who primarily does Dev, this really gave me a glimpse into what is important to get high-performing teams and systems going, and a lot of great ideas to add to my current projects (where I sometimes wear the Ops hat). Really insightful for me.
It's one of those I'll be referencing a lot, and maybe rereading completely some time in the future.
Amazing. This book has everything you have to bear in mind when your goal is to lead a change in IT. It covers the power of good practices, roles, tools, mindset and day-to-day improvements.
Highly recommended for anyone who wants to be effective and well-grounded in a DevOps career.
Personally, I found there's very little core material in here I didn't already know about. Rather, I found the anecdotes and citations more useful for thinking about that core material. For course readings or junior admins, it is a decent survey book on the subject. I will probably be adding a task for new hires to read one or two specific chapters out of the book as an introduction to our workplace, for example.
Occasionally, the book struggles to place its content within the 'devops' community movement. The service delivery chapters in particular were a painful slog to read through, and alternated between insulting the waterfall model and then proceeding to describe how to implement and automate such a process. Most of the content revolves around n-tier web architectures, with very little discussion of operating and managing distributed clusters. Relatedly, the pitfalls of DB schema changes get very little attention.
The title mentions Cloud administration, but this book is much more than that. The concepts covered here are excellent foundational IT operational ideas that are relevant to legacy IT systems. This is a great complement to the SRE book from Google Engineers. This book should be read by every IT leader who is interested in operating a high functioning team.
A new companion volume to the classic "Practice of System and Network Administration" tome. Invaluable as a practice manual, and also fascinating for its insights into how system administration has had to shift and adapt for the web age.
A comprehensive book on how to be a good system administrator. Anyone who knows what is written in it can count on a very good salary and the respect of colleagues. I highly recommend it if you don't yet have a decade of experience supporting high-load distributed systems behind you.