Site reliability engineering (SRE) books

One of the best ways to learn about or deepen your knowledge of SRE is through reading about it. Here are some of the best written sources of information we've seen on the topic.

Core SRE books

For more detailed information about site reliability engineering (SRE), the best source is a trio of books that have been published on the subject.

Each of these books covers a different part of the picture:

  • The SRE Book - provides a detailed explanation of how Google implemented SRE over the years.

  • The SRE Workbook - a companion to The SRE Book that explains not just the “what” of SRE at Google and a few other places, but also the “how” and “why”.

  • Seeking SRE - provides a more expansive view of the SRE world beyond its origins, including how it has been implemented in other environments.

Because these books describe the experience, environments, and culture of organizations that may or may not resemble your own, it is important to read them with a critical eye. As you read, try to determine which practices would or would not succeed in your organization. Take some time to identify the ideas that you are confident would add value. Think about which parts of your organization's culture and values can support SRE work as described and which might make it more challenging. A careful, iterative adoption of SRE will almost always yield better results than a wholesale duplication of something you read in these books.

Additional SRE books

After reading the books mentioned above, if you'd like to go deeper into the practice of Service Level Indicators (SLIs) and Service Level Objectives (SLOs), the following book is an excellent resource:

Implementing Service Level Objectives

If you would like to explore the intersection of security and reliability engineering, this book provides a variety of good observations on the topic:

Building Secure and Reliable Systems