English | 2020 | ISBN-13: 978-1492081494 | 234 Pages | EPUB | 7.02 MB
Site reliability engineering (SRE) is more relevant than ever.
Knowing how to keep systems reliable has become a critical skill. With this practical book, newcomers and old hats alike will explore a broad range of conversations happening in SRE. You’ll get actionable advice on several topics, including how to adopt SRE, why SLOs matter, when you need to upgrade your incident response, and how monitoring and observability differ.
Editors Jaime Woo and Emil Stolarsky, co-founders of Incident Labs, have collected 97 concise and useful tips from across the industry, including trusted best practices and new approaches to knotty problems. You’ll grow and refine your SRE skills through sound advice and thought-provoking questions that drive the direction of the field.
Some of the 97 things you should know:
"Test Your Disaster Plan"–Tanya Reilly
"Integrating Empathy into Tools"–Daniella Niyonkuru
"The Best Advice I Can Give to Teams"–Nicole Forsgren
"Where to SRE"–Fatema Boxwala
"Facing Your First Page"–Andrew Louis
"I Have an Error Budget, Now What?"–Alex Hidalgo
"Get Your Work Recognized: Write a Brag Document"–Julia Evans and Karla Burnett