97 things every SRE should know : collective wisdom from the experts

Contributor(s): Jaime Woo and Emil Stolarsky (editors)
Material type: Text
Publication details: Mumbai : O'Reilly, ©2021
Description: xvii, 231 p. : ill. ; 24 cm
ISBN:
  • 9789385889516
Other title:
  • Ninety-seven things every Site Reliability Engineer should know
DDC classification:
  • 620.001 STO-9
LOC classification:
  • TA169 .N56 2020
Contents:
New to SRE. Site reliability engineering in six words / Alex Hidalgo -- Do we know why we really want reliability? / Niall Murphy -- Building self-regulating processes / Denise Yu -- Four engineers of an SRE seder / Jacob Scott -- The reliability stack / Alex Hidalgo -- Infrastructure: it's where the power is / Charity Majors -- Thinking about resilience / Justin Li -- Observability in the development cycle / Charity Majors and Liz Fong-Jones -- There is no magic / Bouke van der Bijl -- How Wikipedia is served to you / Effie Mouzeli -- Why you should understand (a little) about TCP / Julia Evans -- The importance of a management interface / Salim Virji -- When it comes to storage, think distributed / Salim Virji -- The role of cardinality / Charity Majors and Liz Fong-Jones -- Security is like an onion / Lucas Fontes -- Use your words / Tanya Reilly -- Where to SRE / Fatema Boxwala -- Dear future team / Frances Rees -- Sustainability and burnout / Denise Yu -- Don't take advice from Graybeards / John Looney -- Facing that first page / Andrew Louis --

Zero to one. SRE, at any size, is cultural / Matthew Huxtable -- Everyone is an SRE in a small organization / Matthew Huxtable -- Auditing your environment for improvements / Joan O'Callaghan -- With incident response, start small / Thai Wood -- Solo SRE: effecting large-scale change as a single individual / Ashley Poole -- Design goals for SLO measurement / Ben Sigelman -- I have an error budget - now what? / Alex Hidalgo -- How to change things / Joan O'Callaghan -- Methodological debugging / Avishai Ish-Shalom and Nati Cohen -- How startups can build an SRE mindset / Tamara Miner -- Bootstrapping SRE in Enterprises / Vanessa Yiu -- It's okay not to know, and it's okay to be wrong / Todd Palino -- Storytelling is a superpower / Anita Clarke -- Get your work recognized: write a brag document / Julia Evans and Karla Burnett --

One to ten. Making work visible / Lorin Hochstein -- An overlooked engineering skill / Murali Suriar -- Unpacking the on-call divide / Jason Hand -- The maestros of incident response / Andrew Louis -- Effortless incident management / Suhail Patel, Miles Bryant, and Chris Evans -- If you're doing runbooks, do them well / Spike Lindsey -- Why I hate our playbooks / Frances Rees -- What machines do well / Michelle Brush -- Integrating empathy into SRE tools / Daniella Niyonkuru -- Using ChatOps to implement empathy / Daniella Niyonkuru -- Move fast to unbreak things / Michelle Brush -- You don't know for sure until it runs in production / Ingrid Epure -- Sometimes the fix is the problem / Jake Pittis -- Legendary / Elise Gale -- Metrics are not SLIs (the measure everything trap) / Brian Murphy -- When SLOs attack: pathological SLOs and how to fix them / Narayan Desai -- Holistic approach to product reliability / Kristine Chen and Bart Ponurkiewicz -- In search of the lost time / Ingrid Epure -- Unexpected lessons from office hours / Tamara Miner -- Building tools for internal customers that they actually want to use / Vinessa Wan -- It's about the individuals and interactions / Vinessa Wan -- The human baseline in SRE / Effie Mouzeli -- Remotely productive or productively remote / Avleen Vig -- Of margins and individuals / Kurt Andersen -- The importance of margins in systems / Kurt Andersen -- Fewer spreadsheets, more napkins / Jacob Bednarz -- Sneaking in your DevOps deliciously / Vinessa Wan -- Effecting SRE cultural changes in enterprise / Vanessa Yiu -- To all the SREs I've loved / Felix Glaser -- Complex: the most overloaded word in technology / Laura Nolan --

Ten to hundred. The best advice I can give to teams / Nicole Forsgren -- Create your supporting artifacts / Daria Barteneva and Eva Parish -- The order of operations for getting SLO buy-in / David K. Rensin -- Heroes are necessary, but hero culture is not / Lei Lopez -- On-call rotations that people want to join / Miles Bryant, Chris Evans, and Suhail Patel -- Study of human factors and team culture to improve pager fatigue / Daria Barteneva -- Optimize for MTTBTB (mean time to back to bed) / Spike Lindsey -- Mitigating and preventing cascading failures / Rita Lu -- On-call health: the metric you could be measuring / Caitie McCaffrey -- The SRE as a diplomat / Johnny Boursiquot -- Test your disaster plan / Tanya Reilly -- Why training matters to an SRE practice and SRE matters to your training program / Jennifer Petoff -- The power of uniformity / Chris Evans, Suhail Patel, and Miles Bryant -- Bytes per user value / Arshia Mufti -- Make your engineering blog a priority / Anita Clarke -- Don't let anyone run code in your context / John Looney -- Trading places: SRE and product / Shubheksha Jalan -- You see teams, I see product / Avleen Vig -- The performance emergency fund / Dawn Parzych -- Important but not urgent: roadmaps for SREs / Laura Nolan --

The future of SRE. That 50% thing / Tanya Reilly -- Following the path of safety-critical systems / Heidy Khlaaf -- The importance of formal specification / Hillel Wayne -- Risk and rot in sociotechnical systems / Laura Nolan -- SRE in crisis / Niall Murphy -- Expected risk limitations / Blake Bisset -- Beyond local risk: accounting for Angry Birds / Blake Bisset -- A word from software safety nerds / J. Paul Reed -- Incidents: a window into Gaps / Lorin Hochstein -- The third age of SRE / Björn "Beorn" Rabenstein.
Summary: "Site reliability engineering (SRE) is more relevant than ever. Knowing how to keep systems reliable has become a critical skill. With this practical book, newcomers and old hats alike will explore a broad range of conversations happening in SRE. You'll get actionable advice on several topics, including how to adopt SRE, why SLOs matter, when you need to upgrade your incident response, and how monitoring and observability differ. Editors Jaime Woo and Emil Stolarsky, co-founders of Incident Labs, have collected 97 concise and useful tips from across the industry, including trusted best practices and new approaches to knotty problems. You'll grow and refine your SRE skills through sound advice and thought-provoking questions that drive the direction of the field."
Holdings:
  • Item type: Books
  • Current library: IIITD General Stacks
  • Collection: Engineering and Allied Operation
  • Call number: 620.001 STO-9
  • Status: Available
  • Barcode: 012038
  • Total holds: 0

Includes bibliographical references and index.

© 2024 IIIT-Delhi, library@iiitd.ac.in