Chaos Engineering. System Resiliency in Practice - Helion
ISBN: 978-14-920-4381-2
Pages: 308, Format: ebook
Publication date: 2020-04-06
Bookstore: Helion
Price: 211,65 zł (previously: 246,10 zł)
You save: 14% (-34,45 zł)
As more companies move toward microservices and other distributed technologies, the complexity of these systems increases. You can’t remove the complexity, but through Chaos Engineering you can discover vulnerabilities and prevent outages before they impact your customers. This practical guide shows engineers how to navigate complex systems while optimizing to meet business goals.
Two of the field’s prominent figures, Casey Rosenthal and Nora Jones, pioneered the discipline while working together at Netflix. In this book, they expound on the what, how, and why of Chaos Engineering while facilitating a conversation among practitioners across industries. Many chapters are written by contributing authors to widen the perspective across verticals within (and beyond) the software industry.
- Learn how Chaos Engineering enables your organization to navigate complexity
- Explore a methodology to avoid failures within your application, network, and infrastructure
- Move from theory to practice through real-world stories from industry experts at Google, Microsoft, Slack, and LinkedIn, among others
- Establish a framework for thinking about complexity within software systems
- Design a Chaos Engineering program around game days and move toward highly targeted, automated experiments
- Learn how to design continuous collaborative chaos experiments
Table of Contents
- Preface
- Conventions Used in This Book
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Introduction: Birth of Chaos
- Management Principles as Code
- Chaos Monkey Is Born
- Going Big
- Formalizing the Discipline
- Community Is Born
- Fast Evolution
- I. Setting the Stage
- 1. Encountering Complex Systems
- Contemplating Complexity
- Encountering Complexity
- Example 1: Mismatch Between Business Logic and Application Logic
- Example 2: Customer-Induced Retry Storm
- Example 3: Holiday Code Freeze
- Confronting Complexity
- Accidental Complexity
- Essential Complexity
- Embracing Complexity
- 2. Navigating Complex Systems
- Dynamic Safety Model
- Economics
- Workload
- Safety
- Economic Pillars of Complexity
- State
- Relationships
- Environment
- Reversibility
- Economic Pillars of Complexity Applied to Software
- The Systemic Perspective
- 3. Overview of Principles
- What Chaos Engineering Is
- Experimentation Versus Testing
- Verification Versus Validation
- What Chaos Engineering Is Not
- Breaking Stuff
- Antifragility
- Advanced Principles
- Build a Hypothesis Around Steady-State Behavior
- Vary Real-World Events
- Run Experiments in Production
- Automate Experiments to Run Continuously
- Minimize Blast Radius
- The Future of The Principles
- II. Principles in Action
- 4. Slack's Disasterpiece Theater
- Retrofitting Chaos
- Design Patterns Common in Older Systems
- Design Patterns Common in Newer Systems
- Getting to Basic Fault Tolerance
- Disasterpiece Theater
- Goals
- Anti-Goals
- The Process
- Preparation
- The Exercise
- Debriefing
- How the Process Has Evolved
- Getting Management Buy-In
- Results
- Avoid Cache Inconsistency
- Try, Try Again (for Safety)
- Impossibility Result
- Conclusion
- 5. Google DiRT: Disaster Recovery Testing
- Life of a DiRT Test
- The Rules of Engagement
- DiRT tests must have no service-level objective breaking impact on external systems or users
- Production emergencies always take precedence over DiRT emergencies
- Run DiRT tests with transparency
- Minimize cost, maximize value
- Treat disaster tests as you would actual outages
- What to Test
- Run at service levels
- Run without dependencies
- People outages
- Release and rollback
- Incident management procedures
- Datacenter operations
- Capacity management
- Business continuity plans
- Data integrity
- Networks
- Monitoring and alerting
- Telecommunications and IT systems
- Medical and security emergencies
- Reboot everything
- How to Test
- Gathering Results
- Scope of Tests at Google
- Conclusion
- 6. Microsoft Variation and Prioritization of Experiments
- Why Is Everything So Complicated?
- An Example of Unexpected Complications
- A Simple System Is the Tip of the Iceberg
- Categories of Experiment Outcomes
- Known Events/Unexpected Consequences
- Unknown Events/Unexpected Consequences
- Prioritization of Failures
- Explore Dependencies
- Degree of Variation
- Varying Failures
- Combining Variation and Prioritization
- Expanding Variation to Dependencies
- Deploying Experiments at Scale
- Conclusion
- 7. LinkedIn Being Mindful of Members
- Learning from Disaster
- Granularly Targeting Experiments
- Experimenting at Scale, Safely
- In Practice: LinkedOut
- Failure Modes
- Using LiX to Target Experiments
- Browser Extension for Rapid Experimentation
- Automated Experimentation
- Conclusion
- 8. Capital One Adoption and Evolution of Chaos Engineering
- A Capital One Case Study
- Blind Resiliency Testing
- Transition to Chaos Engineering
- Chaos Experiments in CI/CD
- Things to Watch Out for While Designing the Experiment
- Tooling
- Team Structure
- Evangelism
- Conclusion
- III. Human Factors
- 9. Creating Foresight
- Chaos Engineering and Resilience
- Steps of the Chaos Engineering Cycle
- Designing the Experiment
- Tool Support for Chaos Experiment Design
- Effectively Partnering Internally
- Understand Operating Procedures
- Discuss Scope
- Hypothesize
- Conclusion
- 10. Humanistic Chaos
- Humans in the System
- Putting the Socio in Sociotechnical Systems
- Organizations Are a System of Systems
- Engineering Adaptive Capacity
- Spotting Weak Signals
- Failure and Success, Two Sides of the Same Coin
- Putting the Principles into Practice
- Build a Hypothesis
- Vary Real-World Events
- Minimize the Blast Radius
- Case Study 1: Gaming Your Game Days
- The hypothesis
- The variable
- The outcome
- Communication: The Network Latency of Any Organization
- Pave new pathways to communication
- Case Study 2: Connecting the Dots
- The hypothesis
- The variable
- The outcome
- Leadership Is an Emergent Property of the System
- Moving your organization forward
- Using signals to set a direction
- Case Study 3: Changing a Basic Assumption
- The hypothesis
- The variable
- The outcome
- Safely Organizing the Chaos
- All You Need Is Altitude and a Direction
- Close the Loops
- If You're Not Failing, You're Not Learning
- 11. People in the Loop
- The Why, How, and When of Experiments
- The Why
- The How
- The When
- During incidents: Is this related to what you're running?
- But what about automation and getting people out of the loop?
- Functional Allocation, or Humans-Are-Better-At/Machines-Are-Better-At
- The Substitution Myth
- Conclusion
- 12. The Experiment Selection Problem (and a Solution)
- Choosing Experiments
- Random Search
- The Age of the Experts
- The role of the expert
- Observability: The Opportunity
- Observability for Intuition Engineering
- Lineage-driven fault injection
- Conclusion
- IV. Business Factors
- 13. ROI of Chaos Engineering
- Ephemeral Nature of Incident Reduction
- Kirkpatrick Model
- Level 1: Reaction
- Level 2: Learning
- Level 3: Transfer
- Level 4: Results
- Alternative ROI Example
- Collateral ROI
- Conclusion
- 14. Open Minds, Open Science, and Open Chaos
- Collaborative Mindsets
- Open Science; Open Source
- Open Chaos Experiments
- Experiment Findings, Shareable Results
- Conclusion
- 15. Chaos Maturity Model
- Adoption
- Who Bought into Chaos Engineering
- How Much of the Organization Participates in Chaos Engineering
- Prerequisites
- Obstacles to Adoption
- Sophistication
- Game Days
- Fault injection consultation
- Fault injection self-service tools
- Experimentation platforms
- Automation of the platforms
- Putting It All Together
- V. Evolution
- 16. Continuous Verification
- Where CV Comes From
- Types of CV Systems
- CV in the Wild: ChAP
- ChAP: Selecting Experiments
- ChAP: Running Experiments
- The Advanced Principles in ChAP
- ChAP as Continuous Verification
- CV Coming Soon to a System Near You
- Performance Testing
- Data Artifacts
- Correctness
- 17. Let's Get Cyber-Physical
- The Rise of Cyber-Physical Systems
- Functional Safety Meets Chaos Engineering
- FMEA and Chaos Engineering
- Software in Cyber-Physical Systems
- Chaos Engineering as a Step Beyond FMEA
- Probe Effect
- Addressing the Probe Effect
- Conclusion
- 18. HOP Meets Chaos Engineering
- What Is Human and Organizational Performance (HOP)?
- Key Principles of HOP
- Principle 1: Error Is Normal
- Principle 2: Blame Fixes Nothing
- Principle 3: Context Drives Behavior
- Principle 4: Learning and Improving Is Vital
- Principle 5: Intentional Response Matters
- HOP Meets Chaos Engineering
- Chaos Engineering and HOP in Practice
- Conclusion
- 19. Chaos Engineering on a Database
- Why Do We Need Chaos Engineering?
- Robustness and Stability
- A Real-World Example
- Applying Chaos Engineering
- Our Way of Embracing Chaos
- Fault Injection
- Fault Injection in Applications
- Fault Injection in CPU and Memory
- Fault Injection in the Network
- Fault Injection in the Filesystem
- Detecting Failures
- Automating Chaos
- Automated Experimentation Platform: Schrödinger
- Schrödinger Workflow
- Conclusion
- 20. The Case for Security Chaos Engineering
- A Modern Approach to Security
- Human Factors and Failure
- Remove the Low-Hanging Fruit
- Feedback Loops
- Security Chaos Engineering and Current Methods
- Problems with Red Teaming
- Problems with Purple Teaming
- Benefits of Security Chaos Engineering
- Security Game Days
- Example Security Chaos Engineering Tool: ChaoSlingr
- The Story of ChaoSlingr
- Conclusion
- Contributors/Reviewers
- 21. Conclusion
- Index