Web Operations. Keeping the Data On Time - Helion
ISBN: 978-14-493-9415-8
Pages: 338, Format: ebook
Publication date: 2010-06-21
Bookstore: Helion
Price: 135,15 zł (previously: 157,15 zł)
You save: 14% (-22,00 zł)
A web application involves many specialists, but it takes people in web ops to ensure that everything works together throughout an application's lifetime. It's the expertise you need when your start-up gets an unexpected spike in web traffic, or when a new feature causes your mature application to fail. In this collection of essays and interviews, web veterans such as Theo Schlossnagle, Baron Schwartz, and Alistair Croll offer insights into this evolving field. You'll hear stories from the trenches, told by builders of some of the biggest sites on the Web, about what's necessary to help a site thrive.
- Learn the skills needed in web operations, and why they're gained through experience rather than schooling
- Understand why it's important to gather metrics from both your application and infrastructure
- Consider common approaches to database architectures and the pitfalls that come with increasing scale
- Learn how to handle the human side of outages and degradations
- Find out how one company avoided disaster after a huge traffic deluge
- Discover what went wrong after a problem occurs, and how to prevent it from happening again
Contributors include:
John Allspaw
Heather Champ
Michael Christian
Richard Cook
Alistair Croll
Patrick Debois
Eric Florenzano
Paul Hammond
Justin Huff
Adam Jacob
Jacob Loomis
Matt Massie
Brian Moon
Anoop Nagwani
Sean Power
Eric Ries
Theo Schlossnagle
Baron Schwartz
Andrew Shafer
Customers who bought "Web Operations. Keeping the Data On Time" also chose:
- Mastering MEAN Stack 88,72 zł, (67,43 zł -24%)
- Psychology of UX Design 88,72 zł, (67,43 zł -24%)
- Time Is Money. The Business Value of Web Performance 74,99 zł, (63,74 zł -15%)
- SVG Text Layout. Words as Art 94,98 zł, (80,73 zł -15%)
- Discussing Design. Improving Communication and Collaboration through Critique 94,98 zł, (80,73 zł -15%)
Table of Contents
Web Operations. Keeping the Data On Time eBook: table of contents
- Web Operations: Keeping the Data on Time
- SPECIAL OFFER: Upgrade this ebook with O'Reilly
- Foreword
- Preface
- How This Book Is Organized
- Who This Book Is For
- Conventions Used in This Book
- Using Code Examples
- How to Contact Us
- Safari Books Online
- Acknowledgments
- 1. Web Operations: The Career
- Why Does Web Operations Have It Tough?
- A Strong Background in Computing
- Practiced Decisiveness
- A Calm Disposition
- From Apprentice to Master
- Knowledge
- Tools
- Experience
- The organizational challenge of inexperience
- The concept of "senior operations"
- Discipline
- Conclusion
- 2. How Picnik Uses Cloud Computing: Lessons Learned
- Where the Cloud Fits (and Why!)
- Storage
- Hybrid Computing with EC2
- Where the Cloud Doesn't Fit (for Picnik)
- Conclusion
- 3. Infrastructure and Application Metrics
- Time Resolution and Retention Concerns
- Locality of Metrics Collection and Storage
- Layers of Metrics
- High-Level Business or Feature-Specific Metrics
- System- and Service-Level Metrics
- Providing Context for Anomaly Detection and Alerts
- Log Lines Are Metrics, Too
- Correlation with Change Management and Incident Timelines
- Making Metrics Available to Your Alerting Mechanisms
- Using Metrics to Guide Load-Feedback Mechanisms
- A Metrics Collection System, Illustrated: Ganglia
- Background
- A Quick Introduction to Ganglia
- The need to keep collection and aggregation costs low
- The need to automatically discover new nodes and metrics
- The need to match network transport with your metrics collection task
- The need to implicitly prioritize cluster metrics
- The need to aggregate and organize metrics once they're collected
- The need to provide convenient interfaces for creating new metrics and pulling out existing metrics for correlation against other data
- Conclusion
- 4. Continuous Deployment
- Small Batches Mean Faster Feedback
- Small Batches Mean Problems Are Instantly Localized
- Small Batches Reduce Risk
- Small Batches Reduce Overhead
- The Quality Defenders' Lament
- Why Does It Work?
- Getting Started
- Step 1: Continuous Integration Server
- Step 2: Source Control Commit Check
- Step 3: Simple Deployment Script
- Step 4: Real-Time Alerting
- Step 5: Root-Cause Analysis (Five Whys)
- Continuous Deployment Is for Mission-Critical Applications
- Another Release? Do I Have To?
- The QA Dilemma
- Conclusion
- 5. Infrastructure As Code
- Service-Oriented Architecture
- Configuration Management
- Configuration management is policy driven
- System automation is configuration management policy made into code
- Configuration management in system administration
- System Integration
- Step 1: Break the infrastructure down into reusable, network-accessible services
- The bootstrapping service.
- The configuration service.
- Step 2: Integrate the services together
- Conclusion
- 6. Monitoring
- Story: "The Start of a Journey"
- Step 1: Understand What You Are Monitoring
- Step 2: Understand Normal Behavior
- Step 3: Be Prepared and Learn
- Conclusion
- 7. How Complex Systems Fail
- How Complex Systems Fail
- (Being a Short Treatise on the Nature of Failure; How Failure Is Evaluated; How Failure Is Attributed to Proximate Cause; and the Resulting New Understanding of Patient Safety)
- Complex systems are intrinsically hazardous systems
- Complex systems are heavily and successfully defended against failure
- Catastrophe requires multiple failures; single-point failures are not enough
- Complex systems contain changing mixtures of failures latent within them
- Complex systems run in degraded mode
- Catastrophe is always just around the corner
- Post-accident attribution to a "root cause" is fundamentally wrong
- Hindsight biases post-accident assessments of human performance
- Human operators have dual roles: as producers and as defenders against failure
- All practitioner actions are gambles
- Actions at the sharp end resolve all ambiguity
- Human practitioners are the adaptable element of complex systems
- Human expertise in complex systems is constantly changing
- Change introduces new forms of failure
- Views of "cause" limit the effectiveness of defenses against future events
- Safety is a characteristic of systems and not of their components
- People continuously create safety
- Failure-free operations require experience with failure
- As It Pertains Specifically to Web Operations
- It will be difficult to tell that the system has failed
- It will be difficult to tell what has failed
- Meaningful response will be delayed
- Communications will be strained and tempers will flare
- Maintenance will be a major source of new failures
- Recovery from backup is itself difficult and potentially dangerous
- Create test procedures that front-line people can use to verify system status
- Manage operations on a daily basis
- Control maintenance
- Assess performance at regular intervals
- Be a (unique) customer
- Further Reading
- 8. Community Management and Web Operations
- 9. Dealing with Unexpected Traffic Spikes
- How It All Started
- Alarms Abound
- Putting Out the Fire
- Surviving the Weekend
- Preparing for the Future
- CDN to the Rescue
- Proxy Servers
- Corralling the Stampede
- Streamlining the Codebase
- How Do We Know It Works?
- The Real Test
- Lessons Learned
- Improvements Since Then
- 10. Dev and Ops Collaboration and Cooperation
- Deployment
- Shared, Open Infrastructure
- Trust
- On-call Developers
- Live Debugging Tools
- Feature Flags
- Avoiding Blame
- Conclusion
- 11. How Your Visitors Feel: User-Facing Metrics
- Why Collect User-Facing Metrics?
- Successful Start-ups Learn and Adapt
- Performance Matters
- Recent Research Quantifies the Relationship
- What Makes a Site Slow?
- Service Discovery
- Sending the Request
- Thinking About the Response
- Delivering the Response
- Asynchronous Traffic and Refresh
- Rendering Time
- Measuring Delay
- Synthetic Monitoring
- When to use synthetic monitoring
- Limitations of synthetic monitoring
- Configuring synthetic monitoring
- Real User Monitoring
- When to use RUM
- Limitations of RUM
- Configuring RUM
- Building an SLA
- Apdex
- Visitor Outcomes: Analytics
- How Marketing Defines Success
- The Four Kinds of Sites
- A (Very) Basic Model of Analytics
- Correlating Performance and Analytics by Time
- Correlating Performance and Analytics by Visits
- Other Metrics Marketing Cares About
- Web Interaction Analytics
- Voice of the Customer
- How User Experience Affects Web Ops
- Many More Stakeholders
- Monitoring As Part of the Life Cycle, Not Just QA
- The Future of Web Monitoring
- Moving from Parts to Users
- Service-Centric Architectures
- Clouds and Monitoring
- APIs and RSS Feeds
- Delivering an API to others
- Consuming an API from someone else
- Rich Internet Applications
- HTML5: Server-Sent Events and WebSockets
- Online Communities and the Long Funnel
- Tying Together Mail and Conversion Loops
- The Capacity/Cost/Revenue Equation
- Conclusion
- 12. Relational Database Strategy and Tactics for the Web
- Requirements for Web Databases
- Always On
- Mostly Transactional Workload
- Simple Data, Simple Queries
- Availability Trumps Consistency
- Rapid Development
- Online Deployment
- Built by Developers
- How Typical Web Databases Grow
- Single Server
- Master and Replication Slaves
- Functional Partitioning
- Sharding, or Horizontal Partitioning
- Caching Layer
- The Yearning for a Cluster
- The CAP Theorem and ACID Versus BASE
- State of MySQL Clustering
- DRBD and Heartbeat
- Master-Master Replication Manager (MMM)
- Heartbeat with replication
- Proxy-based solutions
- InfiniDB, Galera, Tungsten, and ScaleDB
- Summary
- Database Strategy
- Architecture Requirements
- Easy wins
- Safe-Bet Architectures
- Risky Architectures
- Sharding
- Writing to more than one master
- Multilevel replication
- Ring replication (beyond two nodes)
- Reliance on DNS
- The so-called Entity-Attribute-Value (EAV) design pattern
- Database Tactics
- Taking Backups on a Slave
- Online Schema Changes
- Monitoring, Graphing, and Instrumentation
- Analyzing Performance
- Archiving and Purging Data
- Conclusion
- 13. How to Make Failure Beautiful: The Art and Science of Postmortems
- The Worst Postmortem
- What Is a Postmortem?
- When to Conduct a Postmortem
- Who to Invite to a Postmortem
- Running a Postmortem
- Postmortem Follow-Up
- Conclusion
- 14. Storage
- Data Asset Inventory
- Data Protection
- Capacity Planning
- Storage Sizing
- Operations
- Conclusion
- 15. Nonrelational Databases
- NoSQL Database Overview
- Pure Key/Value
- Data Structure
- Graph
- Document Oriented
- Highly Distributed
- Some Systems in Detail
- Cassandra
- HBase
- Riak
- CouchDB
- MongoDB
- Redis
- Conclusion
- 16. Agile Infrastructure
- Agile Infrastructure
- But Agile Is Not the Only Thing That Has Evolved
- Some People Are Born to Web Operations, Some People Have Web Operations Thrust upon Them...
- Working Software Is the Primary Measure of Progress
- The Application Is the Infrastructure, the Infrastructure Is the Application
- So, What's the Problem?
- Talk Does Not Cook Rice
- The infrastructure is an application
- Version control: The foundation of sanity
- Configuration management and automated deployments
- Monitoring
- Dev-test-prod life cycle, continuous integration, and disaster recovery
- Radiate information
- Reflective process improvement
- Incremental changes and refactoring
- The simplest thing that could work
- Separation of concerns
- Technical debt
- Continuous deployment
- Pairing
- Managing flow
- Communities of Interest and Practice
- Trading Zones and Apologies
- What to Do?
- Conclusion
- 17. Things That Go Bump in the Night (and How to Sleep Through Them)
- Definitions
- How Many 9s?
- Impact Duration Versus Incident Duration
- Datacenter Footprint
- Gradual Failures
- Trust Nobody
- Failover Testing
- Monitoring and History of Patterns
- Getting a Good Night's Sleep
- A. Contributors
- Index
- About the Authors
- Colophon
- SPECIAL OFFER: Upgrade this ebook with O'Reilly