Reliable Machine Learning
ebook
Authors: Cathy Chen, Niall Richard Murphy, Kranti Parisa, D. Sculley, Todd Underwood
ISBN: 9781098106171
Pages: 410, Format: ebook
Publication date: 2021-10-12
Bookstore: Helion

Book price: 254.15 zł (previously: 299.00 zł)
You save: 15% (-44.85 zł)

Add to cart: Reliable Machine Learning

Whether you're part of a small startup or a multinational corporation, this practical book shows data scientists, software and site reliability engineers, product managers, and business owners how to establish and run ML reliably, effectively, and accountably within your organization. You'll gain insight into everything from how to monitor models in production to how to run a well-tuned model development team in a product organization.

By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, and Todd Underwood, along with featured guest authors, show you how to run an efficient and reliable ML system. Whether you want to increase revenue, optimize decision making, solve problems, or understand and influence customer behavior, you'll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind.

You'll examine:

  • What ML is: how it functions and what it relies on
  • Conceptual frameworks for understanding how ML "loops" work
  • How effective productionization can make your ML systems easily monitorable, deployable, and operable
  • Why ML systems make production troubleshooting more difficult, and how to compensate accordingly
  • How ML, product, and production teams can communicate effectively

Customers who bought "Reliable Machine Learning" also chose:

  • Windows Media Center. Domowe centrum rozrywki
  • Ruby on Rails. Ćwiczenia
  • DevOps w praktyce. Kurs video. Jenkins, Ansible, Terraform i Docker
  • Przywództwo w świecie VUCA. Jak być skutecznym liderem w niepewnym środowisku
  • Scrum. O zwinnym zarządzaniu projektami. Wydanie II rozszerzone

Reliable Machine Learning eBook -- table of contents

  • Foreword
  • Preface
    • Why We Wrote This Book
    • SRE as the Lens on ML
    • Intended Audience
    • How This Book Is Organized
      • Our Approach
      • Let's Knit!
      • Navigating This Book
    • About the Authors
    • Conventions Used in This Book
    • O'Reilly Online Learning
    • How to Contact Us
    • Acknowledgments
      • Cathy Chen
      • Niall Richard Murphy
      • Kranti Parisa
      • D. Sculley
      • Todd Underwood
  • 1. Introduction
    • The ML Lifecycle
      • Data Collection and Analysis
      • ML Training Pipelines
      • Build and Validate Applications
      • Quality and Performance Evaluation
      • Defining and Measuring SLOs
      • Launch
        • Models as code
        • Launch slowly
        • Release, not refactor
        • Isolate rollouts at the data layer
        • Measure SLOs during launch
        • Review the rollout
      • Monitoring and Feedback Loops
    • Lessons from the Loop
  • 2. Data Management Principles
    • Data as Liability
    • The Data Sensitivity of ML Pipelines
    • Phases of Data
      • Creation
      • Ingestion
      • Processing
        • Validation
        • Cleaning and ensuring data consistency
        • Enriching and extending
      • Storage
      • Management
      • Analysis and Visualization
    • Data Reliability
      • Durability
      • Consistency
      • Version Control
      • Performance
      • Availability
    • Data Integrity
      • Security
      • Privacy
      • Policy and Compliance
        • Jurisdictional rules
        • Reporting requirements
    • Conclusion
  • 3. Basic Introduction to Models
    • What Is a Model?
    • A Basic Model Creation Workflow
    • Model Architecture Versus Model Definition Versus Trained Model
    • Where Are the Vulnerabilities?
      • Training Data
        • Incomplete coverage
        • Spurious correlations
        • Cold start
        • Self-fulfilling prophecies and ML echo chambers
        • Changes in the world
      • Labels
        • Label noise
        • Wrong label objective
        • Fraud or malicious feedback
      • Training Methods
        • Overfitting
        • Lack of stability
        • Peculiarities of deep learning
    • Infrastructure and Pipelines
      • Platforms
      • Feature Generation
      • Upgrades and Fixes
    • A Set of Useful Questions to Ask About Any Model
    • An Example ML System
      • Yarn Product Click-Prediction Model
      • Features
      • Labels for Features
      • Model Updating
      • Model Serving
      • Common Failures
    • Conclusion
  • 4. Feature and Training Data
    • Features
      • Feature Selection and Engineering
      • Lifecycle of a Feature
      • Feature Systems
        • Data ingestion system
        • Feature store
        • Feature quality evaluation system
    • Labels
    • Human-Generated Labels
      • Annotation Workforces
      • Measuring Human Annotation Quality
      • An Annotation Platform
      • Active Learning and AI-Assisted Labeling
      • Documentation and Training for Labelers
    • Metadata
      • Metadata Systems Overview
      • Dataset Metadata
      • Feature Metadata
      • Label Metadata
      • Pipeline Metadata
    • Data Privacy and Fairness
      • Privacy
        • PII data and features
        • Private data and labeling
      • Fairness
    • Conclusion
  • 5. Evaluating Model Validity and Quality
    • Evaluating Model Validity
    • Evaluating Model Quality
      • Offline Evaluations
      • Evaluation Distributions
        • Held-out test data
        • Progressive validation
        • Golden sets
        • Stress-test distributions
        • Sliced analysis
        • Counterfactual testing
      • A Few Useful Metrics
        • Canary metrics
          • Bias
          • Calibration
        • Classification metrics
          • Accuracy
          • Precision and recall
          • AUC ROC
          • Precision/recall curves
        • Regression metrics
          • Mean squared error and mean absolute error
          • Log loss
    • Operationalizing Verification and Evaluation
    • Conclusion
  • 6. Fairness, Privacy, and Ethical ML Systems
    • Fairness (a.k.a. Fighting Bias)
      • Definitions of Fairness
      • Reaching Fairness
      • Fairness as a Process Rather than an Endpoint
      • A Quick Legal Note
    • Privacy
      • Methods to Preserve Privacy
        • Technical measures
        • Institutional measures
      • A Quick Legal Note
    • Responsible AI
      • Explanation
      • Effectiveness
      • Social and Cultural Appropriateness
    • Responsible AI Along the ML Pipeline
      • Use Case Brainstorming
      • Data Collection and Cleaning
      • Model Creation and Training
      • Model Validation and Quality Assessment
      • Model Deployment
      • Products for the Market
    • Conclusion
  • 7. Training Systems
    • Requirements
    • Basic Training System Implementation
      • Features
      • Feature Store
      • Model Management System
      • Orchestration
        • Job/process/resource scheduling system
        • ML framework
      • Quality Evaluation
      • Monitoring
    • General Reliability Principles
      • Most Failures Will Not Be ML Failures
      • Models Will Be Retrained
      • Models Will Have Multiple Versions (at the Same Time!)
      • Good Models Will Become Bad
      • Data Will Be Unavailable
      • Models Should Be Improvable
      • Features Will Be Added and Changed
      • Models Can Train Too Fast
      • Resource Utilization Matters
      • Utilization ≠ Efficiency
      • Outages Include Recovery
    • Common Training Reliability Problems
      • Data Sensitivity
      • Example Data Problem at YarnIt
      • Reproducibility
      • Example Reproducibility Problem at YarnIt
      • Compute Resource Capacity
      • Example Capacity Problem at YarnIt
    • Structural Reliability
      • Organizational Challenges
      • Ethics and Fairness Considerations
    • Conclusion
  • 8. Serving
    • Key Questions for Model Serving
      • What Will Be the Load to Our Model?
      • What Are the Prediction Latency Needs of Our Model?
      • Where Does the Model Need to Live?
        • On a local machine
        • On servers owned or managed by our organization
        • In the cloud
        • On-device
      • What Are the Hardware Needs for Our Model?
      • How Will the Serving Model Be Stored, Loaded, Versioned, and Updated?
      • What Will Our Feature Pipeline for Serving Look Like?
    • Model Serving Architectures
      • Offline Serving (Batch Inference)
        • Advantages
        • Disadvantages
      • Online Serving (Online Inference)
        • Advantages
        • Disadvantages
      • Model as a Service
        • Advantages
        • Disadvantages
      • Serving at the Edge
        • Advantages
        • Disadvantages
      • Choosing an Architecture
    • Model API Design
    • Testing
    • Serving for Accuracy or Resilience?
    • Scaling
      • Autoscaling
      • Caching
    • Disaster Recovery
    • Ethics and Fairness Considerations
    • Conclusion
  • 9. Monitoring and Observability for Models
    • What Is Production Monitoring and Why Do It?
      • What Does It Look Like?
      • The Concerns That ML Brings to Monitoring
      • Reasons for Continual ML Observability in Production
    • Problems with ML Production Monitoring
      • Difficulties of Development Versus Serving
      • A Mindset Change Is Required
    • Best Practices for ML Model Monitoring
      • Generic Pre-serving Model Recommendations
        • Explainability and monitoring
      • Training and Retraining
        • Concrete recommendations
      • Model Validation (Before Rollout)
        • Fallbacks in validation
        • Call to action
        • Concrete recommendations
      • Serving
        • Model
          • Case 1: Real-time actuals
          • Case 2: Delayed actuals
          • Case 3: Biased actuals
          • Case 4: No/few actuals
          • Other approaches
          • Troubleshooting model performance metrics
        • Data
          • Drift
          • Measuring drift
          • Troubleshooting drift
        • Data quality
          • Categorical data
          • Numerical data
          • Measuring data quality
        • Service
          • Optimizing performance of the model
          • Optimizing performance of the service
        • Other Things to Consider
          • SLOs in ML monitoring
          • Monitoring across services
          • Fairness in monitoring
          • Privacy in monitoring
          • Business impact
          • Dense data types (image, video, text documents, audio, and so on)
        • High-Level Recommendations for Monitoring Strategy
    • Conclusion
  • 10. Continuous ML
    • Anatomy of a Continuous ML System
      • Training Examples
      • Training Labels
      • Filtering Out Bad Data
      • Feature Stores and Data Management
      • Updating the Model
      • Pushing Updated Models to Serving
    • Observations About Continuous ML Systems
      • External World Events May Influence Our Systems
      • Models Can Influence Their Own Training Data
      • Temporal Effects Can Arise at Several Timescales
      • Emergency Response Must Be Done in Real Time
        • Stop training
        • Fall back
        • Roll back
        • Remove bad data
        • Roll through
        • Choosing a response strategy
        • Organizational considerations
      • New Launches Require Staged Ramp-ups and Stable Baselines
      • Models Must Be Managed Rather Than Shipped
    • Continuous Organizations
    • Rethinking Noncontinuous ML Systems
    • Conclusion
  • 11. Incident Response
    • Incident Management Basics
      • Life of an Incident
      • Incident Response Roles
    • Anatomy of an ML-Centric Outage
    • Terminology Reminder: Model
    • Story Time
      • Story 1: Searching but Not Finding
        • Stages of ML incident response for story 1
      • Story 2: Suddenly Useless Partners
        • Stages of ML incident response for story 2
      • Story 3: Recommend You Find New Suppliers
        • Stages of ML incident response for story 3
    • ML Incident Management Principles
      • Guiding Principles
      • Model Developer or Data Scientist
        • Preparation
        • Incident handling
        • Continuous improvement
      • Software Engineer
        • Preparation
        • Incident handling
        • Continuous improvement
      • ML SRE or Production Engineer
        • Preparation
        • Incident handling
        • Continuous improvement
      • Product Manager or Business Leader
        • Preparation
        • Incident handling
        • Continuous improvement
    • Special Topics
      • Production Engineers and ML Engineering Versus Modeling
      • The Ethical On-Call Engineer Manifesto
        • Impact
        • Cause
        • Troubleshooting
        • Solutions and a call to action
    • Conclusion
  • 12. How Product and ML Interact
    • Different Types of Products
    • Agile ML?
    • ML Product Development Phases
      • Discovery and Definition
      • Business Goal Setting
      • MVP Construction and Validation
      • Model and Product Development
      • Deployment
      • Support and Maintenance
    • Build Versus Buy
      • Models
        • Generic use cases
        • Company's data initiatives
      • Data Processing Infrastructure
      • End-to-End Platforms
      • Scoring Approach for Making the Decision
      • Making the Decision
    • Sample YarnIt Store Features Powered by ML
      • Showcasing Popular Yarns by Total Sales
      • Recommendations Based on Browsing History
      • Cross-selling and Upselling
      • Content-Based Filtering
      • Collaborative Filtering
    • Conclusion
  • 13. Integrating ML into Your Organization
    • Chapter Assumptions
      • Leader-Based Viewpoint
      • Detail Matters
      • ML Needs to Know About the Business
      • The Most Important Assumption You Make
      • The Value of ML
    • Significant Organizational Risks
      • ML Is Not Magic
      • Mental (Way of Thinking) Model Inertia
      • Surfacing Risk Correctly in Different Cultures
      • Siloed Teams Don't Solve All Problems
    • Implementation Models
      • Remembering the Goal
      • Greenfield Versus Brownfield
      • ML Roles and Responsibilities
      • How to Hire ML Folks
    • Organizational Design and Incentives
      • Strategy
      • Structure
      • Processes
      • Rewards
      • People
      • A Note on Sequencing
    • Conclusion
  • 14. Practical ML Org Implementation Examples
    • Scenario 1: A New Centralized ML Team
      • Background and Organizational Description
      • Process
      • Rewards
      • People
      • Default Implementation
    • Scenario 2: Decentralized ML Infrastructure and Expertise
      • Background and Organizational Description
      • Process
      • Rewards
      • People
      • Default Implementation
    • Scenario 3: Hybrid with Centralized Infrastructure/Decentralized Modeling
      • Background and Organizational Description
      • Process
      • Rewards
      • People
      • Default Implementation
    • Conclusion
  • 15. Case Studies: MLOps in Practice
    • 1. Accommodating Privacy and Data Retention Policies in ML Pipelines
      • Background
      • Problem and Resolution
        • Challenge 1: Which dialects?
        • Solution: Get rid of the concept of dialects!
        • Challenge 2: Racing the clock
        • Solutions (and new challenges!)
      • Takeaways
    • 2. Continuous ML Model Impacting Traffic
      • Background
      • Problem and Resolution
      • Takeaways
    • 3. Steel Inspection
      • Background
      • Problem and Resolution
      • Takeaways
    • 4. NLP MLOps: Profiling and Staging Load Test
      • Background
      • Problem and Resolution
        • An improved process for benchmarking
      • Takeaways
    • 5. Ad Click Prediction: Databases Versus Reality
      • Background
      • Problem and Resolution
      • Takeaways
    • 6. Testing and Measuring Dependencies in ML Workflow
      • Background
      • Problem and Resolution
        • Building the regression-testing sandbox
        • Monitoring for regression
      • Takeaways
  • Index
