Architecting Data and Machine Learning Platforms - Helion

ebook

Authors: Marco Tranquillin, Valliappa Lakshmanan, Firat Tekiner
ISBN: 9781098151577
Pages: 362, Format: ebook
Publication date: 2023-10-12
Bookstore: Helion

Price: 220,15 zł (previously: 255,99 zł)
You save: 14% (-35,84 zł)

All cloud architects need to know how to build data platforms that enable businesses to make data-driven decisions and deliver enterprise-wide intelligence in a fast and efficient way. This handbook shows you how to design, build, and modernize cloud native data and machine learning platforms using AWS, Azure, Google Cloud, and multicloud tools like Snowflake and Databricks.

Authors Marco Tranquillin, Valliappa Lakshmanan, and Firat Tekiner cover the entire data lifecycle from ingestion to activation in a cloud environment using real-world enterprise architectures. You'll learn how to transform, secure, and modernize familiar solutions like data warehouses and data lakes, and you'll be able to leverage recent AI/ML patterns to get faster, more accurate insights that drive competitive advantage.

You'll learn how to:

- Design a modern and secure cloud native or hybrid data analytics and machine learning platform
- Accelerate data-led innovation by consolidating enterprise data in a governed, scalable, and resilient data platform
- Democratize access to enterprise data and govern how business teams extract insights and build AI/ML capabilities
- Enable your business to make decisions in real time using streaming pipelines
- Build an MLOps platform to move to a predictive and prescriptive analytics approach

Table of contents

Architecting Data and Machine Learning Platforms eBook -- table of contents

Preface
- Why Do You Need a Cloud Data Platform?
- Who Is This Book For?
- Organization of This Book
- Conventions Used in This Book
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
1. Modernizing Your Data Platform: An Introductory Overview
- The Data Lifecycle
  - The Journey to Wisdom
  - Water Pipes Analogy
  - Collect
  - Store
    - Scalability
    - Performance versus cost
    - High availability
    - Durability
    - Openness
  - Process/Transform
  - Analyze/Visualize
  - Activate
- Limitations of Traditional Approaches
  - Antipattern: Breaking Down Silos Through ETL
  - Antipattern: Centralization of Control
  - Antipattern: Data Marts and Hadoop
- Creating a Unified Analytics Platform
  - Cloud Instead of On-Premises
  - Drawbacks of Data Marts and Data Lakes
  - Convergence of DWHs and Data Lakes
    - Lakehouse
    - Data mesh
- Hybrid Cloud
  - Reasons Why Hybrid Is Necessary
  - Challenges of Hybrid Cloud
  - Why Hybrid Can Work
  - Edge Computing
- Applying AI
  - Machine Learning
  - Uses of ML
- Why Cloud for AI?
  - Cloud Infrastructure
  - Democratization
  - Real Time
  - MLOps
- Core Principles
- Summary
2. Strategic Steps to Innovate with Data
- Step 1: Strategy and Planning
  - Strategic Goals
  - Identify Stakeholders
  - Change Management
- Step 2: Reduce Total Cost of Ownership by Adopting a Cloud Approach
  - Why Cloud Costs Less
  - How Much Are the Savings?
  - When Does Cloud Help?
- Step 3: Break Down Silos
  - Unifying Data Access
  - Choosing Storage
  - Semantic Layer
- Step 4: Make Decisions in Context Faster
  - Batch to Stream
  - Contextual Information
  - Cost Management
- Step 5: Leapfrog with Packaged AI Solutions
  - Predictive Analytics
  - Understanding and Generating Unstructured Data
  - Personalization
  - Packaged Solutions
- Step 6: Operationalize AI-Driven Workflows
  - Identifying the Right Balance of Automation and Assistance
  - Building a Data Culture
  - Populating Your Data Science Team
- Step 7: Product Management for Data
  - Applying Product Management Principles to Data
  - 1. Understand and Maintain a Map of Data Flows in the Enterprise
  - 2. Identify Key Metrics
  - 3. Agreed Criteria, Committed Roadmap, and Visionary Backlog
  - 4. Build for the Customers You Have
  - 5. Don't Shift the Burden of Change Management
  - 6. Interview Customers to Discover Their Data Needs
  - 7. Whiteboard and Prototype Extensively
  - 8. Build Only What Will Be Used Immediately
  - 9. Standardize Common Entities and KPIs
  - 10. Provide Self-Service Capabilities in Your Data Platform
- Summary
3. Designing Your Data Team
- Classifying Data Processing Organizations
- Data Analysis-Driven Organization
  - The Vision
  - The Personas
    - Data analysts
    - Business analysts
    - Data engineers
  - The Technological Framework
- Data Engineering-Driven Organization
  - The Vision
  - The Personas
    - Knowledge
    - Responsibilities
  - The Technological Framework
    - Reference architectures
    - Benefits of the reference architecture
- Data Science-Driven Organization
  - The Vision
  - The Personas
  - The Technological Framework
- Summary
4. A Migration Framework
- Modernize Data Workflows
  - Holistic View
  - Modernize Workflows
  - Transform the Workflow Itself
- A Four-Step Migration Framework
  - Prepare and Discover
  - Assess and Plan
  - Execute
    - Landing zone
    - Migrate
    - Validate
  - Optimize
- Estimating the Overall Cost of the Solution
  - Audit of the Existing Infrastructure
  - Request for Information/Proposal and Quotation
  - Proof of Concept/Minimum Viable Product
- Setting Up Security and Data Governance
  - Framework
  - Artifacts
  - Governance over the Life of the Data
- Schema, Pipeline, and Data Migration
  - Schema Migration
  - Pipeline Migration
  - Data Migration
    - Planning
    - Regional capacity and network to the cloud
    - Transfer options
  - Migration Stages
- Summary
5. Architecting a Data Lake
- Data Lake and the Cloud: A Perfect Marriage
  - Challenges with On-Premises Data Lakes
  - Benefits of Cloud Data Lakes
- Design and Implementation
  - Batch and Stream
  - Data Catalog
  - Hadoop Landscape
  - Cloud Data Lake Reference Architecture
    - Amazon Web Services
    - Microsoft Azure
    - Google Cloud Platform
- Integrating the Data Lake: The Real Superpower
  - APIs to Extend the Lake
  - The Evolution of Data Lake with Apache Iceberg, Apache Hudi, and Delta Lake
  - Interactive Analytics with Notebooks
- Democratizing Data Processing and Reporting
  - Build Trust in the Data
  - Data Ingestion Is Still an IT Matter
- ML in the Data Lake
  - Training on Raw Data
  - Predicting in the Data Lake
- Summary
6. Innovating with an Enterprise Data Warehouse
- A Modern Data Platform
  - Organizational Goals
  - Technological Challenges
  - Technology Trends and Tools
- Hub-and-Spoke Architecture
  - Data Ingest
    - Prebuilt connectors
    - Real-time data
    - Federated data
  - Business Intelligence
    - SQL analytics
    - Visualization
    - Embedded analytics
    - Semantic layer
  - Transformations
    - ELT with views
    - Scheduled queries
    - Materialized views
    - Security and lineage
  - Organizational Structure
- DWH to Enable Data Scientists
  - Query Interface
  - Storage API
  - ML Without Moving Your Data
    - Training ML models
    - ML training and serving
    - Exporting trained ML models
    - Using your trained model in ML pipelines
    - Invoking external ML models
    - Loading pretrained ML models
- Summary
7. Converging to a Lakehouse
- The Need for a Unique Architecture
  - User Personas
  - Antipattern: Disconnected Systems
  - Antipattern: Duplicated Data
- Converged Architecture
  - Two Forms
    - Choose based on user skills
    - Complete evaluation criteria
  - Lakehouse on Cloud Storage
    - Reference architecture
    - Migration
    - Future proofing
  - SQL-First Lakehouse
    - Reference architecture
    - Migration
    - Future proofing
  - The Benefits of Convergence
- Summary
8. Architectures for Streaming
- The Value of Streaming
  - Industry Use Cases
  - Streaming Use Cases
- Streaming Ingest
  - Streaming ETL
  - Streaming ELT
  - Streaming Insert
  - Streaming from Edge Devices (IoT)
  - Streaming Sinks
- Real-Time Dashboards
  - Live Querying
  - Materialize Some Views
- Stream Analytics
  - Time-Series Analytics
  - Clickstream Analytics
  - Anomaly Detection
  - Resilient Streaming
- Continuous Intelligence Through ML
  - Training Model on Streaming Data
    - Windowed training
    - Scheduled training
    - Continuous evaluation and retraining
  - Streaming ML Inference
  - Automated Actions
- Summary
9. Extending a Data Platform Using Hybrid and Edge
- Why Multicloud?
  - A Single Cloud Is Simpler and Cost-Effective
  - Multicloud Is Inevitable
  - Multicloud Could Be Strategic
- Multicloud Architectural Patterns
  - Single Pane of Glass
  - Write Once, Run Anywhere
  - Bursting from On Premises to Cloud
  - Pass-Through from On Premises to Cloud
  - Data Integration Through Streaming
- Adopting Multicloud
  - Framework
  - Time Scale
  - Define a Target Multicloud Architecture
- Why Edge Computing?
  - Bandwidth, Latency, and Patchy Connectivity
  - Use Cases
  - Benefits
  - Challenges
- Edge Computing Architectural Patterns
  - Smart Devices
  - Smart Gateways
  - ML Activation
- Adopting Edge Computing
  - The Initial Context
  - The Project
    - Improve overall system observability
    - Develop automations
    - Optimize the maintenance
  - The Final Outcomes and Next Steps
- Summary
10. AI Application Architecture
- Is This an AI/ML Problem?
  - Subfields of AI
  - Generative AI
    - How it works
    - Strengths and limitations
    - Do LLMs memorize or generalize?
    - LLMs hallucinate
    - Human feedback is needed
    - Weaknesses
    - Use cases
  - Problems Fit for ML
- Buy, Adapt, or Build?
  - Data Considerations
  - When to Buy
  - What Can You Buy?
  - How Adapting Works
- AI Architectures
  - Understanding Unstructured Data
  - Generating Unstructured Data
  - Predicting Outcomes
  - Forecasting Values
  - Anomaly Detection
  - Personalization
  - Automation
- Responsible AI
  - AI Principles
  - ML Fairness
  - Explainability
- Summary
11. Architecting an ML Platform
- ML Activities
- Developing ML Models
  - Labeling Environment
  - Development Environment
  - User Environment
  - Preparing Data
  - Training ML Models
    - Writing ML code
    - Small-scale jobs
    - Distributed training
    - No-code ML
- Deploying ML Models
  - Deploying to an Endpoint
  - Evaluate Model
  - Hybrid and Multicloud
  - Training-Serving Skew
    - Within the model
    - Transform function
    - Feature store
    - The canonical use of a feature store
    - Decision chart
- Automation
  - Automate Training and Deployment
  - Orchestration with Pipelines
    - Managed pipelines
    - Airflow
    - Kubeflow Pipelines
    - TensorFlow Extended
  - Continuous Evaluation and Training
    - Artifacts
    - Dependency tracking
    - Continuous evaluation
    - Continuous retraining
- Choosing the ML Framework
  - Team Skills
  - Task Considerations
  - User-Centric
- Summary
12. Data Platform Modernization: A Model Case
- New Technology for a New Era
  - The Need for Change
  - It Is Not Only a Matter of Technology
- The Beginning of the Journey
  - The Current Environment
  - The Target Environment
  - The PoC Use Case
- The RFP Responses Proposed by Cloud Vendors
  - The Target Environment
  - The Approach on Migration
    - Foundations development
    - Quick wins migration
    - Migration fulfillment
    - Modernization
- The RFP Evaluation Process
  - The Scope of the PoC
  - The Execution of the PoC
  - The Final Decision
- Peroration
- Summary
Index