
Practical Lakehouse Architecture - Helion

Practical Lakehouse Architecture
eBook
Author: Gaurav Ashok Thalpati
ISBN: 9781098152970
Pages: 286, Format: eBook
Release date: 2024-07-24
Bookstore: Helion

Price: 183.08 zł (previously: 247.41 zł)
You save: 26% (-64.33 zł)

Tags: Data analysis

This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact your data platform, from managing structured and unstructured data and supporting BI and AI/ML use cases to enabling more rigorous data governance and security measures.

Practical Lakehouse Architecture shows you how to:

  • Understand key lakehouse concepts and features like transaction support, time travel, and schema evolution
  • Understand the differences between traditional and lakehouse data architectures
  • Differentiate between various file formats and table formats
  • Design lakehouse architecture layers for storage, compute, metadata management, and data consumption
  • Implement data governance and data security within the platform
  • Evaluate technologies and decide on the best technology stack to implement the lakehouse for your use case
  • Make critical design decisions and address practical challenges to build a future-ready data platform
  • Start your lakehouse implementation journey and migrate data from existing systems to the lakehouse

Customers who bought "Practical Lakehouse Architecture" also chose:

  • Data Science w Pythonie. Kurs video. Przetwarzanie i analiza danych
  • Excel 2013. Kurs video. Poziom drugi. Przetwarzanie i analiza danych
  • Zarz
  • Eksploracja danych za pomoc
  • Google Analytics od podstaw. Analiza wp

Practical Lakehouse Architecture eBook: Table of Contents

  • Preface
    • Who Should Read This Book?
    • Why I Wrote This Book
    • Navigating This Book
    • O'Reilly Online Learning
    • Conventions Used in This Book
    • How to Contact Us
    • Acknowledgments
  • 1. Introduction to Lakehouse Architecture
    • Understanding Data Architecture
      • What Is Data Architecture?
      • How Does Data Architecture Help Build a Data Platform?
        • Defining core components
        • Defining component interdependencies and data flow
        • Defining guiding principles
        • Defining the technology stack
        • Aligning with overall vision and data strategy
      • Core Components of a Data Platform
        • Source systems
          • Internal and external source systems
          • Batch, near real-time, and streaming systems
          • Structured, semi-structured, and unstructured data
        • Data ingestion
          • Batch ingestion
          • Near real-time
          • Streaming
        • Data storage
          • General storage
          • Purpose-built storage
        • Data processing and transformations
          • Data validation and cleansing
          • Data transformation
          • Data curation and serving
        • Data consumption and delivery
          • BI workloads
          • Ad hoc/Interactive analysis
          • Downstream applications and APIs
          • AI and ML workloads
        • Common services
          • Metadata management
          • Data governance and data security
          • Data operations
    • Why Do We Need a New Data Architecture?
    • Lakehouse Architecture: A New Pattern
      • The Lakehouse: Best of Both Worlds
        • How does a lakehouse get data lake features?
        • How does a lakehouse get data warehouse features?
      • Understanding Lakehouse Architecture
        • Storage layer
          • Cloud storage
          • Open file formats
          • Open table formats
        • Compute layer
          • Open-source engines
          • Commercial engines
      • Lakehouse Architecture Characteristics
        • Single storage tier with no dedicated warehouse
        • Warehouse-like performance on the data lake
        • Decoupled architecture with separate storage and compute scaling
        • Open architecture
        • Support for different data types
        • Support for diverse workloads
      • Lakehouse Architecture Benefits
        • Simplified architecture
        • Support for unstructured data and ML use cases
        • No vendor lock-ins
        • Data sharing
        • Scalable and cost efficient
        • No data swamps
        • Schema enforcement and evolution
          • Schema enforcement
          • Schema evolution
        • Unified platform for ETL/ELT, BI, AI/ML, and real-time workloads
          • ETL/ELT workloads
          • BI workloads
          • AI/ML workloads
          • Real-time workloads
        • Time travel
          • Retrieve older data based on version
          • Retrieve older data based on timestamp
    • Key Takeaways
    • References
  • 2. Traditional Architectures and Modern Data Platforms
    • Traditional Architectures: Data Lakes and Data Warehouses
      • Data Warehouse Fundamentals
        • Benefits and advantages
        • Limitations and challenges
      • Data Lake Fundamentals
        • Benefits and advantages
        • Limitations and challenges
    • Modern Data Platforms
      • Finding Answers in the Cloud
      • Standalone Approach
        • Benefits
        • Limitations
      • Combined Approach
        • Benefits
        • Limitations
      • Expectations of Modern Data Platforms
    • Comparison: Data Warehouse, Data Lake, Lakehouse
      • Capabilities and Limitations
        • Standalone cloud data warehouse
        • Standalone cloud data lake
        • Combined architecture
        • Lakehouse architecture
      • Implementation Activities
        • Standalone cloud data warehouse
        • Standalone cloud data lake
        • Combined architecture
        • Lakehouse architecture
      • Administration and Management
        • Standalone cloud data warehouse
        • Standalone cloud data lake
        • Combined architecture
        • Lakehouse architecture
      • Business Outcomes
        • Standalone cloud data warehouse
        • Standalone cloud data lake
        • Combined architecture
        • Lakehouse architecture
    • Lakehouse Architecture: The Default Choice for Future Data Platforms?
    • Key Takeaways
    • References
  • 3. Storage: The Heart of the Lakehouse
    • Lakehouse Storage: Key Concepts
      • Row Versus Columnar Storage
      • Storage-based Performance Optimization
    • Lakehouse Storage Components
      • Cloud Object Storage
        • Storage characteristics
      • File Formats
        • Parquet
          • File layout
          • Key features
        • ORC
          • File layout
          • Key features
        • Avro
          • File layout
          • Key features
        • Similarities, differences, and use cases
      • Table Formats
        • Hive
        • Iceberg
          • Table layout
          • Key features
        • Hudi
          • Table layout
          • Key features
        • Linux Foundation's Delta Lake
          • Table layout
          • Key features
        • Similarities, differences, and use cases
    • Key Design Considerations
      • Ecosystem Support
      • Community Support
      • Supported File Formats
      • Supported Compute Engines
      • Supported Features
      • Commercial Product Support
      • Current and Future Versions
      • Performance Benchmarking
      • Comparisons
      • Sharing Features
    • Key Takeaways
    • References
  • 4. Data Catalogs
    • Understanding Metadata
      • Technical Metadata
      • Business Metadata
    • How Metastores and Data Catalogs Work Together
    • Features of a Data Catalog
      • Search, Explore, and Discover Data
      • Data Classification
      • Data Governance and Security
      • Data Lineage
    • Unified Data Catalog
      • Challenges of Siloed Metadata Management
      • What Is a Unified Data Catalog?
      • Benefits of a Unified Data Catalog
    • Implementing a Data Catalog: Key Design Considerations and Options
      • Using Hive metastore
      • Using AWS Services
      • Using Azure Services
      • Using GCP Services
      • Using Databricks
    • Key Takeaways
    • References
  • 5. Compute Engines for Lakehouse Architectures
    • Data Computation Benefits of Lakehouse Architecture
      • Independent Scaling
      • Cross-region, Cross-account Access
      • Unified Batch and Real-Time Processing
      • Enhanced BI Performance
      • Freedom to Choose Different Engine Types
      • Cross-zone Analysis
    • Compute Engine Options for Lakehouse Platforms
      • Open Source Tools
        • Tools for data engineering
          • Spark
          • Flink
        • Tools for data consumption
          • Presto and Trino
      • Cloud Services
        • AWS
          • AWS Glue
          • Amazon EMR
          • Amazon Athena
          • Other AWS services
        • Azure
          • Azure Data Factory (ADF)
          • Azure HDInsight
          • Azure Synapse Analytics
        • GCP
          • Dataproc
          • BigQuery
      • Third-Party Platforms
        • Databricks
        • Snowflake
    • Key Design Considerations
      • Open Table Format Support
      • Supported Version and Features
      • Ecosystem Support
      • Persona-Based Preferences
      • Managed Open Source Versus Cloud Native Versus Third-Party Products
      • Data Consumption Workloads
        • BI workloads
        • AI/ML workloads
    • Key Takeaways
    • References
  • 6. Data (and AI) Governance and Security in Lakehouse Architecture
    • What Is Data Governance and Data Security?
    • Benefits of Data Governance and Data Security
    • Unified Governance and Security in Lakehouse Architecture
    • Governance and Security Processes in Lakehouse Architecture
      • Metadata Management
      • Compliance and Regulations
      • Data and ML Model Quality
      • Lineage Across Data and AI assets
        • Understanding data flow
        • Performing impact analysis
        • Identifying unused objects
        • Tracking sensitive data
      • Data and AI Asset Sharing
      • Data Ownership
      • Auditing and Monitoring
      • Access Management
      • Data Protection
        • Data at rest
        • Data in transit
      • Handling Sensitive Data
        • Identify sensitive data
        • Anonymize sensitive data
    • What's Your Role?
    • Key Takeaways
    • References
  • 7. The Big Picture: Designing and Implementing a Lakehouse Platform
    • Pre-design Activities
      • Understanding Platform Requirements
      • Studying Existing System
      • Understanding the Organization's Vision and Data Strategy
      • Conducting Workshops and Interviews
    • Choosing the Right Architecture
    • Establishing Guiding Principles
      • Data Ecosystem
      • Scalability and Performance
      • Cost Control and Optimization
      • Platform Operations
      • Governance and Security
    • Design Considerations and Implementation Best Practices
      • Architecture Blueprint
      • Data Ingestion
        • Data ingestion considerations
          • Ingestion frequency
          • Source system types
          • Identify incremental data (change data capture)
          • Sensitive data
        • Technology choices
        • Best practices
      • Data Storage
        • Storage zones considerations
          • Raw zone
          • Cleansed zone
          • Curated zone
          • Semantic zone
        • Data modeling considerations
          • Entity relationship (ER) modeling
          • Data Vault modeling
          • Dimensional modeling
        • Best practices
      • Data Processing
        • Data processing considerations
          • Open table format conversion
          • Schema and data quality validations
          • Data integration
          • Data transformations and enrichment
        • Best practices
      • Data Consumption and Delivery
        • Workload considerations
        • Best practices
      • Common Services
        • Metadata management
        • Governance and security
        • Platform operations
          • DataOps
          • MLOps
        • Best practices
    • Design References
      • Step-by-Step Design Guide
      • Design Questionnaire
    • Key Takeaways
    • References
  • 8. Lakehouse in the Real World
    • Delivering a Real-World Lakehouse
    • Estimation and Planning Phase
      • Estimation
      • Planning
    • Analysis and Design Phase
      • Analyzing the Existing System
      • Data Modeling
      • Finalizing the Tech Stack
    • Implementation and Test Phase
      • Historical Data Migration
      • Data Reconciliation and Testing
      • Reverse Engineering
      • Data Quality and Handling Sensitive Data
    • Support and Maintenance Phase
      • Auditing and Tracking
      • Disaster Recovery Strategy
      • Decommissioning the Old System
    • Delivery References
      • Project Deliverables
      • Reference Architectures
        • Cloud native implementation
        • Third-party platform implementation
    • Key Takeaways
    • References
  • 9. Lakehouse of the Future
    • Warehouse to Lakehouse: What's Next?
      • Data Mesh
      • HTAP
      • Zero ETL
    • Interoperability and New Formats
      • Universal Format (UniForm)
      • Apache XTable
      • Upcoming File and Table Formats
    • Managed Platforms for Public and Private Clouds
      • Microsoft Fabric and Other Platforms
      • Managed Lakehouse for Private Cloud Platform
    • AI in a Lakehouse
    • Key Takeaways
    • Book Conclusion
    • References
  • Index

(c) 2005-2024 CATALIST interactive agency; trademarks belong to Helion S.A.