The Cloud Data Lake - Helion

The Cloud Data Lake
ebook
Author: Rukmani Gopalan
ISBN: 9781098116545
Pages: 246, Format: ebook
Publication date: 2022-12-12
Bookstore: Helion

Book price: 203.15 zł (previously: 236.22 zł)
You save: 14% (-33.07 zł)

More organizations than ever understand the importance of data lake architectures for deriving value from their data. Building a robust, scalable, and performant data lake remains a complex proposition, however, with a buffet of tools and options that need to work together to provide a seamless end-to-end pipeline from data to insights.

This book provides a concise yet comprehensive overview of the setup, management, and governance of a cloud data lake. Author Rukmani Gopalan, a product management leader and data enthusiast, guides data architects and engineers through the major aspects of working with a cloud data lake, from design considerations and best practices to data format optimizations, performance optimization, cost management, and governance.

  • Learn the benefits of a cloud-based big data strategy for your organization
  • Get guidance and best practices for designing performant and scalable data lakes
  • Examine architecture and design choices, and data governance principles and strategies
  • Build a data strategy that scales as your organizational and business needs increase
  • Implement a scalable data lake in the cloud
  • Use cloud-based advanced analytics to gain more value from your data

Table of Contents

  • Preface
    • Why I Wrote This Book
    • Who Should Read This Book?
      • Introducing Klodars Corporation
    • Navigating the Book
    • Conventions Used in This Book
    • O'Reilly Online Learning
    • How to Contact Us
    • Acknowledgments
  • 1. Big Data: Beyond the Buzz
    • What Is Big Data?
    • Elastic Data Infrastructure: The Challenge
    • Cloud Computing Fundamentals
      • Cloud Computing Terminology
      • Value Proposition of the Cloud
    • Cloud Data Lake Architecture
      • Limitations of On-Premises Data Warehouse Solutions
      • What Is a Cloud Data Lake Architecture?
      • Benefits of a Cloud Data Lake Architecture
    • Defining Your Cloud Data Lake Journey
    • Summary
  • 2. Big Data Architectures on the Cloud
    • Why Klodars Corporation Moves to the Cloud
    • Fundamentals of Cloud Data Lake Architectures
      • A Word on Variety of Data
      • Cloud Data Lake Storage
      • Big Data Analytics Engines
        • MapReduce
        • Apache Hadoop
        • Apache Spark
        • Real-time stream processing pipelines
      • Cloud Data Warehouses
    • Modern Data Warehouse Architecture
      • Reference Architecture
      • Sample Use Case for a Modern Data Warehouse Architecture
      • Benefits and Challenges of Modern Data Warehouse Architecture
    • Data Lakehouse Architecture
      • Reference Architecture for the Data Lakehouse
        • Data formats
        • Metadata
        • Compute engines
      • Sample Use Case for Data Lakehouse Architecture
      • Benefits and Challenges of the Data Lakehouse Architecture
      • Data Warehouses and Unstructured Data
    • Data Mesh
      • Reference Architecture
      • Sample Use Case for a Data Mesh Architecture
      • Challenges and Benefits of a Data Mesh Architecture
    • What Is the Right Architecture for Me?
      • Know Your Customers
      • Know Your Business Drivers
      • Consider Your Growth and Future Scenarios
      • Design Considerations
      • Hybrid Approaches
    • Summary
  • 3. Design Considerations for Your Data Lake
    • Setting Up the Cloud Data Lake Infrastructure
      • Identify Your Goals
        • How Klodars Corporation defined the data lake goals
      • Plan Your Architecture and Deliverables
        • How Klodars Corporation planned their architecture and deliverables
      • Implement the Cloud Data Lake
      • Release and Operationalize
    • Organizing Data in Your Data Lake
      • A Day in the Life of Data
      • Data Lake Zones
      • Organization Mechanisms
    • Introduction to Data Governance
      • Actors Involved in Data Governance
      • Data Classification
      • Metadata Management, Data Catalog, and Data Sharing
      • Data Access Management
      • Data Quality and Observability
      • Data Governance at Klodars Corporation
      • Data Governance Wrap-Up
    • Manage Data Lake Costs
      • Demystifying Data Lake Costs on the Cloud
      • Data Lake Cost Strategy
        • Data Lake Environments and Associated Costs
        • Cost strategy based on data
        • Transactions and impact on costs
    • Summary
  • 4. Scalable Data Lakes
    • A Sneak Peek into Scalability
      • What Is Scalability?
      • Scale in Our Day-to-Day Life
      • Scalability in Data Lake Architectures
    • Internals of Data Lake Processing Systems
      • Data Copy Internals
        • Components of a data copy solution
        • Understanding resource utilization of a data copy job
      • ELT/ETL Processing Internals
        • Components of an Apache Spark application
        • Understanding resource utilization of a Spark job
      • A Note on Other Interactive Queries
    • Considerations for Scalable Data Lake Solutions
      • Pick the Right Cloud Offerings
        • Hybrid and multicloud solutions
        • IaaS versus PaaS versus SaaS solutions
        • Cloud offerings for Klodars Corporation
      • Plan for Peak Capacity
      • Data Formats and Job Profile
    • Summary
  • 5. Optimizing Cloud Data Lake Architectures for Performance
    • Basics of Measuring Performance
      • Goals and Metrics for Performance
      • Measuring Performance
      • Optimizing for Faster Performance
    • Cloud Data Lake Performance
      • SLAs, SLOs, and SLIs
      • Example: How Klodars Corporation Managed Its SLAs, SLOs, and SLIs
    • Drivers of Performance
      • Performance Drivers for a Copy Job
      • Performance Drivers for a Spark Job
    • Optimization Principles and Techniques for Performance Tuning
      • Data Formats
        • Exploring Apache Parquet
        • Other popular data formats
        • How Klodars Corporation picked their data formats
      • Data Organization and Partitioning
        • Optimal data organization strategy for Klodars Corporation
      • Choosing the Right Configurations on Apache Spark
    • Minimize Overheads with Data Transfer
    • Premium Offerings and Performance
      • The Case of Bigger Virtual Machines
      • The Case of Flash Storage
    • Summary
  • 6. Deep Dive on Data Formats
    • Why Do We Need These Open Data Formats?
      • Why Do We Need to Store Tabular Data?
      • Why Is It a Problem to Store Tabular Data in a Cloud Data Lake Storage?
    • Delta Lake
      • Why Was Delta Lake Founded?
        • Eliminate data silos across business analysts, data scientists, and data engineers
        • Provide a unified data and computational system for batch and real-time streaming data
        • Support bulk updates or changes to existing data
        • Handle errors due to schema changes and incorrect data
      • How Does Delta Lake Work?
      • When Do You Use Delta Lake?
    • Apache Iceberg
      • Why Was Apache Iceberg Founded?
      • How Does Apache Iceberg Work?
      • When Do You Use Apache Iceberg?
    • Apache Hudi
      • Why Was Apache Hudi Founded?
      • How Does Apache Hudi Work?
        • Copy-on-write tables
        • Merge-on-read tables
      • When Do You Use Apache Hudi?
    • Summary
  • 7. Decision Framework for Your Architecture
    • Cloud Data Lake Assessment
      • Cloud Data Lake Assessment Questionnaire
    • Analysis for Your Cloud Data Lake Assessment
      • Starting from Scratch
      • Migrating an Existing Data Lake or Data Warehouse to the Cloud
      • Improving an Existing Cloud Data Lake
    • Phase 1 of Decision Framework: Assess
      • Understand Customer Requirements
      • Understand Opportunities for Improvement
      • Know Your Business Drivers
      • Complete the Assess Phase by Prioritizing the Requirements
    • Phase 2 of Decision Framework: Define
      • Finalize the Design Choices for the Cloud Data Lake
        • Picking your architecture
        • Picking your cloud provider
        • Decision points for data lake migrations
      • Plan Your Cloud Data Lake Project Deliverables
    • Phase 3 of Decision Framework: Implement
    • Phase 4 of Decision Framework: Operationalize
    • Summary
  • 8. Six Lessons for a Data Informed Future
    • Lesson 1: Focus on the How and When, Not the If and Why, When It Comes to Cloud Data Lakes
    • Lesson 2: With Great Power Comes Great Responsibility: Data Is No Exception
    • Lesson 3: Customers Lead Technology, Not the Other Way Around
    • Lesson 4: Change Is Inevitable, so Be Prepared
    • Lesson 5: Build Empathy and Prioritize Ruthlessly
    • Lesson 6: Big Impact Does Not Happen Overnight
    • Summary
  • A. Cloud Data Lake Decision Framework Template
    • Phase 1: Assess Framework
    • Phase 2: Define Framework
      • Planning the Cloud Data Lake Deliverables
    • Phase 3: Implement Framework
  • Index
