Streaming Data Mesh - Helion

ebook

Authors: Hubert Dulay, Stephen Mooney
ISBN: 9781098130688
Pages: 226, Format: ebook
Publication date: 2023-05-11
Bookstore: Helion

Book price: 211.65 zł (previously: 246.10 zł)
You save: 14% (-34.45 zł)

Data lakes and warehouses have become increasingly fragile, costly, and difficult to maintain as data gets bigger and moves faster. Data meshes can help your organization decentralize data, giving ownership back to the engineers who produced it. This book provides a concise yet comprehensive overview of data mesh patterns for streaming and real-time data services.

Authors Hubert Dulay and Stephen Mooney examine the vast differences between streaming and batch data meshes. Data engineers, architects, data product owners, and those in DevOps and MLOps roles will learn steps for implementing a streaming data mesh, from defining a data domain to building a good data product. Through the course of the book, you'll create a complete self-service data platform and devise a data governance system that enables your mesh to work seamlessly.

With this book, you will:

- Design a streaming data mesh using Kafka
- Learn how to identify a domain
- Build your first data product using self-service tools
- Apply data governance to the data products you create
- Learn the differences between synchronous and asynchronous data services
- Implement self-services that support decentralized data

Table of contents

Streaming Data Mesh eBook: table of contents

Preface
- Who Should Read This Book
- Why We Wrote This Book
- Navigating This Book
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
  - Hubert
  - Stephen
1. Data Mesh Introduction
- Data Divide
- Data Mesh Pillars
  - Data Ownership
  - Data as a Product
  - Federated Computational Data Governance
  - Self-Service Data Platform
  - Data Mesh Diagram
- Other Similar Architectural Patterns
  - Data Fabric
  - Data Gateways and Data Services
  - Data Democratization
  - Data Virtualization
- Focusing on Implementation
  - Apache Kafka
  - AsyncAPI
2. Streaming Data Mesh Introduction
- The Streaming Advantage
  - Streaming Enables Real-Time Use Cases
  - Streaming Enables Data Optimization Advantages
  - Reverse ETL
- The Kappa Architecture
  - Lambda Architecture Introduction
  - Kappa Architecture Introduction
- Summary
3. Domain Ownership
- Identifying Domains
  - Discernible Domains
  - Geographic Regions
    - Subdomains and subdata mesh
    - Data sovereignty
  - Hybrid Architecture
  - Multicloud
    - Disaster recovery
    - Analytics
- Avoiding Ambiguous Domains
- Domain-Driven Design
  - Domain Model
  - Domain Logic
  - Bounded Context
  - The Ubiquitous Language
- Data Mesh Domain Roles
  - Data Product Engineer
  - Data Product Owner or Data Steward
- Streaming Data Mesh Tools and Platforms to Consider
- Domain Charge-Backs
- Summary
4. Streaming Data Products
- Defining Data Product Requirements
- Identifying Data Product Derivatives
  - Derivatives from Other Domains
- Ingesting Data Product Derivatives with Kafka Connect
  - Consumability
    - Scalability
    - Interoperability and data serialization
  - Synchronous Data Sources
  - Asynchronous Data Sources and Change Data Capture
  - Debezium Connectors
- Transforming Data Derivatives to Data Products
  - Data Standardization
  - Protecting Sensitive Information
  - SQL
    - SaaS stream processor
    - ksqlDB
      - Provisioning connectors in ksqlDB
      - User-defined functions in ksqlDB
  - Extract, Transform, and Load
    - Maintaining data warehouse concepts
    - Data warehousing basics
    - Dimensional versus fact data in a streaming context
    - Materialized views in streams
    - Streaming ETL with domain-driven design
- Publishing Data Products with AsyncAPI
  - Registering the Streaming Data Product
  - Building an AsyncAPI YAML Document
    - Objects asyncapi, externalDocs, info, and tags
    - Servers and security section
    - Channels and topic section
    - Components section
      - Messages section
      - Security schemes section
      - Traits section
  - Assigning Data Tags
    - Quality
    - Security
    - Throughput
  - Versioning
  - Monitoring
- Summary
5. Federated Computational Data Governance
- Data Governance in a Streaming Data Mesh
  - Data Lineage Graph
  - Streaming Data Catalog to Organize Data Products
- Metadata
  - Schemas
  - Lineage
  - Security
  - Scalability
- Generating the Data Product Page from AsyncAPI
  - Apicurio Registry
  - Access Workflow
- Centralized Versus Decentralized
  - Centralized Engineers
  - Decentralized (Domain) Engineers
- Summary
6. Self-Service Data Infrastructure
- Streaming Data Mesh CLI
- Resource-Related Commands
  - Cluster-Related Commands
  - Topic-Related Commands
  - The domain Commands
  - The connect Commands
  - The streaming Commands
    - The udf command
    - The sql command
  - Publishing a Streaming Data Product
- Data Governance-Related Services
  - Security Services
    - Data obfuscation services
      - Encryption
      - Encryption and decryption UDFs
      - Tokenization and detokenization UDFs
      - Sensitive information detection
    - Identity services
    - Auditing
  - Standards Services
  - Lineage Services
- SaaS Services and APIs
- Summary
7. Architecting a Streaming Data Mesh
- Infrastructure
- Two Architecture Solutions
  - Dedicated Infrastructure
    - Producing domain architecture
    - High-throughput producing domain
    - Consuming domain architecture
      - Real-time online analytical processing databases
      - Consuming domains without a streaming platform
    - Recommended architectures
  - Multitenant Infrastructure
    - Producing domain architecture
    - Consuming domain architecture
    - Regions
- Streaming Data Mesh Central Architecture
  - The Domain Agent (aka Sidecar)
  - Data Plane
  - Control Plane
    - The management plane and metadata and registry plane
    - Self-service plane
      - Workflow orchestration
      - Implementing a DAG for linking
      - Implementing a DAG for publishing data products
      - Infrastructure as code (IaC)
- Summary
8. Building a Decentralized Data Team
- The Traditional Data Warehouse Structure
- Introducing the Decentralized Team Structure
  - Empowering People
  - Working Processes
  - Fostering Collaboration
  - Data-Driven Automation
- New Roles in Data Domains
  - New Roles in the Data Plane
  - New Roles in Data Science and Business Intelligence
9. Feature Stores
- Separating Data Engineering from Data Science
- Online and Offline Data Stores
- Apache Feast Introduction
- Summary
10. Streaming Data Mesh in Practice
- Streaming Data Mesh Example
- Deploying an On-Premises Streaming Data Mesh
  - Installing a Connector
  - Deploying Clickstream Connector and Auto-Creating Tables
    - Deploy a Datagen connector
    - Create the first few nodes
    - Create a table-like structure in ksqlDB
  - Deploying the Debezium Postgres CDC Connector
  - Enrichment of Streaming Data
    - Stream versus table
  - Publishing the Data Product
- Consuming Streaming Data Products
- Fully Managed SaaS Services
- Summary and Considerations
Index