Data Engineering Design Patterns - Helion

Data Engineering Design Patterns
ebook
Author: Bartosz Konieczny
ISBN: 9781098165789
Pages: 374, Format: ebook
Publication date: 2024-05-09
Bookstore: Helion

Book price: 228.65 zł (previously: 265.87 zł)
You save: 14% (-37.22 zł)

Data projects are an intrinsic part of an organization’s technical ecosystem, but data engineers in many companies continue to work on problems that others have already solved. This hands-on guide shows you how to provide valuable data by focusing on various aspects of data engineering, including data ingestion, data quality, idempotency, and more.

Author Bartosz Konieczny guides you through the process of building reliable end-to-end data engineering projects, from data ingestion to data observability, focusing on data engineering design patterns that solve common business problems in a secure and storage-optimized manner. Each pattern includes a user-facing description of the problem, solutions, and consequences that place the pattern into the context of real-life scenarios.

Throughout this journey, you’ll use open source data tools and public cloud services to apply each pattern. You'll learn:

  • Challenges data engineers face and their impact on data systems
  • How these challenges relate to data system components
  • Useful applications of data engineering patterns
  • How to identify and fix issues with your current data components
  • Technology-agnostic solutions to new and existing data projects, with open source implementation examples

Bartosz Konieczny is a freelance data engineer who's been coding since 2010. He's held various senior hands-on positions that allowed him to work on many data engineering problems in batch and stream processing.

Table of Contents

Data Engineering Design Patterns: Recipes for Solving the Most Common Data Engineering Problems eBook: table of contents

  • Preface
    • Conventions Used in This Book
    • The Structure of This Book
    • How to Use This Book
    • What Should I Know Prior to Reading This Book?
    • Glossary and Code Examples
    • O’Reilly Online Learning
    • How to Contact Us
    • Acknowledgments
  • 1. Introducing Data Engineering Design Patterns
    • What Are Design Patterns?
    • Yet More Design Patterns?
    • Common Data Engineering Patterns
    • Case Study Used in This Book
    • Summary
  • 2. Data Ingestion Design Patterns
    • Full Load
      • Pattern: Full Loader
        • Problem
        • Solution
        • Consequences
          • Data volume
          • Data consistency
        • Examples
    • Incremental Load
      • Pattern: Incremental Loader
        • Problem
        • Solution
        • Consequences
          • Hard deletes
          • Backfilling
        • Examples
      • Pattern: Change Data Capture
        • Problem
        • Solution
        • Consequences
          • Complexity
          • Data scope
          • Payload
          • Data semantics
        • Examples
    • Replication
      • Pattern: Passthrough Replicator
        • Problem
        • Solution
        • Consequences
          • Keep it simple
          • Security and isolation
          • PII data
          • Latency
          • Metadata
        • Examples
      • Pattern: Transformation Replicator
        • Problem
        • Solution
        • Consequences
          • Transformation risk for text file formats
          • Desynchronization
        • Examples
    • Data Compaction
      • Pattern: Compactor
        • Problem
        • Solution
        • Consequences
          • Cost versus performance trade-offs
          • Consistency
          • Cleaning
        • Example
    • Data Readiness
      • Pattern: Readiness Marker
        • Problem
        • Solution
        • Consequences
          • Lack of enforcement
          • Reliability for late data
        • Examples
    • Event Driven
      • Pattern: External Trigger
        • Problem
        • Solution
        • Consequences
          • Push versus pull
          • Execution context
          • Error management
        • Examples
    • Summary
  • 3. Error Management Design Patterns
    • Unprocessable Records
      • Pattern: Dead-Letter
        • Problem
        • Solution
        • Consequences
          • Snowball backfilling effect
          • Dead-lettered records identification
          • Ordering and consistency
          • Error-safe functions
          • Error or failure?
        • Examples
    • Duplicated Records
      • Pattern: Windowed Deduplicator
        • Problem
        • Solution
        • Consequences
          • Space versus time trade-off
          • Idempotent producer
        • Examples
    • Late Data
      • Pattern: Late Data Detector
        • Problem
        • Solution
        • Consequences
          • Late data capture
          • MIN strategy, stuck-in-the-past situations, and stateful jobs
          • MAX strategy and event skew
        • Examples
      • Pattern: Static Late Data Integrator
        • Problem
        • Solution
        • Consequences
          • Snowball backfilling effect
          • Overlapping executions and backfilling
          • Pipeline trigger
          • Waste of resources
          • Time requirement
        • Examples
      • Pattern: Dynamic Late Data Integrator
        • Problem
        • Solution
        • Consequences
          • Concurrency
          • Stateful pipelines and very late data
          • Scheduling complexity
        • Examples
    • Filtering
      • Pattern: Filter Interceptor
        • Problem
        • Solution
        • Consequences
          • Runtime impact
          • Declarative languages
          • Streaming
        • Examples
    • Fault Tolerance
      • Pattern: Checkpointer
        • Problem
        • Solution
        • Consequences
          • Delivery guarantee versus latency trade-off
          • Exactly-once feeling
        • Examples
    • Summary
  • 4. Idempotency Design Patterns
    • Overwriting
      • Pattern: Fast Metadata Cleaner
        • Problem
        • Solution
        • Consequences
          • Granularity and backfilling boundary
          • Metadata limits
          • Data exposition layer
          • Schema evolution
        • Examples
      • Pattern: Data Overwrite
        • Problem
        • Solution
        • Consequences
          • Data overhead
          • Vacuum need
        • Examples
    • Updates
      • Pattern: Merger
        • Problem
        • Solution
        • Consequences
          • Uniqueness
          • I/O
          • Incremental datasets with backfilling
        • Examples
      • Pattern: Stateful Merger
        • Problem
        • Solution
        • Consequences
          • Versioned data stores
          • Vacuum operations
          • Metadata operations
        • Examples
    • Database
      • Pattern: Keyed Idempotency
        • Problem
        • Solution
        • Consequences
          • Database dependent
          • Mutable data source
        • Examples
      • Pattern: Transactional Writer
        • Problem
        • Solution
        • Consequences
          • Commit step
          • Distributed processing
          • Idempotency scope
        • Examples
    • Immutable Dataset
      • Pattern: Proxy
        • Problem
        • Solution
        • Consequences
          • Database support
          • Immutability configuration
        • Examples
    • Summary
  • 5. Data Value Design Patterns
    • Data Enrichment
      • Pattern: Static Joiner
        • Problem
        • Solution
        • Consequences
          • Late data and consistency
          • Idempotency
        • Examples
      • Pattern: Dynamic Joiner
        • Problem
        • Solution
        • Consequences
          • Space versus exactness trade-off
          • Late data
        • Examples
    • Data Decoration
      • Pattern: Wrapper
        • Problem
        • Solution
        • Consequences
          • Domain split
          • Size
        • Examples
      • Pattern: Metadata Decorator
        • Problem
        • Solution
        • Consequences
          • Implementation
          • Data
        • Examples
    • Data Aggregation
      • Pattern: Distributed Aggregator
        • Problem
        • Solution
        • Consequences
          • Additional network exchange
          • Data skew
          • Scaling
        • Examples
      • Pattern: Local Aggregator
        • Problem
        • Solution
        • Consequences
          • Scaling
          • Grouping keys
        • Examples
    • Sessionization
      • Pattern: Incremental Sessionizer
        • Problem
        • Solution
        • Consequences
          • Inactivity period
          • Data freshness
          • Late data, event time partitions, and backfilling
        • Examples
      • Pattern: Stateful Sessionizer
        • Problem
        • Solution
        • Consequences
          • At-least-once processing
          • Scaling
          • Inactivity period length
          • Inactivity period time
        • Examples
    • Data Ordering
      • Pattern: Bin Pack Orderer
        • Problem
        • Solution
        • Consequences
          • Retries
          • Complexity
        • Examples
      • Pattern: FIFO Orderer
        • Problem
        • Solution
        • Consequences
          • I/O overhead and latency
          • FIFO is not exactly once
        • Examples
    • Summary
  • 6. Data Flow Design Patterns
    • Sequence
      • Pattern: Local Sequencer
        • Problem
        • Solution
        • Consequences
          • Boundaries
        • Examples
      • Pattern: Isolated Sequencer
        • Problem
        • Solution
        • Consequences
          • Scheduling
          • Communication
        • Examples
    • Fan-In
      • Pattern: Aligned Fan-In
        • Problem
        • Solution
        • Consequences
          • Infrastructure spikes
          • Scheduling skew
          • Scheduling overhead
          • Complexity
        • Examples
      • Pattern: Unaligned Fan-In
        • Problem
        • Solution
        • Consequences
          • Readability
          • Partial data
        • Examples
    • Fan-Out
      • Pattern: Parallel Split
        • Problem
        • Solution
        • Consequences
          • Blocked execution
          • Hardware
        • Examples
      • Pattern: Exclusive Choice
        • Problem
        • Solution
        • Consequences
          • Complexity factory
          • Hidden logic
          • Heavy conditions
        • Examples
    • Orchestration
      • Pattern: Single Runner
        • Problem
        • Solution
        • Consequences
          • Backfilling
          • Latency
        • Examples
      • Pattern: Concurrent Runner
        • Problem
        • Solution
        • Consequences
          • Resource starvation
          • Shared state
        • Examples
    • Summary
  • 7. Data Security Design Patterns
    • Data Removal
      • Pattern: Vertical Partitioner
        • Problem
        • Solution
        • Consequences
          • Query performance
          • Querying complexity
          • Complexity in a polyglot world
          • Raw data
        • Examples
      • Pattern: In-Place Overwriter
        • Problem
        • Solution
        • Consequences
          • I/O overhead
          • Cost
        • Examples
    • Access Control
      • Pattern: Fine-Grained Accessor for Tables
        • Problem
        • Solution
        • Consequences
          • Row-level security limits
          • Data type
          • Query overhead
        • Examples
      • Pattern: Fine-Grained Accessor for Resources
        • Problem
        • Solution
        • Consequences
          • Security by the book trade-off
          • Complexity
          • Quotas
        • Examples
    • Data Protection
      • Pattern: Encryptor
        • Problem
        • Solution
        • Consequences
          • Encryption/decryption overhead
          • Data loss risk
          • Protocol updates
        • Examples
      • Pattern: Anonymizer
        • Problem
        • Solution
        • Consequences
          • Information loss
        • Examples
      • Pattern: Pseudo-Anonymizer
        • Problem
        • Solution
        • Consequences
          • False sense of security
          • Information loss
        • Examples
    • Connectivity
      • Pattern: Secrets Pointer
        • Problem
        • Solution
        • Consequences
          • Cache invalidation and streaming jobs
          • Logs
          • A secret remains secret
        • Examples
      • Pattern: Secretless Connector
        • Problem
        • Solution
        • Consequences
          • Workless impression
          • Rotation
        • Examples
    • Summary
  • 8. Data Storage Design Patterns
    • Partitioning
      • Pattern: Horizontal Partitioner
        • Problem
        • Solution
        • Consequences
          • Granularity and metadata overhead
          • Skew
          • Mutability
        • Examples
      • Pattern: Vertical Partitioner
        • Problem
        • Solution
        • Consequences
          • Domain split
          • Querying
          • Data producer
        • Examples
    • Records Organization
      • Pattern: Bucket
        • Problem
        • Solution
        • Consequences
          • Mutability
          • Bucket size
        • Examples
      • Pattern: Sorter
        • Problem
        • Solution
        • Consequences
          • Unsorted segments
          • Composite sort keys
          • Mutability
        • Examples
    • Read Performance Optimization
      • Pattern: Metadata Enhancer
        • Problem
        • Solution
        • Consequences
          • Overhead
          • Out-of-date statistics
        • Examples
      • Pattern: Dataset Materializer
        • Problem
        • Solution
        • Consequences
          • Refresh cost
          • Data access
          • Data storage overhead
        • Examples
      • Pattern: Manifest
        • Problem
        • Solution
        • Consequences
          • Complexity
          • Size
        • Examples
    • Data Representation
      • Pattern: Normalizer
        • Problem
        • Solution
        • Consequences
          • Query cost
          • Archival
        • Examples
      • Pattern: Denormalizer
        • Problem
        • Solution
        • Consequences
          • Costly updates
          • Storage
          • One big antipattern
        • Examples
    • Summary
  • 9. Data Quality Design Patterns
    • Quality Enforcement
      • Pattern: Audit-Write-Audit-Publish
        • Problem
        • Solution
        • Consequences
          • Compute cost
          • Rules coverage
          • Streaming latency
          • An issue may not be an issue
        • Examples
      • Pattern: Constraints Enforcer
        • Problem
        • Solution
        • Consequences
          • All-or-nothing semantics
          • Data producer shift
          • Constraints coverage
        • Examples
    • Schema Consistency
      • Pattern: Schema Compatibility Enforcer
        • Problem
        • Solution
        • Consequences
          • Interaction overhead
          • Schema evolution
        • Examples
      • Pattern: Schema Migrator
        • Problem
        • Solution
        • Consequences
          • Size impact
          • Impossible removal
        • Examples
    • Quality Observation
      • Pattern: Offline Observer
        • Problem
        • Solution
        • Consequences
          • Time accuracy
          • Compute resources
        • Examples
      • Pattern: Online Observer
        • Problem
        • Solution
        • Consequences
          • Extra delays
          • Parallel splits
        • Examples
    • Summary
  • 10. Data Observability Design Patterns
    • Data Detectors
      • Pattern: Flow Interruption Detector
        • Problem
        • Solution
        • Consequences
          • Threshold
          • Metadata
          • False positives for storage
        • Examples
      • Pattern: Skew Detector
        • Problem
        • Solution
        • Consequences
          • Seasonality
          • Communication
          • Fatality loop
        • Examples
    • Time Detectors
      • Pattern: Lag Detector
        • Problem
        • Solution
        • Consequences
          • Data skew
        • Examples
      • Pattern: SLA Misses Detector
        • Problem
        • Solution
        • Consequences
          • Late data and event time
        • Examples
    • Data Lineage
      • Pattern: Dataset Tracker
        • Problem
        • Solution
        • Consequences
          • Vendor lock
          • Custom work
        • Examples
      • Pattern: Fine-Grained Tracker
        • Problem
        • Solution
        • Consequences
          • Custom code
          • Row-level visualization
          • Evolution management
        • Examples
    • Summary
  • Afterword
  • A. Summary of Patterns
    • Data Ingestion Design Patterns
    • Error Management Design Patterns
    • Idempotency Design Patterns
    • Data Value Design Patterns
    • Data Flow Design Patterns
    • Data Security Design Patterns
    • Data Storage Design Patterns
    • Data Quality Design Patterns
    • Data Observability Design Patterns
  • Index

(c) 2005-2025 CATALIST interactive agency; trademarks belong to the publisher Helion S.A.