Streaming Systems. The What, Where, When, and How of Large-Scale Data Processing - Helion
ISBN: 978-14-919-8382-9
Pages: 352, Format: ebook
Publication date: 2018-07-16
Bookstore: Helion
Price: 152.15 zł (previously: 176.92 zł)
You save: 14% (-24.77 zł)
Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.
Expanded from Tyler Akidau’s popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You’ll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.
You’ll explore:
- How streaming and batch data processing patterns compare
- The core principles and concepts behind robust out-of-order data processing
- How watermarks track progress and completeness in infinite datasets
- How exactly-once data processing techniques ensure correctness
- How the concepts of streams and tables form the foundations of both batch and streaming data processing
- The practical motivations behind a powerful persistent state mechanism, driven by a real-world example
- How time-varying relations provide a link between stream processing and the world of SQL and relational algebra
Table of Contents
Streaming Systems. The What, Where, When, and How of Large-Scale Data Processing eBook -- table of contents
- Preface Or: What Are You Getting Yourself Into Here?
- Navigating This Book
- Takeaways
- Conventions Used in This Book
- Online Resources
- Figures
- Code Snippets
- O'Reilly Safari
- How to Contact Us
- Acknowledgments
- I. The Beam Model
- 1. Streaming 101
- Terminology: What Is Streaming?
- On the Greatly Exaggerated Limitations of Streaming
- Event Time Versus Processing Time
- Data Processing Patterns
- Bounded Data
- Unbounded Data: Batch
- Fixed windows
- Sessions
- Unbounded Data: Streaming
- Time-agnostic
- Filtering
- Inner joins
- Approximation algorithms
- Windowing
- Windowing by processing time
- Windowing by event time
- Summary
- 2. The What, Where, When, and How of Data Processing
- Roadmap
- Batch Foundations: What and Where
- What: Transformations
- Where: Windowing
- Going Streaming: When and How
- When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!
- When: Watermarks
- When: Early/On-Time/Late Triggers FTW!
- When: Allowed Lateness (i.e., Garbage Collection)
- How: Accumulation
- Summary
- 3. Watermarks
- Definition
- Source Watermark Creation
- Perfect Watermark Creation
- Heuristic Watermark Creation
- Watermark Propagation
- Understanding Watermark Propagation
- Watermark Propagation and Output Timestamps
- The Tricky Case of Overlapping Windows
- Percentile Watermarks
- Processing-Time Watermarks
- Case Studies
- Case Study: Watermarks in Google Cloud Dataflow
- Case Study: Watermarks in Apache Flink
- Case Study: Source Watermarks for Google Cloud Pub/Sub
- Summary
- 4. Advanced Windowing
- When/Where: Processing-Time Windows
- Event-Time Windowing
- Processing-Time Windowing via Triggers
- Processing-Time Windowing via Ingress Time
- Where: Session Windows
- Where: Custom Windowing
- Variations on Fixed Windows
- Unaligned fixed windows
- Per-element/key fixed windows
- Variations on Session Windows
- Bounded sessions
- One Size Does Not Fit All
- Summary
- 5. Exactly-Once and Side Effects
- Why Exactly Once Matters
- Accuracy Versus Completeness
- Side Effects
- Problem Definition
- Ensuring Exactly Once in Shuffle
- Addressing Determinism
- Performance
- Graph Optimization
- Bloom Filters
- Garbage Collection
- Exactly Once in Sources
- Exactly Once in Sinks
- Use Cases
- Example Source: Cloud Pub/Sub
- Example Sink: Files
- Example Sink: Google BigQuery
- Other Systems
- Apache Spark Streaming
- Apache Flink
- Summary
- II. Streams and Tables
- 6. Streams and Tables
- Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity
- Toward a General Theory of Stream and Table Relativity
- Batch Processing Versus Streams and Tables
- A Streams and Tables Analysis of MapReduce
- Map as streams/tables
- Reduce as streams/tables
- Reconciling with Batch Processing
- What, Where, When, and How in a Streams and Tables World
- What: Transformations
- Where: Windowing
- Window merging
- When: Triggers
- How: Accumulation
- A Holistic View of Streams and Tables in the Beam Model
- A General Theory of Stream and Table Relativity
- Summary
- 7. The Practicalities of Persistent State
- Motivation
- The Inevitability of Failure
- Correctness and Efficiency
- Implicit State
- Raw Grouping
- Incremental Combining
- Generalized State
- Case Study: Conversion Attribution
- Conversion Attribution with Apache Beam
- Summary
- 8. Streaming SQL
- What Is Streaming SQL?
- Relational Algebra
- Time-Varying Relations
- Streams and Tables
- Looking Backward: Stream and Table Biases
- The Beam Model: A Stream-Biased Approach
- The SQL Model: A Table-Biased Approach
- Materialized views
- Looking Forward: Toward Robust Streaming SQL
- Stream and Table Selection
- Temporal Operators
- Where: windowing
- When: triggers
- A SQL-ish default: per-record triggers
- Watermark triggers
- Repeated delay triggers
- Data-driven triggers
- How: accumulation
- Retractions in a SQL world
- Discarding mode, or lack thereof
- Summary
- 9. Streaming Joins
- All Your Joins Are Belong to Streaming
- Unwindowed Joins
- FULL OUTER
- LEFT OUTER
- RIGHT OUTER
- INNER
- ANTI
- SEMI
- Windowed Joins
- Fixed Windows
- Temporal Validity
- Temporal validity windows
- Temporal validity joins
- Watermarks and temporal validity joins
- Summary
- 10. The Evolution of Large-Scale Data Processing
- MapReduce
- Hadoop
- Flume
- Storm
- Spark
- MillWheel
- Kafka
- Cloud Dataflow
- Flink
- Beam
- Summary
- Index