Stream Processing with Apache Flink. Fundamentals, Implementation, and Operation of Streaming Applications
ISBN: 978-14-919-7424-7
Pages: 310, Format: ebook
Publication date: 2019-04-11
Bookstore: Helion
Book price: 220,15 zł (previously: 255,99 zł)
You save: 14% (-35,84 zł)
Get started with Apache Flink, the open source framework that powers some of the world’s largest stream processing applications. With this practical book, you’ll explore the fundamental concepts of parallel stream processing and discover how this technology differs from traditional batch data processing.
Longtime Apache Flink committers Fabian Hueske and Vasia Kalavri show you how to implement scalable streaming applications with Flink’s DataStream API and continuously run and maintain these applications in operational environments. Stream processing is ideal for many use cases, including low-latency ETL, streaming analytics, and real-time dashboards as well as fraud detection, anomaly detection, and alerting. You can process continuous data of any kind, including user interactions, financial transactions, and IoT data, as soon as you generate them.
- Learn concepts and challenges of distributed stateful stream processing
- Explore Flink’s system architecture, including its event-time processing mode and fault-tolerance model
- Understand the fundamentals and building blocks of the DataStream API, including its time-based and stateful operators
- Read data from and write data to external systems with exactly-once consistency
- Deploy and configure Flink clusters
- Operate continuously running streaming applications
Table of Contents
- Preface
- What You Will Learn in This Book
- Conventions Used in This Book
- Using Code Examples
- O’Reilly Online Learning
- How to Contact Us
- Acknowledgments
- 1. Introduction to Stateful Stream Processing
- Traditional Data Infrastructures
- Transactional Processing
- Analytical Processing
- Stateful Stream Processing
- Event-Driven Applications
- Data Pipelines
- Streaming Analytics
- The Evolution of Open Source Stream Processing
- A Bit of History
- A Quick Look at Flink
- Running Your First Flink Application
- Summary
- 2. Stream Processing Fundamentals
- Introduction to Dataflow Programming
- Dataflow Graphs
- Data Parallelism and Task Parallelism
- Data Exchange Strategies
- Processing Streams in Parallel
- Latency and Throughput
- Latency
- Throughput
- Latency Versus Throughput
- Operations on Data Streams
- Data ingestion and data egress
- Transformation operations
- Rolling aggregations
- Window operations
- Time Semantics
- What Does One Minute Mean in Stream Processing?
- Processing Time
- Event Time
- Watermarks
- Processing Time Versus Event Time
- State and Consistency Models
- Task Failures
- What is a task failure?
- Result Guarantees
- At-most-once
- At-least-once
- Exactly-once
- End-to-end exactly-once
- Summary
- 3. The Architecture of Apache Flink
- System Architecture
- Components of a Flink Setup
- Application Deployment
- Task Execution
- Highly Available Setup
- TaskManager failures
- JobManager failures
- Data Transfer in Flink
- Credit-Based Flow Control
- Task Chaining
- Event-Time Processing
- Timestamps
- Watermarks
- Watermark Propagation and Event Time
- Timestamp Assignment and Watermark Generation
- State Management
- Operator State
- Keyed State
- State Backends
- Scaling Stateful Operators
- Checkpoints, Savepoints, and State Recovery
- Consistent Checkpoints
- Recovery from a Consistent Checkpoint
- Flink’s Checkpointing Algorithm
- Performance Implications of Checkpointing
- Savepoints
- Using savepoints
- Starting an application from a savepoint
- Summary
- 4. Setting Up a Development Environment for Apache Flink
- Required Software
- Run and Debug Flink Applications in an IDE
- Import the Book’s Examples in an IDE
- Run Flink Applications in an IDE
- Debug Flink Applications in an IDE
- Bootstrap a Flink Maven Project
- Summary
- 5. The DataStream API (v1.7)
- Hello, Flink!
- Set Up the Execution Environment
- Read an Input Stream
- Apply Transformations
- Output the Result
- Execute
- Transformations
- Basic Transformations
- Map
- Filter
- FlatMap
- KeyedStream Transformations
- keyBy
- Rolling aggregations
- Reduce
- Multistream Transformations
- Union
- Connect, coMap, and coFlatMap
- Split and select
- Distribution Transformations
- Setting the Parallelism
- Types
- Supported Data Types
- Creating Type Information for Data Types
- Explicitly Providing Type Information
- Defining Keys and Referencing Fields
- Field Positions
- Field Expressions
- Key Selectors
- Implementing Functions
- Function Classes
- Lambda Functions
- Rich Functions
- Including External and Flink Dependencies
- Summary
- 6. Time-Based and Window Operators
- Configuring Time Characteristics
- Assigning Timestamps and Generating Watermarks
- Assigner with periodic watermarks
- Assigner with punctuated watermarks
- Watermarks, Latency, and Completeness
- Process Functions
- TimerService and Timers
- Emitting to Side Outputs
- CoProcessFunction
- Window Operators
- Defining Window Operators
- Built-in Window Assigners
- Tumbling windows
- Sliding windows
- Session windows
- Applying Functions on Windows
- ReduceFunction
- AggregateFunction
- ProcessWindowFunction
- Incremental aggregation and ProcessWindowFunction
- Customizing Window Operators
- Window lifecycle
- Window assigners
- Triggers
- Evictors
- Joining Streams on Time
- Interval Join
- Window Join
- Handling Late Data
- Dropping Late Events
- Redirecting Late Events
- Updating Results by Including Late Events
- Summary
- 7. Stateful Operators and Applications
- Implementing Stateful Functions
- Declaring Keyed State at RuntimeContext
- Implementing Operator List State with the ListCheckpointed Interface
- Using Connected Broadcast State
- Using the CheckpointedFunction Interface
- Receiving Notifications About Completed Checkpoints
- Enabling Failure Recovery for Stateful Applications
- Ensuring the Maintainability of Stateful Applications
- Specifying Unique Operator Identifiers
- Defining the Maximum Parallelism of Keyed State Operators
- Performance and Robustness of Stateful Applications
- Choosing a State Backend
- Choosing a State Primitive
- Preventing Leaking State
- Evolving Stateful Applications
- Updating an Application without Modifying Existing State
- Removing State from an Application
- Modifying the State of an Operator
- Queryable State
- Architecture and Enabling Queryable State
- Exposing Queryable State
- Querying State from External Applications
- Summary
- 8. Reading from and Writing to External Systems
- Application Consistency Guarantees
- Idempotent Writes
- Transactional Writes
- Provided Connectors
- Apache Kafka Source Connector
- Apache Kafka Sink Connector
- Filesystem Source Connector
- Filesystem Sink Connector
- Apache Cassandra Sink Connector
- Implementing a Custom Source Function
- Resettable Source Functions
- Source Functions, Timestamps, and Watermarks
- Implementing a Custom Sink Function
- Idempotent Sink Connectors
- Transactional Sink Connectors
- GenericWriteAheadSink
- TwoPhaseCommitSinkFunction
- Asynchronously Accessing External Systems
- Summary
- 9. Setting Up Flink for Streaming Applications
- Deployment Modes
- Standalone Cluster
- Docker
- Apache Hadoop YARN
- Kubernetes
- Highly Available Setups
- HA Standalone Setup
- HA YARN Setup
- HA Kubernetes Setup
- Integration with Hadoop Components
- Filesystem Configuration
- System Configuration
- Java and Classloading
- CPU
- Main Memory and Network Buffers
- Disk Storage
- Checkpointing and State Backends
- Security
- Summary
- 10. Operating Flink and Streaming Applications
- Running and Managing Streaming Applications
- Savepoints
- Managing Applications with the Command-Line Client
- Starting an application
- Listing running applications
- Taking and disposing of a savepoint
- Canceling an application
- Starting an application from a savepoint
- Scaling an Application In and Out
- Managing Applications with the REST API
- Managing and monitoring a Flink cluster
- Managing and monitoring Flink applications
- Bundling and Deploying Applications in Containers
- Building a job-specific Flink Docker image
- Running a job-specific Docker image on Kubernetes
- Controlling Task Scheduling
- Controlling Task Chaining
- Defining Slot-Sharing Groups
- Tuning Checkpointing and Recovery
- Configuring Checkpointing
- Enabling checkpoint compression
- Retaining checkpoints after an application has stopped
- Configuring State Backends
- Configuring Recovery
- Restart strategies
- Local recovery
- Monitoring Flink Clusters and Applications
- Flink Web UI
- Metric System
- Registering and using metrics
- Metric groups
- Scoping and formatting metrics
- Exposing metrics
- Monitoring Latency
- Configuring the Logging Behavior
- Summary
- 11. Where to Go from Here?
- The Rest of the Flink Ecosystem
- The DataSet API for Batch Processing
- Table API and SQL for Relational Analysis
- FlinkCEP for Complex Event Processing and Pattern Matching
- Gelly for Graph Processing
- A Welcoming Community
- Mailing lists
- Blogs
- Meetups and conferences
- Index