Mastering Kafka Streams and ksqlDB - Helion
ISBN: 978-14-920-6244-8
Pages: 434, Format: ebook
Publication date: 2021-02-04
Bookstore: Helion
Price: 220,15 zł (previously: 255,99 zł)
You save: 14% (-35,84 zł)
Working with unbounded and fast-moving data streams has historically been difficult. But with Kafka Streams and ksqlDB, building stream processing applications is easy and fun. This practical guide shows data engineers how to use these tools to build highly scalable stream processing applications for moving, enriching, and transforming large amounts of data in real time.
Mitch Seymour, data services engineer at Mailchimp, explains important stream processing concepts against a backdrop of several interesting business problems. You'll learn the strengths of both Kafka Streams and ksqlDB to help you choose the best tool for each unique stream processing project. Non-Java developers will find the ksqlDB path to be an especially gentle introduction to stream processing.
- Learn the basics of Kafka and the pub/sub communication pattern
- Build stateless and stateful stream processing applications using Kafka Streams and ksqlDB
- Perform advanced stateful operations, including windowed joins and aggregations
- Understand how stateful processing works under the hood
- Learn about ksqlDB's data integration features, powered by Kafka Connect
- Work with different types of collections in ksqlDB and perform push and pull queries
- Deploy your Kafka Streams and ksqlDB applications to production
Mastering Kafka Streams and ksqlDB eBook: Table of Contents
- Foreword
- Preface
- Who Should Read This Book
- Navigating This Book
- Source Code
- Kafka Streams Version
- ksqlDB Version
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. Kafka
- 1. A Rapid Introduction to Kafka
- Communication Model
- How Are Streams Stored?
- Topics and Partitions
- Events
- Kafka Cluster and Brokers
- Consumer Groups
- Installing Kafka
- Hello, Kafka
- Summary
- II. Kafka Streams
- 2. Getting Started with Kafka Streams
- The Kafka Ecosystem
- Before Kafka Streams
- Enter Kafka Streams
- Features at a Glance
- Operational Characteristics
- Scalability
- Reliability
- Maintainability
- Comparison to Other Systems
- Deployment Model
- Processing Model
- Kappa Architecture
- Use Cases
- Processor Topologies
- Sub-Topologies
- Depth-First Processing
- Benefits of Dataflow Programming
- Tasks and Stream Threads
- High-Level DSL Versus Low-Level Processor API
- Introducing Our Tutorial: Hello, Streams
- Project Setup
- Creating a New Project
- Adding the Kafka Streams Dependency
- DSL
- Processor API
- Streams and Tables
- Stream/Table Duality
- KStream, KTable, GlobalKTable
- Summary
- 3. Stateless Processing
- Stateless Versus Stateful Processing
- Introducing Our Tutorial: Processing a Twitter Stream
- Project Setup
- Adding a KStream Source Processor
- Serialization/Deserialization
- Building a Custom Serdes
- Defining Data Classes
- Implementing a Custom Deserializer
- Implementing a Custom Serializer
- Building the Tweet Serdes
- Filtering Data
- Branching Data
- Translating Tweets
- Merging Streams
- Enriching Tweets
- Avro Data Class
- Sentiment Analysis
- Serializing Avro Data
- Registryless Avro Serdes
- Schema Registry-Aware Avro Serdes
- Adding a Sink Processor
- Running the Code
- Empirical Verification
- Summary
- 4. Stateful Processing
- Benefits of Stateful Processing
- Preview of Stateful Operators
- State Stores
- Common Characteristics
- Embedded
- Multiple access modes
- Fault tolerant
- Key-based
- Persistent Versus In-Memory Stores
- Introducing Our Tutorial: Video Game Leaderboard
- Project Setup
- Data Models
- Adding the Source Processors
- KStream
- KTable
- GlobalKTable
- Registering Streams and Tables
- Joins
- Join Operators
- Join Types
- Co-Partitioning
- Value Joiners
- KStream to KTable Join (players Join)
- KStream to GlobalKTable Join (products Join)
- Grouping Records
- Grouping Streams
- Grouping Tables
- Aggregations
- Aggregating Streams
- Initializer
- Adder
- Aggregating Tables
- Subtractor
- Putting It All Together
- Interactive Queries
- Materialized Stores
- Accessing Read-Only State Stores
- Querying Nonwindowed Key-Value Stores
- Point lookups
- Range scans
- All entries
- Number of entries
- Local Queries
- Remote Queries
- Summary
- 5. Windows and Time
- Introducing Our Tutorial: Patient Monitoring Application
- Project Setup
- Data Models
- Time Semantics
- Timestamp Extractors
- Included Timestamp Extractors
- Custom Timestamp Extractors
- Registering Streams with a Timestamp Extractor
- Windowing Streams
- Window Types
- Tumbling windows
- Hopping windows
- Session windows
- Sliding join windows
- Sliding aggregation windows
- Selecting a Window
- Windowed Aggregation
- Emitting Window Results
- Grace Period
- Suppression
- Filtering and Rekeying Windowed KTables
- Windowed Joins
- Time-Driven Dataflow
- Alerts Sink
- Querying Windowed Key-Value Stores
- Key + window range scans
- Window range scans
- All entries
- Summary
- 6. Advanced State Management
- Persistent Store Disk Layout
- Fault Tolerance
- Changelog Topics
- Standby Replicas
- Rebalancing: Enemy of the State (Store)
- Preventing State Migration
- Sticky Assignment
- Static Membership
- Reducing the Impact of Rebalances
- Incremental Cooperative Rebalancing
- Controlling State Size
- Tombstones
- Window retention
- Aggressive topic compaction
- Fixed-size LRU cache
- Deduplicating Writes with Record Caches
- State Store Monitoring
- Adding State Listeners
- Adding State Restore Listeners
- Built-in Metrics
- Interactive Queries
- Custom State Stores
- Summary
- 7. Processor API
- When to Use the Processor API
- Introducing Our Tutorial: IoT Digital Twin Service
- Project Setup
- Data Models
- Adding Source Processors
- Adding Stateless Stream Processors
- Creating Stateless Processors
- Creating Stateful Processors
- Periodic Functions with Punctuate
- Accessing Record Metadata
- Adding Sink Processors
- Interactive Queries
- Putting It All Together
- Combining the Processor API with the DSL
- Processors and Transformers
- Putting It All Together: Refactor
- Summary
- III. ksqlDB
- 8. Getting Started with ksqlDB
- What Is ksqlDB?
- When to Use ksqlDB
- Evolution of a New Kind of Database
- Kafka Streams Integration
- Connect Integration
- How Does ksqlDB Compare to a Traditional SQL Database?
- Similarities
- Differences
- Architecture
- ksqlDB Server
- SQL engine
- REST service
- ksqlDB Clients
- ksqlDB CLI
- ksqlDB UI
- Deployment Modes
- Interactive Mode
- Headless Mode
- Tutorial
- Installing ksqlDB
- Running a ksqlDB Server
- Precreating Topics
- Using the ksqlDB CLI
- Summary
- 9. Data Integration with ksqlDB
- Kafka Connect Overview
- External Versus Embedded Connect
- External Mode
- Embedded Mode
- Configuring Connect Workers
- Converters and Serialization Formats
- Tutorial
- Installing Connectors
- Creating Connectors with ksqlDB
- Showing Connectors
- Describing Connectors
- Dropping Connectors
- Verifying the Source Connector
- Interacting with the Kafka Connect Cluster Directly
- Introspecting Managed Schemas
- Summary
- 10. Stream Processing Basics with ksqlDB
- Tutorial: Monitoring Changes at Netflix
- Project Setup
- Source Topics
- Data Types
- Custom Types
- Collections
- Creating Source Collections
- With Clause
- Working with Streams and Tables
- Showing Streams and Tables
- Describing Streams and Tables
- Altering Streams and Tables
- Dropping Streams and Tables
- Basic Queries
- Insert Values
- Simple Selects (Transient Push Queries)
- Projection
- Filtering
- Wildcards
- Logical operators
- Between (range filter)
- Flattening/Unnesting Complex Structures
- Conditional Expressions
- Coalesce
- IFNULL
- Case Statements
- Writing Results Back to Kafka (Persistent Queries)
- Creating Derived Collections
- Showing queries
- Explaining queries
- Terminating queries
- Putting It All Together
- Summary
- 11. Intermediate and Advanced Stream Processing with ksqlDB
- Project Setup
- Bootstrapping an Environment from a SQL File
- Data Enrichment
- Joins
- Casting a column to a new type
- Repartitioning data
- Persistent joins
- Windowed Joins
- Aggregations
- Aggregation Basics
- Windowed Aggregations
- Delayed data
- Window retention
- Materialized Views
- Clients
- Pull Queries
- Curl
- Push Queries
- Push Queries via Curl
- Functions and Operators
- Operators
- Showing Functions
- Describing Functions
- Creating Custom Functions
- Stop-word removal UDF
- Additional Resources for Custom ksqlDB Functions
- Summary
- IV. The Road to Production
- 12. Testing, Monitoring, and Deployment
- Testing
- Testing ksqlDB Queries
- Testing Kafka Streams
- Unit tests
- DSL
- Processor API
- Behavioral Tests
- Benchmarking
- Kafka Cluster Benchmarking
- Final Thoughts on Testing
- Monitoring
- Monitoring Checklist
- Extracting JMX Metrics
- Deployment
- ksqlDB Containers
- Kafka Streams Containers
- Container Orchestration
- Operations
- Resetting a Kafka Streams Application
- Rate-Limiting the Output of Your Application
- Upgrading Kafka Streams
- Upgrading ksqlDB
- Summary
- A. Kafka Streams Configuration
- Configuration Management
- Configuration Properties
- Consumer-Specific Configurations
- B. ksqlDB Configuration
- Query Configurations
- Server Configurations
- Security Configurations
- Index