Kafka: The Definitive Guide. 2nd Edition - Helion

Authors: Gwen Shapira, Todd Palino, Rajini Sivaram
ISBN: 9781492043034
Pages: 488, Format: ebook
Publication date: 2021-11-05
Bookstore: Helion

Book price: 237.15 zł (previously: 275.76 zł)
You save: 14% (-38.61 zł)

Tags: Other - Programming

Every enterprise application creates data, whether it consists of log messages, metrics, user activity, or outgoing messages. Moving all this data is just as important as the data itself. With this updated edition, application architects, developers, and production engineers new to the Kafka streaming platform will learn how to handle data in motion. Additional chapters cover Kafka's AdminClient API, transactions, new security features, and tooling changes.

Engineers from Confluent and LinkedIn responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream processing applications with this platform. Through detailed examples, you'll learn Kafka's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.

You'll examine:

- Best practices for deploying and configuring Kafka
- Kafka producers and consumers for writing and reading messages
- Patterns and use-case requirements to ensure reliable data delivery
- Best practices for building data pipelines and applications with Kafka
- How to perform monitoring, tuning, and maintenance tasks with Kafka in production
- The most critical metrics among Kafka's operational measurements
- Kafka's delivery capabilities for stream processing systems

Table of Contents

Foreword to the Second Edition
Foreword to the First Edition
Preface
- Who Should Read This Book
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
1. Meet Kafka
- Publish/Subscribe Messaging
  - How It Starts
  - Individual Queue Systems
- Enter Kafka
  - Messages and Batches
  - Schemas
  - Topics and Partitions
  - Producers and Consumers
  - Brokers and Clusters
  - Multiple Clusters
- Why Kafka?
  - Multiple Producers
  - Multiple Consumers
  - Disk-Based Retention
  - Scalable
  - High Performance
  - Platform Features
- The Data Ecosystem
  - Use Cases
    - Activity tracking
    - Messaging
    - Metrics and logging
    - Commit log
    - Stream processing
- Kafka's Origin
  - LinkedIn's Problem
  - The Birth of Kafka
  - Open Source
  - Commercial Engagement
  - The Name
- Getting Started with Kafka
2. Installing Kafka
- Environment Setup
  - Choosing an Operating System
  - Installing Java
  - Installing ZooKeeper
    - Standalone server
    - ZooKeeper ensemble
- Installing a Kafka Broker
- Configuring the Broker
  - General Broker Parameters
    - broker.id
    - listeners
    - zookeeper.connect
    - log.dirs
    - num.recovery.threads.per.data.dir
    - auto.create.topics.enable
    - auto.leader.rebalance.enable
    - delete.topic.enable
  - Topic Defaults
    - num.partitions
    - default.replication.factor
    - log.retention.ms
    - log.retention.bytes
    - log.segment.bytes
    - log.roll.ms
    - min.insync.replicas
    - message.max.bytes
- Selecting Hardware
  - Disk Throughput
  - Disk Capacity
  - Memory
  - Networking
  - CPU
- Kafka in the Cloud
  - Microsoft Azure
  - Amazon Web Services
- Configuring Kafka Clusters
  - How Many Brokers?
  - Broker Configuration
  - OS Tuning
    - Virtual memory
    - Disk
    - Networking
- Production Concerns
  - Garbage Collector Options
  - Datacenter Layout
  - Colocating Applications on ZooKeeper
- Summary
3. Kafka Producers: Writing Messages to Kafka
- Producer Overview
- Constructing a Kafka Producer
- Sending a Message to Kafka
  - Sending a Message Synchronously
  - Sending a Message Asynchronously
- Configuring Producers
  - client.id
  - acks
  - Message Delivery Time
    - max.block.ms
    - delivery.timeout.ms
    - request.timeout.ms
    - retries and retry.backoff.ms
  - linger.ms
  - buffer.memory
  - compression.type
  - batch.size
  - max.in.flight.requests.per.connection
  - max.request.size
  - receive.buffer.bytes and send.buffer.bytes
  - enable.idempotence
- Serializers
  - Custom Serializers
  - Serializing Using Apache Avro
  - Using Avro Records with Kafka
- Partitions
  - Implementing a custom partitioning strategy
- Headers
- Interceptors
- Quotas and Throttling
- Summary
4. Kafka Consumers: Reading Data from Kafka
- Kafka Consumer Concepts
  - Consumers and Consumer Groups
  - Consumer Groups and Partition Rebalance
  - Static Group Membership
- Creating a Kafka Consumer
- Subscribing to Topics
- The Poll Loop
  - Thread Safety
- Configuring Consumers
  - fetch.min.bytes
  - fetch.max.wait.ms
  - fetch.max.bytes
  - max.poll.records
  - max.partition.fetch.bytes
  - session.timeout.ms and heartbeat.interval.ms
  - max.poll.interval.ms
  - default.api.timeout.ms
  - request.timeout.ms
  - auto.offset.reset
  - enable.auto.commit
  - partition.assignment.strategy
  - client.id
  - client.rack
  - group.instance.id
  - receive.buffer.bytes and send.buffer.bytes
  - offsets.retention.minutes
- Commits and Offsets
  - Automatic Commit
  - Commit Current Offset
  - Asynchronous Commit
  - Combining Synchronous and Asynchronous Commits
  - Committing a Specified Offset
- Rebalance Listeners
- Consuming Records with Specific Offsets
- But How Do We Exit?
- Deserializers
  - Custom Deserializers
  - Using Avro Deserialization with Kafka Consumer
- Standalone Consumer: Why and How to Use a Consumer Without a Group
- Summary
5. Managing Apache Kafka Programmatically
- AdminClient Overview
  - Asynchronous and Eventually Consistent API
  - Options
  - Flat Hierarchy
  - Additional Notes
- AdminClient Lifecycle: Creating, Configuring, and Closing
  - client.dns.lookup
    - Use of a DNS alias
    - DNS name with multiple IP addresses
  - request.timeout.ms
- Essential Topic Management
- Configuration Management
- Consumer Group Management
  - Exploring Consumer Groups
  - Modifying Consumer Groups
- Cluster Metadata
- Advanced Admin Operations
  - Adding Partitions to a Topic
  - Deleting Records from a Topic
  - Leader Election
  - Reassigning Replicas
- Testing
- Summary
6. Kafka Internals
- Cluster Membership
- The Controller
  - KRaft: Kafka's New Raft-Based Controller
- Replication
- Request Processing
  - Produce Requests
  - Fetch Requests
  - Other Requests
- Physical Storage
  - Tiered Storage
  - Partition Allocation
  - File Management
  - File Format
  - Indexes
  - Compaction
  - How Compaction Works
  - Deleted Events
  - When Are Topics Compacted?
- Summary
7. Reliable Data Delivery
- Reliability Guarantees
- Replication
- Broker Configuration
  - Replication Factor
  - Unclean Leader Election
  - Minimum In-Sync Replicas
  - Keeping Replicas In Sync
  - Persisting to Disk
- Using Producers in a Reliable System
  - Send Acknowledgments
  - Configuring Producer Retries
  - Additional Error Handling
- Using Consumers in a Reliable System
  - Important Consumer Configuration Properties for Reliable Processing
  - Explicitly Committing Offsets in Consumers
    - Always commit offsets after messages were processed
    - Commit frequency is a trade-off between performance and number of duplicates in the event of a crash
    - Commit the right offsets at the right time
    - Rebalances
    - Consumers may need to retry
    - Consumers may need to maintain state
- Validating System Reliability
  - Validating Configuration
  - Validating Applications
  - Monitoring Reliability in Production
- Summary
8. Exactly-Once Semantics
- Idempotent Producer
  - How Does the Idempotent Producer Work?
    - Producer restart
    - Broker failure
  - Limitations of the Idempotent Producer
  - How Do I Use the Kafka Idempotent Producer?
- Transactions
  - Transactions Use Cases
  - What Problems Do Transactions Solve?
    - Reprocessing caused by application crashes
    - Reprocessing caused by zombie applications
  - How Do Transactions Guarantee Exactly-Once?
  - What Problems Aren't Solved by Transactions?
    - Side effects while stream processing
    - Reading from a Kafka topic and writing to a database
    - Reading data from a database, writing to Kafka, and from there writing to another database
    - Copying data from one Kafka cluster to another
    - Publish/subscribe pattern
  - How Do I Use Transactions?
  - Transactional IDs and Fencing
  - How Transactions Work
- Performance of Transactions
- Summary
9. Building Data Pipelines
- Considerations When Building Data Pipelines
  - Timeliness
  - Reliability
  - High and Varying Throughput
  - Data Formats
  - Transformations
  - Security
  - Failure Handling
  - Coupling and Agility
- When to Use Kafka Connect Versus Producer and Consumer
- Kafka Connect
  - Running Kafka Connect
  - Connector Example: File Source and File Sink
  - Connector Example: MySQL to Elasticsearch
  - Single Message Transformations
  - A Deeper Look at Kafka Connect
    - Connectors and tasks
    - Workers
    - Converters and Connect's data model
    - Offset management
- Alternatives to Kafka Connect
  - Ingest Frameworks for Other Datastores
  - GUI-Based ETL Tools
  - Stream Processing Frameworks
- Summary
10. Cross-Cluster Data Mirroring
- Use Cases of Cross-Cluster Mirroring
- Multicluster Architectures
  - Some Realities of Cross-Datacenter Communication
  - Hub-and-Spoke Architecture
  - Active-Active Architecture
  - Active-Standby Architecture
    - Disaster recovery planning
    - Data loss and inconsistencies in unplanned failover
    - Start offset for applications after failover
    - After the failover
    - A few words on cluster discovery
  - Stretch Clusters
- Apache Kafka's MirrorMaker
  - Configuring MirrorMaker
  - Multicluster Replication Topology
  - Securing MirrorMaker
  - Deploying MirrorMaker in Production
  - Tuning MirrorMaker
- Other Cross-Cluster Mirroring Solutions
  - Uber uReplicator
  - LinkedIn Brooklin
  - Confluent Cross-Datacenter Mirroring Solutions
- Summary
11. Securing Kafka
- Locking Down Kafka
- Security Protocols
- Authentication
  - SSL
    - Configuring TLS
    - Security considerations
  - SASL
    - SASL/GSSAPI
      - Configuring SASL/GSSAPI
      - Security considerations
    - SASL/PLAIN
      - Configuring SASL/PLAIN
      - Security considerations
    - SASL/SCRAM
      - Configuring SASL/SCRAM
      - Security considerations
    - SASL/OAUTHBEARER
      - Configuring SASL/OAUTHBEARER
      - Security considerations
    - Delegation tokens
      - Configuring delegation tokens
      - Security considerations
  - Reauthentication
  - Security Updates Without Downtime
- Encryption
  - End-to-End Encryption
- Authorization
  - AclAuthorizer
  - Customizing Authorization
  - Security Considerations
- Auditing
- Securing ZooKeeper
  - SASL
  - SSL
  - Authorization
- Securing the Platform
  - Password Protection
- Summary
12. Administering Kafka
- Topic Operations
  - Creating a New Topic
  - Listing All Topics in a Cluster
  - Describing Topic Details
  - Adding Partitions
  - Reducing Partitions
  - Deleting a Topic
- Consumer Groups
  - List and Describe Groups
  - Delete Group
  - Offset Management
    - Export offsets
    - Import offsets
- Dynamic Configuration Changes
  - Overriding Topic Configuration Defaults
  - Overriding Client and User Configuration Defaults
  - Overriding Broker Configuration Defaults
  - Describing Configuration Overrides
  - Removing Configuration Overrides
- Producing and Consuming
  - Console Producer
    - Using producer configuration options
    - Line-reader options
  - Console Consumer
    - Using consumer configuration options
    - Message formatter options
    - Consuming the offsets topics
- Partition Management
  - Preferred Replica Election
  - Changing a Partition's Replicas
    - Changing the replication factor
    - Canceling replica reassignments
  - Dumping Log Segments
  - Replica Verification
- Other Tools
- Unsafe Operations
  - Moving the Cluster Controller
  - Removing Topics to Be Deleted
  - Deleting Topics Manually
- Summary
13. Monitoring Kafka
- Metric Basics
  - Where Are the Metrics?
    - Nonapplication metrics
  - What Metrics Do I Need?
    - Alerting or debugging?
    - Automation or humans?
  - Application Health Checks
- Service-Level Objectives
  - Service-Level Definitions
  - What Metrics Make Good SLIs?
  - Using SLOs in Alerting
- Kafka Broker Metrics
  - Diagnosing Cluster Problems
  - The Art of Under-Replicated Partitions
    - Cluster-level problems
    - Host-level problems
  - Broker Metrics
    - Active controller count
    - Controller queue size
    - Request handler idle ratio
    - All topics bytes in
    - All topics bytes out
    - All topics messages in
    - Partition count
    - Leader count
    - Offline partitions
    - Request metrics
  - Topic and Partition Metrics
    - Per-topic metrics
    - Per-partition metrics
  - JVM Monitoring
    - Garbage collection
    - Java OS monitoring
  - OS Monitoring
  - Logging
- Client Monitoring
  - Producer Metrics
    - Overall producer metrics
    - Per-broker and per-topic metrics
  - Consumer Metrics
    - Fetch manager metrics
    - Per-broker and per-topic metrics
    - Consumer coordinator metrics
  - Quotas
- Lag Monitoring
- End-to-End Monitoring
- Summary
14. Stream Processing
- What Is Stream Processing?
- Stream Processing Concepts
  - Topology
  - Time
  - State
  - Stream-Table Duality
  - Time Windows
  - Processing Guarantees
- Stream Processing Design Patterns
  - Single-Event Processing
  - Processing with Local State
  - Multiphase Processing/Repartitioning
  - Processing with External Lookup: Stream-Table Join
  - Table-Table Join
  - Streaming Join
  - Out-of-Sequence Events
  - Reprocessing
  - Interactive Queries
- Kafka Streams by Example
  - Word Count
  - Stock Market Statistics
  - ClickStream Enrichment
- Kafka Streams: Architecture Overview
  - Building a Topology
  - Optimizing a Topology
  - Testing a Topology
  - Scaling a Topology
  - Surviving Failures
- Stream Processing Use Cases
- How to Choose a Stream Processing Framework
- Summary
A. Installing Kafka on Other Operating Systems
- Installing on Windows
  - Using Windows Subsystem for Linux
  - Using Native Java
- Installing on macOS
  - Using Homebrew
  - Installing Manually
B. Additional Kafka Tools
- Comprehensive Platforms
- Cluster Deployment and Management
- Monitoring and Data Exploration
- Client Libraries
- Stream Processing
Index