Kafka Connect - Helion
ISBN: 9781098126490
Pages: 402, Format: ebook
Publication date: 2023-09-18
Bookstore: Helion
Price: 254,15 zł (previously: 299,00 zł)
You save: 15% (-44,85 zł)
Used by more than 80% of Fortune 100 companies, Apache Kafka has become the de facto event streaming platform. Kafka Connect is a key component of Kafka that lets you flow data between your existing systems and Kafka to process data in real time.
With this practical guide, authors Mickael Maison and Kate Stanley show data engineers, site reliability engineers, and application developers how to build data pipelines between Kafka clusters and a variety of data sources and sinks. Kafka Connect allows you to quickly adopt Kafka by tapping into existing data and enabling many advanced use cases. No matter where you are in your event streaming journey, Kafka Connect is the ideal tool for building a modern data pipeline.
- Learn Kafka Connect's capabilities, main concepts, and terminology
- Design data and event streaming pipelines that use Kafka Connect
- Configure and operate Kafka Connect environments at scale
- Deploy secured and highly available Kafka Connect clusters
- Build sink and source connectors, single message transforms, and converters
Table of Contents
- Foreword
- Preface
- Who Should Read This Book
- Kafka Versions
- Navigating This Book
- Conventions Used in This Book
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgements
- I. Introduction to Kafka Connect
- 1. Meet Kafka Connect
- Kafka Connect Features
- Pluggable Architecture
- Scalability and Reliability
- Declarative Pipeline Definition
- Part of Apache Kafka
- Use Cases
- Capturing Database Changes
- Mirroring Kafka Clusters
- Building Data Lakes
- Aggregating Logs
- Modernizing Legacy Systems
- Alternatives to Kafka Connect
- Summary
- 2. Apache Kafka Basics
- A Distributed Event Streaming Platform
- Open Source
- Distributed
- Event Streaming
- Platform
- Kafka Concepts
- Publish-Subscribe
- Brokers and Records
- Topics and Partitions
- Replication
- Retention and Compaction
- KRaft and ZooKeeper
- Interacting with Kafka
- Producers
- Consumers
- Kafka Streams
- Getting Started with Kafka
- Starting Kafka
- Kafka in KRaft mode (without ZooKeeper)
- Kafka with ZooKeeper
- Sending and Receiving Records
- Running a Kafka Streams Application
- Summary
- II. Developing Data Pipelines with Kafka Connect
- 3. Components in a Kafka Connect Data Pipeline
- Kafka Connect Runtime
- Running Kafka Connect
- Kafka Connect REST API
- Installing Plug-Ins
- Deployment Modes
- Source and Sink Connectors
- Connectors and Tasks
- Configuring Connectors
- Running Connectors
- Converters
- Data Format and Schemas
- Configuring Converters
- Using Converters
- Transformations and Predicates
- Transformation Use Cases
- Routing
- Sanitizing
- Formatting
- Enhancing
- Predicates
- Configuring Transformations and Predicates
- Using Transformations and Predicates
- Summary
- 4. Designing Effective Data Pipelines
- Choosing a Connector
- Pipeline Direction
- Licensing and Support
- Connector Features
- Defining Data Models
- Data Transformation
- Mapping Data Between Systems
- Formatting Data
- Data Formats
- Schemas
- Kafka Connect record schemas
- Kafka record schemas
- Exploring Kafka Connect Internals
- Internal Topics
- Group Membership
- Rebalance Protocols
- Handling Failures in Kafka Connect
- Worker Failure
- Connector/Task Failure
- Kafka/External Systems Failure
- Dead Letter Queues
- Understanding Processing Semantics
- Sink Connectors
- Source Connectors
- Summary
- 5. Connectors in Action
- Confluent S3 Sink Connector
- Configuring the Connector
- Connectivity and S3 details
- Object partitioning
- Object naming
- Object formats
- Object upload
- Exactly-Once Semantics
- Running the Connector
- Using the field partitioner
- Using the time-based partitioner
- Confluent JDBC Source Connector
- Configuring the Connector
- Connectivity
- Topic naming
- Table filtering
- Data collection mode
- Partitioning and parallelism
- Running the Connector
- Using the bulk mode
- Using an incrementing mode
- Debezium MySQL Source Connector
- Configuring the Connector
- Connectivity
- Database and table filtering
- Snapshotting
- Event Formats
- Running the Connector
- Summary
- 6. Mirroring Clusters with MirrorMaker
- Introduction to Mirroring
- Exploring Mirroring Use Cases
- Geo-replication
- Disaster recovery
- Migration
- Complex topologies
- Mirroring in Practice
- Introduction to MirrorMaker
- Common Concepts
- Local and remote topics
- Common configurations
- Replication policies
- Client overrides
- Deployment Modes
- MirrorMaker Connectors
- MirrorSourceConnector
- Configurations
- Topic configurations
- Offset-syncs configurations
- ACLs configurations
- Metrics configurations
- Permissions
- Source cluster ACLs
- Target cluster ACLs
- Metrics
- MirrorCheckpointConnector
- Configurations
- Permissions
- Source cluster ACLs
- Target cluster ACLs
- Metrics
- MirrorHeartbeatConnector
- Configurations
- Permissions
- Running MirrorMaker
- Disaster Recovery Example
- Geo-Replication Example
- Summary
- III. Running Kafka Connect in Production
- 7. Deploying and Operating Kafka Connect Clusters
- Preparing the Kafka Connect Environment
- Building a Kafka Connect Environment
- Installing Plug-Ins
- Networking and Permissions
- Worker Plug-Ins
- Configuration Providers
- REST Extensions
- Connector Client Configuration Override Policies
- Sizing and Planning Capacity
- Understanding Kafka Connect Resource Utilization
- How Many Workers and Tasks?
- Single cluster versus separate clusters
- Maintainability
- Isolation
- Security
- Use case optimization
- Operating Kafka Connect Clusters
- Adding Workers
- Removing Workers
- Upgrading and Applying Maintenance to Workers
- Restarting Failed Tasks and Connectors
- Resetting Offsets of Connectors
- Sink connector offsets
- Source connector offsets
- Administering Kafka Connect Using the REST API
- Creating and Deleting a Connector
- Connector and Task Configuration
- Controlling the Lifecycle of Connectors
- Listing Connector Offsets
- Debugging Issues
- Summary
- 8. Configuring Kafka Connect
- Configuring the Runtime
- Configurations for Production
- Clients and connector overrides
- REST configurations
- Miscellaneous configuration
- Fine-Tuning Configurations
- Connection configurations
- Inter-worker and rebalance configurations
- Topic tracking configurations
- Metrics configurations
- Offset flush configurations
- Configuring Connectors
- Topic Configurations
- Client Overrides
- Configurations for Exactly-Once
- Configurations for Error Handling
- Configuring Kafka Connect Clusters for Security
- Securing the Connection to Kafka
- TLS configurations
- SASL configurations
- SASL OAUTHBEARER configurations
- SASL GSSAPI configurations
- Configuring Permissions
- Securing the REST API
- Summary
- 9. Monitoring Kafka Connect
- Monitoring Logs
- Logging Configuration
- Understanding Startup Logs
- Analyzing Logs
- Log contexts
- Key events
- Errors
- Monitoring Metrics
- Metrics Reporters
- Analyzing Metrics
- Exploring Metrics
- Key Metrics
- Kafka Connect Runtime Metrics
- Metadata metrics
- Network metrics
- Group protocol metrics
- Connector-level metrics
- Task-level metrics
- Other System Metrics
- Internal Kafka client metrics
- Kafka and external system metrics
- Summary
- 10. Administering Kafka Connect on Kubernetes
- Introduction to Kubernetes
- Virtualization Technologies
- Kubernetes Fundamentals
- Running Kafka Connect on Kubernetes
- Container Image
- Deploying Workers
- Networking and Monitoring
- Configuration
- Using a Kubernetes Operator to Deploy Kafka Connect
- Introduction to Kubernetes Operators
- Kubernetes Operators for Kafka Connect
- Strimzi
- Getting a Kubernetes Environment
- Starting the Operator
- Kafka Connect CRDs
- Deploying a Kafka Connect Cluster and Connectors
- MirrorMaker CRD
- Summary
- IV. Building Custom Connectors and Plug-Ins
- 11. Building Source and Sink Connectors
- Common Concepts and APIs
- Building a Custom Connector
- Implementing a connector
- Packaging a connector
- The Connector API
- The version() method
- The config() method
- The initialize() method
- The start() method
- The taskClass() method
- The taskConfigs() method
- The stop() method
- The validate() method
- The context() methods
- Connector API lifecycle
- Configurations
- Configuration types
- Validators and recommenders
- Interacting with configurations at runtime
- The Task API
- The initialize() methods
- The start() method
- The stop() method
- Task API lifecycle
- Kafka Connect Records
- Schemas
- The ConnectorContext API
- The requestTaskReconfiguration() method
- The raiseError() method
- The configs() method
- Implementing Source Connectors
- The SourceTask API
- The poll() method
- The commit() and commitRecord() methods
- SourceTask API lifecycle
- Source Records
- The SourceConnectorContext and SourceTaskContext APIs
- The offsetStorageReader() method
- The transactionContext() method
- Exactly-Once Support
- The exactlyOnceSupport() method
- The canDefineTransactionBoundaries() method
- The commitTransaction() methods
- The abortTransaction() methods
- Implementing Sink Connectors
- The SinkTask API
- The put() method
- The preCommit() method
- The flush() method
- The open() and close() methods
- The SinkTask API lifecycle
- Sink Records
- The SinkConnectorContext and SinkTaskContext APIs
- The offset() methods
- The timeout() method
- The assignment() method
- The pause() and resume() methods
- The requestCommit() method
- The errantRecordReporter() method
- Summary
- 12. Extending Kafka Connect with Connector and Worker Plug-Ins
- Implementing Connector Plug-Ins
- The Transformation API
- The apply() method
- The config() method
- The configure() method
- The close() method
- The Predicate API
- The test() method
- The config() method
- The configure() method
- The close() method
- The Converter and HeaderConverter APIs
- The fromConnectData() methods
- The toConnectData() methods
- The fromConnectHeader() method
- The toConnectHeader() method
- The config() methods
- The configure() methods
- The close() method
- Implementing Worker Plug-Ins
- The ConfigProvider API
- The get() methods
- The configure() method
- The close() method
- The subscribe(), unsubscribe(), and unsubscribeAll() methods
- The ConnectorClientConfigOverridePolicy API
- The validate() method
- The configure() method
- The close() method
- The ConnectRestExtension APIs
- The register() method
- The configure() method
- The close() method
- The version() method
- Summary
- Index