Using Flume. Flexible, Scalable, and Reliable Data Streaming - Helion

ebook

Author: Hari Shreedharan
ISBN: 978-14-919-0533-3
Pages: 238, Format: ebook
Publication date: 2014-09-16
Bookstore: Helion

Price: 126,65 zł (previously: 147,27 zł)
You save: 14% (-20,62 zł)

How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you’ll learn Flume’s rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elastic Search, and other systems.

Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You’ll learn about Flume’s design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.

- Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
- Dive into key Flume components, including sources that accept data and sinks that write and deliver it
- Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
- Explore APIs for sending data to Flume agents from your own applications
- Plan and deploy Flume in a scalable and flexible way, and monitor your cluster once it’s running

Table of Contents

Foreword
Preface
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
1. Apache Hadoop and Apache HBase: An Introduction
- HDFS
  - HDFS Data Formats
  - Processing Data on HDFS
- Apache HBase
- Summary
- References
2. Streaming Data Using Apache Flume
- The Need for Flume
- Is Flume a Good Fit?
- Inside a Flume Agent
- Configuring Flume Agents
- Getting Flume Agents to Talk to Each Other
- Complex Flows
- Replicating Data to Various Destinations
- Dynamic Routing
- Flume’s No Data Loss Guarantee, Channels, and Transactions
  - Transactions in Flume Channels
- Agent Failure and Data Loss
- The Importance of Batching
- What About Duplicates?
- Running a Flume Agent
- Summary
- References
3. Sources
- Lifecycle of a Source
- Sink-to-Source Communication
  - Avro Source
  - Thrift Source
  - Failure Handling in RPC Sources
- HTTP Source
  - Writing Handlers for the HTTP Source*
- Spooling Directory Source
  - Reading Custom Formats Using Deserializers*
  - Spooling Directory Source Performance
- Syslog Sources
- Exec Source
- JMS Source
  - Converting JMS Messages into Flume Events*
- Writing Your Own Sources*
  - Event-Driven and Pollable Sources
    - Developing pollable sources
    - Building event-driven sources
- Summary
- References
4. Channels
- Transaction Workflow
- Channels Bundled with Flume
  - Memory Channel
  - File Channel
    - Design and implementation of the File Channel*
- Summary
- References
5. Sinks
- Lifecycle of a Sink
- Optimizing the Performance of Sinks
- Writing to HDFS: The HDFS Sink
  - Understanding Buckets
  - Configuring the HDFS Sink
  - Controlling the Data Format Using Serializers*
- HBase Sinks
  - Translating Flume Events to HBase Puts and Increments Using Serializers*
- RPC Sinks
  - Avro Sink
  - Thrift Sink
- Morphline Solr Sink
- Elastic Search Sink
  - Customizing the Data Format*
- Other Sinks: Null Sink, Rolling File Sink, Logger Sink
- Writing Your Own Sink*
- Summary
- References
6. Interceptors, Channel Selectors, Sink Groups, and Sink Processors
- Interceptors
  - Timestamp Interceptor
  - Host Interceptor
  - Static Interceptor
  - Regex Filtering Interceptor
  - Morphline Interceptor
  - UUID Interceptor
  - Writing Interceptors*
- Channel Selectors
  - Replicating Channel Selector
  - Multiplexing Channel Selector
  - Custom Channel Selectors*
- Sink Groups and Sink Processors
  - Load-Balancing Sink Processor
    - Writing sink selectors*
  - Failover Sink Processor
- Summary
- References
7. Getting Data into Flume*
- Building Flume Events
- Flume Client SDK
  - Building Flume RPC Clients
  - RPC Client Interface
  - Configuration Parameters Common to All RPC Clients
  - Default RPC Client
  - Load-Balancing RPC Client
    - Writing your own host selector*
  - Failover RPC Client
  - Thrift RPC Client
- Embedded Agent
  - Configuring an Embedded Agent
- log4j Appenders
  - Load-Balancing log4j Appender
- Summary
- References
8. Planning, Deploying, and Monitoring Flume
- Planning a Flume Deployment
  - Time to Repair
  - How Much Capacity Do I Need in My Flume Channels?
  - How Many Tiers?
    - How do you know if Flume is not scaling or if the destination storage system or index is slow?
  - Sending Data over Cross-Data Center Links
  - Sharding Tiers
- Deploying Flume
  - Deploying Custom Code
- Monitoring Flume
  - Reporting Metrics from Custom Components
- Summary
- References
Index