Storm Blueprints: Patterns for Distributed Real-time Computation. One of the best ways of getting to grips with the world’s most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals.

Authors: P. Taylor Goetz, Brian O'Neill
Original title: Storm Blueprints: Patterns for Distributed Real-time Computation
ISBN: 9781782168300
Pages: 336, Format: ebook
Publication date: 2014-03-26
Bookstore: Helion

Price: 159,00 zł


Table of contents

Storm Blueprints: Patterns for Distributed Real-time Computation
- Table of Contents
- Storm Blueprints: Patterns for Distributed Real-time Computation
- Credits
- About the Authors
- About the Reviewers
- www.PacktPub.com
  - Support files, eBooks, discount offers and more
    - Why Subscribe?
    - Free Access for Packt account holders
- Preface
  - What this book covers
  - What you need for this book
  - Who this book is for
  - Conventions
  - Reader feedback
  - Customer support
    - Downloading the example code
    - Errata
    - Piracy
    - Questions
- 1. Distributed Word Count
  - Introducing elements of a Storm topology: streams, spouts, and bolts
    - Streams
    - Spouts
    - Bolts
  - Introducing the word count topology data flow
    - Sentence spout
      - Introducing the split sentence bolt
      - Introducing the word count bolt
      - Introducing the report bolt
  - Implementing the word count topology
    - Setting up a development environment
    - Implementing the sentence spout
    - Implementing the split sentence bolt
    - Implementing the word count bolt
    - Implementing the report bolt
    - Implementing the word count topology
  - Introducing parallelism in Storm
    - WordCountTopology parallelism
      - Adding workers to a topology
      - Configuring executors and tasks
  - Understanding stream groupings
  - Guaranteed processing
    - Reliability in spouts
    - Reliability in bolts
    - Reliable word count
  - Summary
- 2. Configuring Storm Clusters
  - Introducing the anatomy of a Storm cluster
    - Understanding the nimbus daemon
    - Working with the supervisor daemon
    - Introducing Apache ZooKeeper
    - Working with Storm's DRPC server
    - Introducing the Storm UI
  - Introducing the Storm technology stack
    - Java and Clojure
    - Python
  - Installing Storm on Linux
    - Installing the base operating system
    - Installing Java
    - ZooKeeper installation
    - Storm installation
    - Running the Storm daemons
    - Configuring Storm
    - Mandatory settings
    - Optional settings
    - The Storm executable
    - Setting up the Storm executable on a workstation
    - The daemon commands
      - Nimbus
      - Supervisor
      - UI
      - DRPC
    - The management commands
      - Jar
      - Kill
      - Deactivate
      - Activate
      - Rebalance
      - Remoteconfvalue
    - Local debug/development commands
      - REPL
      - Classpath
      - Localconfvalue
  - Submitting topologies to a Storm cluster
  - Automating the cluster configuration
  - A rapid introduction to Puppet
    - Puppet manifests
    - Puppet classes and modules
    - Puppet templates
    - Managing environments with Puppet Hiera
    - Introducing Hiera
  - Summary
- 3. Trident Topologies and Sensor Data
  - Examining our use case
  - Introducing Trident topologies
  - Introducing Trident spouts
  - Introducing Trident operations: filters and functions
    - Introducing Trident filters
    - Introducing Trident functions
  - Introducing Trident aggregators: Combiners and Reducers
    - CombinerAggregator
    - ReducerAggregator
    - Aggregator
  - Introducing the Trident state
    - The Repeat Transactional state
    - The Opaque state
  - Executing the topology
  - Summary
- 4. Real-time Trend Analysis
  - Use case
  - Architecture
    - The source application
    - The logback Kafka appender
    - Apache Kafka
    - Kafka spout
    - The XMPP server
  - Installing the required software
    - Installing Kafka
    - Installing OpenFire
  - Introducing the sample application
    - Sending log messages to Kafka
  - Introducing the log analysis topology
    - Kafka spout
    - The JSON project function
    - Calculating a moving average
    - Adding a sliding window
    - Implementing the moving average function
    - Filtering on thresholds
    - Sending notifications with XMPP
  - The final topology
  - Running the log analysis topology
  - Summary
- 5. Real-time Graph Analysis
  - Use case
  - Architecture
    - The Twitter client
    - Kafka spout
    - The Titan distributed graph database
  - A brief introduction to graph databases
    - Accessing the graph: the TinkerPop stack
    - Manipulating the graph with the Blueprints API
    - Manipulating the graph with the Gremlin shell
  - Software installation
    - Titan installation
  - Setting up Titan to use the Cassandra storage backend
    - Installing Cassandra
    - Starting Titan with the Cassandra backend
  - Graph data model
  - Connecting to the Twitter stream
    - Setting up the Twitter4J client
    - The OAuth configuration
      - The TwitterStreamConsumer class
      - The TwitterStatusListener class
  - Twitter graph topology
    - The JSONProjectFunction class
  - Implementing GraphState
    - GraphFactory
    - GraphTupleProcessor
    - GraphStateFactory
    - GraphState
    - GraphUpdater
  - Implementing GraphFactory
  - Implementing GraphTupleProcessor
  - Putting it all together: the TwitterGraphTopology class
    - The TwitterGraphTopology class
  - Querying the graph with Gremlin
  - Summary
- 6. Artificial Intelligence
  - Designing for our use case
  - Establishing the architecture
    - Examining the design challenges
    - Implementing the recursion
      - Accessing the function's return values
      - Immutable tuple field values
      - Upfront field declaration
      - Tuple acknowledgement in recursion
      - Output to multiple streams
      - Read-before-write
    - Solving the challenges
  - Implementing the architecture
    - The data model
    - Examining the recursive topology
    - The queue interaction
    - Functions and filters
    - Examining the Scoring Topology
      - Addressing read-before-write
        - Distributed locking
        - Retry when stale
        - Executing the topology
      - Enumerating the game tree
    - Distributed Remote Procedure Call (DRPC)
      - Remote deployment
  - Summary
- 7. Integrating Druid for Financial Analytics
  - Use case
  - Integrating a non-transactional system
  - The topology
    - The spout
    - The filter
    - The state design
  - Implementing the architecture
    - DruidState
    - Implementing the StormFirehose object
    - Implementing the partition status in ZooKeeper
  - Executing the implementation
  - Examining the analytics
  - Summary
- 8. Natural Language Processing
  - Motivating a Lambda architecture
  - Examining our use case
  - Realizing a Lambda architecture
  - Designing the topology for our use case
  - Implementing the design
    - TwitterSpout/TweetEmitter
    - Functions
      - TweetSplitterFunction
      - WordFrequencyFunction
      - PersistenceFunction
  - Examining the analytics
  - Batch processing / historical analysis
  - Hadoop
    - An overview of MapReduce
    - The Druid setup
      - HadoopDruidIndexer
  - Summary
- 9. Deploying Storm on Hadoop for Advertising Analysis
  - Examining the use case
  - Establishing the architecture
    - Examining HDFS
    - Examining YARN
  - Configuring the infrastructure
    - The Hadoop infrastructure
    - Configuring HDFS
      - Configuring the NameNode
      - Configuring the DataNode
      - Configuring YARN
        - Configuring the ResourceManager
      - Configuring the NodeManager
  - Deploying the analytics
    - Performing a batch analysis with the Pig infrastructure
    - Performing a real-time analysis with the Storm-YARN infrastructure
  - Performing the analytics
    - Executing the batch analysis
    - Executing real-time analysis
  - Deploying the topology
  - Executing the topology
  - Summary
- 10. Storm in the Cloud
  - Introducing Amazon Elastic Compute Cloud (EC2)
    - Setting up an AWS account
    - The AWS Management Console
      - Creating an SSH key pair
    - Launching an EC2 instance manually
      - Logging in to the EC2 instance
  - Introducing Apache Whirr
    - Installing Whirr
  - Configuring a Storm cluster with Whirr
    - Launching the cluster
  - Introducing Whirr Storm
    - Setting up Whirr Storm
      - Cluster configuration
      - Customizing Storm's configuration
      - Customizing firewall rules
  - Introducing Vagrant
    - Installing Vagrant
    - Launching your first virtual machine
      - The Vagrantfile and shared filesystem
      - Vagrant provisioning
      - Configuring multimachine clusters with Vagrant
  - Creating Storm-provisioning scripts
    - ZooKeeper
    - Storm
    - Supervisord
      - The Storm Vagrantfile
      - Launching the Storm cluster
  - Summary
- Index