Storm Blueprints: Patterns for Distributed Real-time Computation. One of the best ways of getting to grips with the world’s most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals - Helion
ebook
Authors: P. Taylor Goetz, Peter T Goetz, Brian O'Neill
Original title: Storm Blueprints: Patterns for Distributed Real-time Computation. One of the best ways of getting to grips with the world’s most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals
ISBN: 9781782168300
Pages: 336, Format: ebook
Publication date: 2014-03-26
Bookstore: Helion
Book price: 159,00 zł
Table of contents
Storm Blueprints: Patterns for Distributed Real-time Computation. One of the best ways of getting to grips with the world’s most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals eBook - table of contents
- Storm Blueprints: Patterns for Distributed Real-time Computation
- Table of Contents
- Storm Blueprints: Patterns for Distributed Real-time Computation
- Credits
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Support files, eBooks, discount offers and more
- Why Subscribe?
- Free Access for Packt account holders
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Errata
- Piracy
- Questions
- 1. Distributed Word Count
- Introducing elements of a Storm topology: streams, spouts, and bolts
- Streams
- Spouts
- Bolts
- Introducing the word count topology data flow
- Sentence spout
- Introducing the split sentence bolt
- Introducing the word count bolt
- Introducing the report bolt
- Implementing the word count topology
- Setting up a development environment
- Implementing the sentence spout
- Implementing the split sentence bolt
- Implementing the word count bolt
- Implementing the report bolt
- Implementing the word count topology
- Introducing parallelism in Storm
- WordCountTopology parallelism
- Adding workers to a topology
- Configuring executors and tasks
- Understanding stream groupings
- Guaranteed processing
- Reliability in spouts
- Reliability in bolts
- Reliable word count
- Summary
- 2. Configuring Storm Clusters
- Introducing the anatomy of a Storm cluster
- Understanding the nimbus daemon
- Working with the supervisor daemon
- Introducing Apache ZooKeeper
- Working with Storm's DRPC server
- Introducing the Storm UI
- Introducing the Storm technology stack
- Java and Clojure
- Python
- Installing Storm on Linux
- Installing the base operating system
- Installing Java
- ZooKeeper installation
- Storm installation
- Running the Storm daemons
- Configuring Storm
- Mandatory settings
- Optional settings
- The Storm executable
- Setting up the Storm executable on a workstation
- The daemon commands
- Nimbus
- Supervisor
- UI
- DRPC
- The management commands
- Jar
- Kill
- Deactivate
- Activate
- Rebalance
- Remoteconfvalue
- Local debug/development commands
- REPL
- Classpath
- Localconfvalue
- Submitting topologies to a Storm cluster
- Automating the cluster configuration
- A rapid introduction to Puppet
- Puppet manifests
- Puppet classes and modules
- Puppet templates
- Managing environments with Puppet Hiera
- Introducing Hiera
- Summary
- 3. Trident Topologies and Sensor Data
- Examining our use case
- Introducing Trident topologies
- Introducing Trident spouts
- Introducing Trident operations: filters and functions
- Introducing Trident filters
- Introducing Trident functions
- Introducing Trident aggregators: Combiners and Reducers
- CombinerAggregator
- ReducerAggregator
- Aggregator
- Introducing the Trident state
- The Repeat Transactional state
- The Opaque state
- Executing the topology
- Summary
- 4. Real-time Trend Analysis
- Use case
- Architecture
- The source application
- The logback Kafka appender
- Apache Kafka
- Kafka spout
- The XMPP server
- Installing the required software
- Installing Kafka
- Installing OpenFire
- Introducing the sample application
- Sending log messages to Kafka
- Introducing the log analysis topology
- Kafka spout
- The JSON project function
- Calculating a moving average
- Adding a sliding window
- Implementing the moving average function
- Filtering on thresholds
- Sending notifications with XMPP
- The final topology
- Running the log analysis topology
- Summary
- 5. Real-time Graph Analysis
- Use case
- Architecture
- The Twitter client
- Kafka spout
- A titan-distributed graph database
- A brief introduction to graph databases
- Accessing the graph: the TinkerPop stack
- Manipulating the graph with the Blueprints API
- Manipulating the graph with the Gremlin shell
- Software installation
- Titan installation
- Setting up Titan to use the Cassandra storage backend
- Installing Cassandra
- Starting Titan with the Cassandra backend
- Graph data model
- Connecting to the Twitter stream
- Setting up the Twitter4J client
- The OAuth configuration
- The TwitterStreamConsumer class
- The TwitterStatusListener class
- Twitter graph topology
- The JSONProjectFunction class
- Implementing GraphState
- GraphFactory
- GraphTupleProcessor
- GraphStateFactory
- GraphState
- GraphUpdater
- Implementing GraphFactory
- Implementing GraphTupleProcessor
- Putting it all together: the TwitterGraphTopology class
- The TwitterGraphTopology class
- Querying the graph with Gremlin
- Summary
- 6. Artificial Intelligence
- Designing for our use case
- Establishing the architecture
- Examining the design challenges
- Implementing the recursion
- Accessing the function's return values
- Immutable tuple field values
- Upfront field declaration
- Tuple acknowledgement in recursion
- Output to multiple streams
- Read-before-write
- Solving the challenges
- Implementing the architecture
- The data model
- Examining the recursive topology
- The queue interaction
- Functions and filters
- Examining the Scoring Topology
- Addressing read-before-write
- Distributed locking
- Retry when stale
- Executing the topology
- Enumerating the game tree
- Distributed Remote Procedure Call (DRPC)
- Remote deployment
- Summary
- 7. Integrating Druid for Financial Analytics
- Use case
- Integrating a non-transactional system
- The topology
- The spout
- The filter
- The state design
- Implementing the architecture
- DruidState
- Implementing the StormFirehose object
- Implementing the partition status in ZooKeeper
- Executing the implementation
- Examining the analytics
- Summary
- 8. Natural Language Processing
- Motivating a Lambda architecture
- Examining our use case
- Realizing a Lambda architecture
- Designing the topology for our use case
- Implementing the design
- TwitterSpout/TweetEmitter
- Functions
- TweetSplitterFunction
- WordFrequencyFunction
- PersistenceFunction
- Examining the analytics
- Batch processing / historical analysis
- Hadoop
- An overview of MapReduce
- The Druid setup
- HadoopDruidIndexer
- Summary
- 9. Deploying Storm on Hadoop for Advertising Analysis
- Examining the use case
- Establishing the architecture
- Examining HDFS
- Examining YARN
- Configuring the infrastructure
- The Hadoop infrastructure
- Configuring HDFS
- Configuring the NameNode
- Configuring the DataNode
- Configuring YARN
- Configuring the ResourceManager
- Configuring the NodeManager
- Deploying the analytics
- Performing a batch analysis with the Pig infrastructure
- Performing a real-time analysis with the Storm-YARN infrastructure
- Performing the analytics
- Executing the batch analysis
- Executing real-time analysis
- Deploying the topology
- Executing the topology
- Summary
- 10. Storm in the Cloud
- Introducing Amazon Elastic Compute Cloud (EC2)
- Setting up an AWS account
- The AWS Management Console
- Creating an SSH key pair
- Launching an EC2 instance manually
- Logging in to the EC2 instance
- Introducing Apache Whirr
- Installing Whirr
- Configuring a Storm cluster with Whirr
- Launching the cluster
- Introducing Whirr Storm
- Setting up Whirr Storm
- Cluster configuration
- Customizing Storm's configuration
- Customizing firewall rules
- Introducing Vagrant
- Installing Vagrant
- Launching your first virtual machine
- The Vagrantfile and shared filesystem
- Vagrant provisioning
- Configuring multimachine clusters with Vagrant
- Creating Storm-provisioning scripts
- ZooKeeper
- Storm
- Supervisord
- The Storm Vagrantfile
- Launching the Storm cluster
- Summary
- Index