MapReduce Design Patterns. Building Effective Algorithms and Analytics for Hadoop and Other Systems
ISBN: 978-14-493-4198-5
Pages: 250, Format: ebook
Publication date: 2012-11-21
Bookstore: Helion
Price: 152.15 zł (previously: 176.92 zł)
You save: 14% (-24.77 zł)
Until now, design patterns for the MapReduce framework have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you’re using.
Each pattern is explained in context, with pitfalls and caveats clearly identified to help you avoid common design mistakes when modeling your big data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop.
- Summarization patterns: get a top-level view by summarizing and grouping data
- Filtering patterns: view data subsets such as records generated from one user
- Data organization patterns: reorganize data to work with other systems, or to make MapReduce analysis easier
- Join patterns: analyze different datasets together to discover interesting relationships
- Metapatterns: piece together several patterns to solve multi-stage problems, or to perform several analytics in the same job
- Input and output patterns: customize the way you use Hadoop to load or store data
"A clear exposition of MapReduce programs for common data processing patterns; this book is indispensable for anyone using Hadoop."
--Tom White, author of Hadoop: The Definitive Guide
Table of contents
- MapReduce Design Patterns
- Dedication
- Preface
- Intended Audience
- Pattern Format
- The Examples in This Book
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
- 1. Design Patterns and MapReduce
- Design Patterns
- MapReduce History
- MapReduce and Hadoop Refresher
- Hadoop Example: Word Count
- Pig and Hive
- 2. Summarization Patterns
- Numerical Summarizations
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Known uses
- Resemblances
- Performance analysis
- Numerical Summarization Examples
- Minimum, maximum, and count example
- MinMaxCountTuple code
- Mapper code
- Reducer code
- Combiner optimization
- Data flow diagram
- Average example
- Mapper code
- Reducer code
- Combiner optimization
- Data flow diagram
- Median and standard deviation
- Mapper code
- Reducer code
- Combiner optimization
- Memory-conscious median and standard deviation
- Mapper code
- Reducer code
- Combiner optimization
- Data flow diagram
- Inverted Index Summarizations
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Performance analysis
- Inverted Index Example
- Wikipedia reference inverted index
- Mapper code
- Reducer code
- Combiner optimization
- Counting with Counters
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Known uses
- Performance analysis
- Counting with Counters Example
- Number of users per state
- Mapper code
- Driver code
- 3. Filtering Patterns
- Filtering
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Known uses
- Resemblances
- Performance analysis
- Filtering Examples
- Distributed grep
- Mapper code
- Simple Random Sampling
- Mapper Code
- Bloom Filtering
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Known uses
- Resemblances
- Performance analysis
- Bloom Filtering Examples
- Hot list
- Bloom filter training
- Mapper code
- HBase Query using a Bloom filter
- Mapper Code
- Top Ten
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Known uses
- Resemblances
- Performance analysis
- Top Ten Examples
- Top ten users by reputation
- Mapper code
- Reducer code
- Distinct
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Known uses
- Resemblances
- Performance analysis
- Distinct Examples
- Distinct user IDs
- Mapper code
- Reducer code
- Combiner optimization
- 4. Data Organization Patterns
- Structured to Hierarchical
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Known uses
- Resemblances
- Performance analysis
- Structured to Hierarchical Examples
- Post/comment building on StackOverflow
- Driver code
- Mapper code
- Reducer code
- Question/answer building on StackOverflow
- Mapper code
- Reducer code
- Partitioning
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Known uses
- Resemblances
- Performance analysis
- Partitioning Examples
- Partitioning users by last access date
- Driver code
- Mapper code
- Partitioner code
- Reducer code
- Binning
- Pattern Description
- Intent
- Motivation
- Structure
- Consequences
- Resemblances
- Performance analysis
- Binning Examples
- Binning by Hadoop-related tags
- Driver code
- Mapper code
- Total Order Sorting
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Resemblances
- Performance analysis
- Total Order Sorting Examples
- Sort users by last visit
- Driver code
- Analyze mapper code
- Order mapper code
- Order reducer code
- Shuffling
- Pattern Description
- Intent
- Motivation
- Structure
- Consequences
- Resemblances
- Performance analysis
- Shuffle Examples
- Anonymizing StackOverflow comments
- Mapper code
- Reducer code
- 5. Join Patterns
- A Refresher on Joins
- Reduce Side Join
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Resemblances
- Performance analysis
- Reduce Side Join Example
- User and comment join
- Driver code
- User mapper code
- Comment mapper code
- Reducer code
- Combiner optimization
- Reduce Side Join with Bloom Filter
- Reputable user and comment join
- User mapper code
- Comment mapper code
- Replicated Join
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Resemblances
- Performance analysis
- Replicated Join Examples
- Replicated user comment example
- Mapper code
- Composite Join
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Performance analysis
- Composite Join Examples
- Composite user comment join
- Driver code
- Mapper code
- Reducer and combiner
- Cartesian Product
- Pattern Description
- Intent
- Motivation
- Applicability
- Structure
- Consequences
- Resemblances
- Performance Analysis
- Cartesian Product Examples
- Comment Comparison
- Input format code
- Driver code
- Record reader code
- Mapper code
- 6. Metapatterns
- Job Chaining
- With the Driver
- Job Chaining Examples
- Basic job chaining
- Job one mapper
- Job one reducer
- Job two mapper
- Driver code
- Parallel job chaining
- Mapper code
- Reducer code
- Driver code
- With Shell Scripting
- Bash example
- Bash script
- Sample run
- With JobControl
- Job control example
- Main method
- Helper methods
- Chain Folding
- The ChainMapper and ChainReducer Approach
- Chain Folding Example
- Bin users by reputation
- Parsing mapper code
- Replicated join mapper code
- Reducer code
- Binning mapper code
- Driver code
- Job Merging
- Job Merging Examples
- Anonymous comments and distinct users
- TaggedText WritableComparable
- Merged mapper code
- Merged reducer code
- Driver code
- 7. Input and Output Patterns
- Customizing Input and Output in Hadoop
- InputFormat
- RecordReader
- OutputFormat
- RecordWriter
- Generating Data
- Pattern Description
- Intent
- Motivation
- Structure
- Consequences
- Resemblances
- Performance analysis
- Generating Data Examples
- Generating random StackOverflow comments
- Driver code
- InputSplit code
- InputFormat code
- RecordReader code
- External Source Output
- Pattern Description
- Intent
- Motivation
- Structure
- Consequences
- Performance analysis
- External Source Output Example
- Writing to Redis instances
- OutputFormat code
- RecordWriter code
- Mapper Code
- Driver Code
- External Source Input
- Pattern Description
- Intent
- Motivation
- Structure
- Consequences
- Performance analysis
- External Source Input Example
- Reading from Redis Instances
- InputSplit code
- InputFormat code
- RecordReader code
- Driver code
- Partition Pruning
- Pattern Description
- Intent
- Motivation
- Structure
- Consequences
- Resemblances
- Performance analysis
- Partition Pruning Examples
- Partitioning by last access date to Redis instances
- Custom WritableComparable code
- OutputFormat code
- RecordWriter code
- Mapper code
- Driver code
- Querying for user reputation by last access date
- InputSplit code
- InputFormat code
- RecordReader code
- Driver code
- 8. Final Thoughts and the Future of Design Patterns
- Trends in the Nature of Data
- Images, Audio, and Video
- Streaming Data
- The Effects of YARN
- Patterns as a Library or Component
- How You Can Help
- A. Bloom Filters
- Overview
- Use Cases
- Representing a Data Set
- Reduce Queries to External Database
- Google BigTable
- Downsides
- Tweaking Your Bloom Filter
- Index
- About the Authors
- Colophon
- Copyright