Pig Design Patterns. Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig - Helion
ebook
Author: Pradeep Pasupuleti
Original title: Pig Design Patterns. Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig.
ISBN: 9781783285563
Pages: 310, Format: ebook
Publication date: 2014-04-17
Bookstore: Helion
Book price: 152.10 zł (previously: 169.00 zł)
You save: 10% (-16.90 zł)
Table of contents
- Pig Design Patterns
- Table of Contents
- Credits
- Foreword
- About the Author
- Acknowledgments
- About the Reviewers
- www.PacktPub.com
- Support files, eBooks, discount offers and more
- Why Subscribe?
- Free Access for Packt account holders
- Preface
- What this book covers
- Motivation for this book
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Third-party libraries
- Datasets
- Errata
- Piracy
- Questions
- 1. Setting the Context for Design Patterns in Pig
- Understanding design patterns
- The scope of design patterns in Pig
- Hadoop demystified: a quick reckoner
- The enterprise context
- Common challenges of distributed systems
- The advent of Hadoop
- Hadoop under the covers
- Understanding the Hadoop Distributed File System
- HDFS design goals
- Working of HDFS
- Understanding MapReduce
- Understanding how MapReduce works
- The MapReduce internals
- Pig: a quick intro
- Understanding the rationale of Pig
- Understanding the relevance of Pig in the enterprise
- Working of Pig: an overview
- Firing up Pig
- The use case
- Code listing
- The dataset
- Understanding Pig through the code
- Pig's extensibility
- Operators used in code
- The EXPLAIN operator
- Understanding Pig's data model
- Primitive types
- Complex types
- The relevance of schemas
- Summary
- 2. Data Ingest and Egress Patterns
- The context of data ingest and egress
- Types of data in the enterprise
- Ingest and egress patterns for multistructured data
- Considerations for log ingestion
- The Apache log ingestion pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Code for the CommonLogLoader class
- Code for the CombinedLogLoader class
- Results
- Additional information
- The Custom log ingestion pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The image ingress and egress pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- The image ingress implementation
- The image egress implementation
- Code snippets
- The image ingress
- Pig script
- Image to a sequence UDF snippet
- The image egress
- Pig script
- Sequence to an image UDF
- Results
- Additional information
- The ingress and egress patterns for the NoSQL data
- MongoDB ingress and egress patterns
- Background
- Motivation
- Use cases
- Pattern implementation
- The ingress implementation
- The egress implementation
- Code snippets
- The ingress code
- The egress code
- Results
- Additional information
- The HBase ingress and egress pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- The ingress implementation
- The egress implementation
- Code snippets
- The ingress code
- The egress code
- Results
- Additional information
- The ingress and egress patterns for structured data
- The Hive ingress and egress patterns
- Background
- Motivation
- Use cases
- Pattern implementation
- The ingress implementation
- The egress implementation
- Code snippets
- The ingress code
- Importing data using RCFile
- Importing data using HCatalog
- The egress code
- Results
- Additional information
- The ingress and egress patterns for semi-structured data
- The mainframe ingestion pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- XML ingest and egress patterns
- Background
- Motivation
- Motivation for ingesting raw XML
- Motivation for ingesting binary XML
- Motivation for egression of XML
- Use cases
- Pattern implementation
- The implementation of the XML raw ingestion
- The implementation of the XML binary ingestion
- Code snippets
- The XML raw ingestion code
- The XML binary ingestion code
- The XML egress code
- Pig script
- The XML storage
- Results
- Additional information
- JSON ingress and egress patterns
- Background
- Motivation
- Use cases
- Pattern implementation
- The ingress implementation
- The egress implementation
- Code snippets
- The ingress code
- The code for simple JSON
- The code for nested JSON
- The egress code
- Results
- Additional information
- Summary
- 3. Data Profiling Patterns
- Data profiling for Big Data
- Big Data profiling dimensions
- Sampling considerations for profiling Big Data
- Sampling support in Pig
- Rationale for using Pig in data profiling
- The data type inference pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Pig script
- Java UDF
- Results
- Additional information
- The basic statistical profiling pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Pig script
- Macro
- Results
- Additional information
- The pattern-matching pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Pig script
- Macro
- Results
- Additional information
- The string profiling pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Pig script
- Macro
- Results
- Additional information
- The unstructured text profiling pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Pig script
- Java UDF for stemming
- Java UDF for generating TF-IDF
- Results
- Additional information
- Summary
- 4. Data Validation and Cleansing Patterns
- Data validation and cleansing for Big Data
- Choosing Pig for validation and cleansing
- The constraint validation and cleansing design pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The regex validation and cleansing design pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The corrupt data validation and cleansing design pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The unstructured text data validation and cleansing design pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- Summary
- 5. Data Transformation Patterns
- Data transformation processes
- The structured-to-hierarchical transformation pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The data normalization pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The data integration pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The aggregation pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The data generalization pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- Summary
- 6. Understanding Data Reduction Patterns
- Data reduction: a quick introduction
- Data reduction considerations for Big Data
- Dimensionality reduction: the Principal Component Analysis design pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Limitations of PCA implementation
- Code snippets
- Results
- Additional information
- Numerosity reduction: the histogram design pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- Numerosity reduction: the sampling design pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- Numerosity reduction: the clustering design pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- Summary
- 7. Advanced Patterns and Future Work
- The clustering pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The topic discovery pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The natural language processing pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- The classification pattern
- Background
- Motivation
- Use cases
- Pattern implementation
- Code snippets
- Results
- Additional information
- Future trends
- Emergence of data-driven patterns
- The emergence of solution-driven patterns
- Patterns addressing programmability constraints
- Summary
- Index