Architecting HBase Applications. A Guidebook for Successful Development and Design - Helion

ebook

Authors: Jean-Marc Spaggiari, Kevin O'Dell
ISBN: 978-14-919-1610-0
Pages: 252, Format: ebook
Publication date: 2016-07-18
Bookstore: Helion

Price: 118,15 zł (previously: 137,38 zł)
You save: 14% (-19,23 zł)

Tags: Other - Programming | Cloud Programming

HBase is a remarkable tool for indexing mass volumes of data, but getting started with this distributed database and its ecosystem can be daunting. With this hands-on guide, you’ll learn how to architect, design, and deploy your own HBase applications by examining real-world solutions. Along with HBase principles and cluster deployment guidelines, this book includes in-depth case studies that demonstrate how large companies solved specific use cases with HBase.

Authors Jean-Marc Spaggiari and Kevin O’Dell also provide draft solutions and code examples to help you implement your own versions of those use cases, from master data management (MDM) and document storage to near real-time event processing. You’ll also learn troubleshooting techniques to help you avoid common deployment mistakes.

- Learn exactly what HBase does, what its ecosystem includes, and how to set up your environment
- Explore how real-world HBase instances were deployed and put into production
- Examine documented use cases for tracking healthcare claims, digital advertising, data management, and product quality
- Understand how HBase works with tools and techniques such as Spark, Kafka, MapReduce, and the Java API
- Learn how to identify the causes and understand the consequences of the most common HBase issues

Table of Contents

Foreword
Preface
- Who Should Read This Book?
- How This Book Is Organized
- Additional Resources
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
  - From Kevin
  - From Jean-Marc
I. Introduction to HBase
1. What Is HBase?
- Column-Oriented Versus Row-Oriented
- Implementation and Use Cases
2. HBase Principles
- Table Format
  - Table Layout
  - Table Storage
    - Regions
    - Column family
    - Stores
    - HFiles
    - Blocks
    - Cells
- Internal Table Operations
  - Compaction
    - Minor compaction
    - Major compaction
  - Splits (Auto-Sharding)
  - Balancing
- Dependencies
- HBase Roles
  - Master Server
  - RegionServer
  - Thrift Server
  - REST Server
3. HBase Ecosystem
- Monitoring Tools
  - Cloudera Manager
  - Apache Ambari
  - Hannibal
- SQL
  - Apache Phoenix
  - Apache Trafodion
  - Splice Machine
  - Honorable Mentions (Kylin, Themis, Tephra, Hive, and Impala)
- Frameworks
  - OpenTSDB
  - Kite
  - HappyBase
  - AsyncHBase
4. HBase Sizing and Tuning Overview
- Hardware
- Storage
- Networking
- OS Tuning
- Hadoop Tuning
- HBase Tuning
- Different Workload Tuning
5. Environment Setup
- System Requirements
  - Operating System
  - Virtual Machine
    - VM modes
    - Hadoop distribution
  - Resources
    - Memory
    - Disk space
  - Java
- HBase Standalone Installation
- HBase in a VM
- Local Versus VM
  - Local Mode
    - Pros
    - Cons
  - Virtual Linux Environment
    - Pros
    - Cons
  - QuickStart VM (or Equivalent)
    - Pros
    - Cons
- Troubleshooting
  - IP/Name Configuration
  - Access to the /tmp Folder
  - Environment Variables
  - Available Memory
- First Steps
  - Basic Operations
    - help
    - create
    - list
  - Import Code Examples
    - Download from command line
    - Build from command line
    - Download and build using Eclipse
  - Testing the Examples
    - From command line
    - From Eclipse
- Pseudodistributed and Fully Distributed
II. Use Cases
6. Use Case: HBase as a System of Record
- Ingest/Pre-Processing
- Processing/Serving
- User Experience
7. Implementation of an Underlying Storage Engine
- Table Design
  - Table Schema
    - Hashing keys
    - Column qualifier
  - Table Parameters
    - Compression
    - Data block encoding
    - Bloom filter
    - Presplitting
  - Implementation
- Data conversion
  - Generate Test Data
  - Create Avro Schema
  - Implement MapReduce Transformation
- HFile Validation
- Bulk Loading
- Data Validation
  - Table Size
    - Counting from the shell
    - Counting from MapReduce
  - File Content
    - Using the shell
    - Using Java
- Data Indexing
- Data Retrieval
- Going Further
8. Use Case: Near Real-Time Event Processing
- Ingest/Pre-Processing
- Near Real-Time Event Processing
- Processing/Serving
9. Implementation of Near Real-Time Event Processing
- Application Flow
  - Kafka
  - Flume
  - HBase
  - Lily
  - Solr
- Implementation
  - Data Generation
  - Kafka
  - Flume
    - Flume Kafka source
    - Flume Kafka channel
    - Flume HBase sink
    - Interceptor
      - Conversion
      - Lookup
  - Serializer
  - HBase
    - Table design
    - Table parameters
    - Java implementation
  - Lily
  - Solr
  - Testing
- Going Further
10. Use Case: HBase as a Master Data Management Tool
- Ingest
- Processing
11. Implementation of HBase as a Master Data Management Tool
- MapReduce Versus Spark
- Get Spark Interacting with HBase
  - Run Spark over an HBase Table
  - Calling HBase from Spark
- Implementing Spark with HBase
  - Spark and HBase: Puts
  - Spark on HBase: Bulk Load
  - Spark Over HBase
- Going Further
12. Use Case: Document Store
- Serving
- Ingest
- Clean Up
13. Implementation of Document Store
- MOBs
  - Storage
  - Usage
  - Too Big
- Consistency
- Going Further
III. Troubleshooting
14. Too Many Regions
- Consequences
- Causes
  - Misconfiguration
  - Misoperation
    - Over-splitting
    - Improper presplitting
- Solution
  - Before 0.98
    - Offline merges
    - Using HBase command
    - Using the Java API
  - Starting with 0.98
    - Using HBase shell
    - Using the Java API
- Prevention
  - Regions Size
  - Key and Table Design
15. Too Many Column Families
- Consequences
  - Memory
  - Compactions
  - Split
- Causes, Solution, and Prevention
  - Delete a Column Family
  - Merge a Column Family
  - Separate a Column Family into a New Table
16. Hotspotting
- Consequences
- Causes
  - Monotonically Incrementing Keys
  - Poorly Distributed Keys
  - Small Reference Tables
  - Applications Issues
  - Meta Region Hotspotting
- Prevention and Solution
17. Timeouts and Garbage Collection
- Consequences
- Causes
  - Storage Failure
  - Power-Saving Features
  - Network Failure
- Solutions
- Prevention
  - Reduce Heap Size
  - Off-Heap BlockCache
  - Using the G1GC Algorithm
    - Must-use parameters
    - Additional HBase settings while exceeding 100 GB heaps
    - Other interesting parameters
  - Configure Swappiness to 0 or 1
  - Disable Environment-Friendly Features
  - Hardware Duplication
18. HBCK and Inconsistencies
- HBase Filesystem Layout
- Reading META
- Reading HBase on HDFS
- General HBCK Overview
- Using HBCK
Index