Getting Started with Impala. Interactive SQL for Apache Hadoop - Helion
ISBN: 978-14-919-0572-2
Pages: 152, Format: ebook
Publication date: 2014-09-25
Bookstore: Helion
Learn how to write, tune, and port SQL queries and other statements for a Big Data environment using Impala, the massively parallel processing SQL query engine for Apache Hadoop. The best practices in this practical guide help you design database schemas that interoperate with other Hadoop components, are convenient for administrators to manage and monitor, and accommodate future expansion in data size and evolution of software capabilities.
Written by John Russell, documentation lead for the Cloudera Impala project, this book gets you working with the most recent Impala releases quickly. Ideal for database developers and business analysts, the latest revision covers analytics functions, complex types, incremental statistics, subqueries, and submission to the Apache incubator.
Getting Started with Impala includes advice from Cloudera’s development team, as well as insights from its consulting engagements with customers.
- Learn how Impala integrates with a wide range of Hadoop components
- Attain high performance and scalability for huge data sets on production clusters
- Explore common developer tasks, such as porting code to Impala and optimizing performance
- Use tutorials for working with billion-row tables, date- and time-based values, and other techniques
- Learn how to transition from rigid schemas to a flexible model that evolves as needs change
- Take a deep dive into joins and the roles of statistics
Table of Contents
- Getting Started with Impala
- Introduction
- Who Is This Book For?
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
- 1. Why Impala?
- Impala's Place in the Big Data Ecosystem
- Flexibility for Your Big Data Workflow
- High-Performance Analytics
- Exploratory Business Intelligence
- 2. Getting Up and Running with Impala
- Installation
- Connecting to Impala
- Your First Impala Queries
- 3. Impala for the Database Developer
- The SQL Language
- Standard SQL
- Limited DML
- No Transactions
- Numbers
- Recent Additions
- Big Data Considerations
- Billions and Billions of Rows
- HDFS Block Size
- Parquet Files: The Biggest Blocks of All
- How Impala Is Like a Data Warehouse
- Physical and Logical Data Layouts
- The HDFS Storage Model
- Distributed Queries
- Normalized and Denormalized Data
- File Formats
- Text File Format
- Parquet File Format
- Getting File Format Information
- Switching File Formats
- Aggregation
- 4. Common Developer Tasks for Impala
- Getting Data into an Impala Table
- INSERT Statement
- LOAD DATA Statement
- External Tables
- Figuring Out Where Impala Data Resides
- Manually Loading Data Files into HDFS
- Hive
- Sqoop
- Kite
- Porting SQL Code to Impala
- Using Impala from a JDBC or ODBC Application
- JDBC
- ODBC
- Using Impala with a Scripting Language
- Running Impala SQL Statements from Scripts
- Variable Substitution
- Saving Query Results
- The impyla Package for Python Scripting
- Optimizing Impala Performance
- Optimizing Query Performance
- Optimizing Memory Usage
- Working with Partitioned Tables
- Finding the Ideal Granularity
- Inserting into Partitioned Tables
- Adding and Loading New Partitions
- Writing User-Defined Functions
- Collaborating with Your Administrators
- Designing for Security
- Understanding Resource Management
- Helping to Plan for Performance (Stats, HDFS Caching)
- Understanding Cluster Topology
- Always Close Your Queries
- 5. Tutorials and Deep Dives
- Tutorial: From Unix Data File to Impala Table
- Tutorial: Queries Without a Table
- Tutorial: The Journey of a Billion Rows
- Generating a Billion Rows of CSV Data
- Normalizing the Original Data
- Converting to Parquet Format
- Making a Partitioned Table
- Next Steps
- Deep Dive: Joins and the Role of Statistics
- Creating a Million-Row Table to Join With
- Loading Data and Computing Stats
- Reviewing the EXPLAIN Plan
- Trying a Real Query
- The Story So Far
- Final Join Query with 1B x 1M Rows
- Anti-Pattern: A Million Little Pieces
- Tutorial: Across the Fourth Dimension
- TIMESTAMP Data Type
- Format Strings for Dates and Times
- Working with Individual Date and Time Fields
- Date and Time Arithmetic
- Let's Solve the Y2K Problem
- More Fun with Dates
- Tutorial: Verbose and Quiet impala-shell Output
- Tutorial: When Schemas Evolve
- Numbers Versus Strings
- Dealing with Out-of-Range Integers
- Tutorial: Levels of Abstraction
- String Formatting
- Temperature Conversion
- Colophon
- Copyright