Apache Oozie. The Workflow Scheduler for Hadoop - Helion
ISBN: 978-14-493-6975-0
Pages: 272, Format: ebook
Publication date: 2015-05-12
Bookstore: Helion
Price: 126,65 zł (previously: 147,27 zł)
You save: 14% (-20,62 zł)
Get a solid grounding in Apache Oozie, the workflow scheduler system for managing Hadoop jobs. With this hands-on guide, two experienced Hadoop practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases.
Once you set up your Oozie server, you’ll dive into techniques for writing and coordinating workflows, and learn how to write complex data pipelines. Advanced topics show you how to handle shared libraries in Oozie, as well as how to implement and manage Oozie’s security capabilities.
- Install and configure an Oozie server, and get an overview of basic concepts
- Journey through the world of writing and configuring workflows
- Learn how the Oozie coordinator schedules and executes workflows based on triggers
- Understand how Oozie manages data dependencies
- Use Oozie bundles to package several coordinator apps into a data pipeline
- Learn about security features and shared library management
- Implement custom extensions and write your own EL functions and actions
- Debug workflows and manage Oozie’s operational details
Table of Contents
- Foreword
- Preface
- Contents of This Book
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
- 1. Introduction to Oozie
- Big Data Processing
- A Recurrent Problem
- A Common Solution: Oozie
- Oozie's role in the Hadoop Ecosystem
- What exactly is Oozie?
- The name Oozie
- A Simple Oozie Job
- Oozie Releases
- Timeline and status of the releases
- Compatibility
- Some Oozie Usage Numbers
- 2. Oozie Concepts
- Oozie Applications
- Oozie Workflows
- Workflow use case
- Oozie Coordinators
- Coordinator use case
- Oozie Bundles
- Bundle use case
- Parameters, Variables, and Functions
- Application Deployment Model
- Oozie Architecture
- 3. Setting Up Oozie
- Oozie Deployment
- Basic Installations
- Requirements
- Build Oozie
- Install Oozie Server
- Hadoop Cluster
- Hadoop installation
- Configuring Hadoop for Oozie
- Start and Verify the Oozie Server
- Advanced Oozie Installations
- Configuring Kerberos Security
- DB Setup
- MySQL configuration
- Oracle configuration
- Shared Library Installation
- Sharelib since version 4.1.0
- Oozie Client Installations
- 4. Oozie Workflow Actions
- Workflow
- Actions
- Action Execution Model
- Action Definition
- Action Types
- MapReduce Action
- Streaming
- Pipes
- MapReduce example
- Streaming example
- Java Action
- Java example
- Pig Action
- Pig example
- FS Action
- Filesystem example
- Sub-Workflow Action
- Hive Action
- Hive example
- DistCp Action
- DistCp example
- Email Action
- Shell Action
- Shell example
- SSH Action
- Sqoop Action
- Sqoop example
- Synchronous Versus Asynchronous Actions
- 5. Workflow Applications
- Outline of a Basic Workflow
- Control Nodes
- <start> and <end>
- <fork> and <join>
- <decision>
- <kill>
- <OK> and <ERROR>
- Job Configuration
- Global Configuration
- Job XML
- Inline Configuration
- Launcher Configuration
- Parameterization
- EL Variables
- EL constants and system-defined variables
- Hadoop counters
- EL Functions
- String timestamp()
- String wf:id()
- String wf:errorCode(String node)
- boolean fs:fileSize(String path)
- EL Expressions
- The job.properties File
- Command-Line Option
- The config-default.xml File
- The <parameters> Section
- Configuration and Parameterization Examples
- Lifecycle of a Workflow
- Action States
- 6. Oozie Coordinator
- Coordinator Concept
- Triggering Mechanism
- Time Trigger
- Data Availability Trigger
- Coordinator Application and Job
- Coordinator Action
- Our First Coordinator Job
- Coordinator Submission
- Oozie Web Interface for Coordinator Jobs
- Coordinator Job Lifecycle
- Coordinator Action Lifecycle
- Parameterization of the Coordinator
- EL Functions for Frequency
- Day-Based Frequency
- Month-Based Frequency
- Execution Controls
- An Improved Coordinator
- 7. Data Trigger Coordinator
- Expressing Data Dependency
- Dataset
- Defining a dataset
- Timelines: coordinator versus dataset
- input-events
- output-events
- Example: Rollup
- Parameterization of Dataset Instances
- current(n)
- latest(n)
- Comparison of current() and latest()
- Parameter Passing to Workflow
- dataIn(eventName)
- dataOut(eventName)
- nominalTime()
- actualTime()
- dateOffset(baseTimeStamp, skipInstance, timeUnit)
- formatTime(timeStamp, formatString)
- A Complete Coordinator Application
- 8. Oozie Bundles
- Bundle Basics
- Bundle Definition
- Why Do We Need Bundles?
- Bundle Specification
- Execution Controls
- Bundle State Transitions
- 9. Advanced Topics
- Managing Libraries in Oozie
- Origin of JARs in Oozie
- Design Challenges
- Managing Action JARs
- How to get the JARs?
- Installing sharelib
- Overriding/upgrading existing JARs
- Supporting multiple versions
- Supporting the User's JAR
- JAR Precedence in classpath
- Oozie Security
- Oozie Security Overview
- Oozie to Hadoop
- Configuring Hadoop services
- Setting up Keytab and Principal
- Configuring the Oozie server
- Oozie Client to Server
- Oozie Server Security
- Configuring the Oozie Server
- Oozie client
- Proxy user in Oozie
- Supporting Custom Credentials
- Supporting New API in MapReduce Action
- Supporting Uber JAR
- Cron Scheduling
- A Simple Cron-Based Coordinator
- Oozie Cron Specification
- Allowed values
- Special characters
- Nonstandard special characters
- Emulate Asynchronous Data Processing
- HCatalog-Based Data Dependency
- 10. Developer Topics
- Developing Custom EL Functions
- Requirements for a New EL Function
- Implementing a New EL Function
- Writing a new EL function
- Deploy the new EL function
- Using the new function
- Supporting Custom Action Types
- Creating a Custom Synchronous Action
- Writing an ActionExecutor
- Writing the XML schema
- Deploying the new action type
- Using the new action type
- Overriding an Asynchronous Action Type
- Implementing the New ActionMain Class
- Testing the New Main Class
- Creating a New Asynchronous Action
- Writing an Asynchronous Action Executor
- Writing the ActionMain Class
- Writing Action's Schema
- Deploying the New Action Type
- Using the New Action Type
- 11. Oozie Operations
- Oozie CLI Tool
- CLI Subcommands
- Useful CLI Commands
- The validate subcommand
- The job subcommand
- The jobs subcommand
- More subcommands
- Oozie REST API
- Oozie Java Client
- The oozie-site.xml File
- The Oozie Purge Service
- Job Monitoring
- JMS-Based Monitoring
- Installation and configuration
- Consuming JMS messages
- Oozie Instrumentation and Metrics
- Reprocessing
- Workflow Reprocessing
- Coordinator Reprocessing
- Bundle Reprocessing
- Server Tuning
- JVM Tuning
- Service Settings
- The CallableQueueService
- The RecoveryService
- Oozie High Availability
- Debugging in Oozie
- Oozie Logs
- Developing and Testing Oozie Applications
- Application Deployment Tips
- Common Errors and Debugging
- MiniOozie and LocalOozie
- The Competition
- Index