Pentaho 3.2 Data Integration: Beginner's Guide. Explore, transform, validate, and integrate your data with ease - Helion

ebook
Author: Mar
Original title: Pentaho 3.2 Data Integration: Beginner's Guide. Explore, transform, validate, and integrate your data with ease
ISBN: 9781847199553
Pages: 492, Format: ebook
Publication date: 2010-04-04
Bookstore: Helion

Price: 159.00 zł

Pentaho Data Integration (a.k.a. Kettle) is a full-featured open source ETL (Extract, Transform, and Load) solution. Although PDI is a feature-rich tool, effectively capturing, manipulating, cleansing, transferring, and loading data can get complicated.

This book is full of practical examples that will help you to take advantage of Pentaho Data Integration's graphical, drag-and-drop design environment. You will quickly get started with Pentaho Data Integration by following the step-by-step guidance in this book. The useful tips in this book will encourage you to exploit powerful features of Pentaho Data Integration and perform ETL operations with ease.

Starting with the installation of the PDI software, this book will teach you all the key PDI concepts. Each chapter introduces new features, allowing you to gradually get involved with the tool. First, you will learn to work with plain files, and to do all kinds of data manipulation. Then, the book gives you a primer on databases and teaches you how to work with databases inside PDI. Not only that, you'll be given an introduction to data warehouse concepts and you will learn to load data in a data warehouse. After that, you will learn to implement simple and complex processes.

Once you've learned all the basics, you will build a simple datamart that will serve to reinforce all the concepts learned through the book.

Table of Contents

  • Pentaho 3.2 Data Integration Beginner's Guide
    • Table of Contents
    • Pentaho 3.2 Data Integration Beginner's Guide
    • Credits
    • Foreword
    • The Kettle Project
    • About the Author
    • About the Reviewers
    • Preface
      • How to read this book
      • What this book covers
      • What you need for this book
      • Who this book is for
      • Conventions
      • Reader feedback
      • Customer support
        • Errata
        • Piracy
        • Questions
    • 1. Getting Started with Pentaho Data Integration
      • Pentaho Data Integration and Pentaho BI Suite
        • Exploring the Pentaho Demo
      • Pentaho Data Integration
        • Using PDI in real world scenarios
          • Loading datawarehouses or datamarts
            • Integrating data
            • Data cleansing
            • Migrating information
            • Exporting data
            • Integrating PDI using Pentaho BI
            • Pop quiz PDI data sources
      • Installing PDI
      • Time for action installing PDI
        • What just happened?
        • Pop quiz PDI prerequisites
      • Launching the PDI graphical designer: Spoon
      • Time for action starting and customizing Spoon
        • What just happened?
        • Spoon
          • Setting preferences in the Options window
          • Storing transformations and jobs in a repository
        • Creating your first transformation
      • Time for action creating a hello world transformation
        • What just happened?
          • Directing the Kettle engine with transformations
          • Exploring the Spoon interface
            • Viewing the transformation structure
          • Running and previewing the transformation
      • Time for action running and previewing the hello_world transformation
        • What just happened?
          • Previewing the results in the Execution Results window
        • Pop quiz PDI basics
      • Installing MySQL
      • Time for action installing MySQL on Windows
        • What just happened?
      • Time for action installing MySQL on Ubuntu
        • What just happened?
      • Summary
    • 2. Getting Started with Transformations
      • Reading data from files
      • Time for action reading results of football matches from files
        • What just happened?
        • Input files
          • Input steps
        • Reading several files at once
      • Time for action reading all your files at a time using a single Text file input step
        • What just happened?
      • Time for action reading all your files at a time using a single Text file input step and regular expressions
        • What just happened?
          • Regular expressions
          • Troubleshooting reading files
        • Grids
        • Have a go hero explore your own files
      • Sending data to files
      • Time for action sending the results of matches to a plain file
        • What just happened?
        • Output files
          • Output steps
        • Some data definitions
          • Rowset
          • Streams
        • The Select values step
        • Have a go hero extending your transformations by writing output files
      • Getting system information
      • Time for action updating a file with news about examinations
        • What just happened?
        • Getting information by using Get System Info step
        • Data types
          • Date fields
          • Numeric fields
        • Running transformations from a terminal window
      • Time for action running the examination transformation from a terminal window
        • What just happened?
        • Have a go hero using different date formats
        • Go for a hero formatting 99.55
        • Pop quiz formatting data
      • XML files
      • Time for action getting data from an XML file with information about countries
        • What just happened?
        • What is XML
          • PDI transformation files
        • Getting data from XML files
          • XPath
          • Configuring the Get data from XML step
        • Kettle variables
          • How and when you can use variables
        • Have a go hero exploring XML files
        • Have a go hero enhancing the output countries file
        • Have a go hero documenting your work
      • Summary
    • 3. Basic Data Manipulation
      • Basic calculations
      • Time for action reviewing examinations by using the Calculator step
        • What just happened?
        • Adding or modifying fields by using different PDI steps
          • The Calculator step
          • The Formula step
      • Time for action reviewing examinations by using the Formula step
        • What just happened?
        • Have a go hero listing students and their examinations results
        • Pop quiz concatenating strings
      • Calculations on groups of rows
      • Time for action calculating World Cup statistics by grouping data
        • What just happened?
        • Group by step
        • Have a go hero calculating statistics for the examinations
        • Have a go hero listing the languages spoken by country
      • Filtering
      • Time for action counting frequent words by filtering
        • What just happened?
        • Filtering rows using the Filter rows step
        • Have a go hero playing with filters
        • Have a go hero counting words and discarding those that are commonly used
      • Looking up data
      • Time for action finding out which language people speak
        • What just happened?
        • The Stream lookup step
        • Have a go hero counting words more precisely
      • Summary
    • 4. Controlling the Flow of Data
      • Splitting streams
      • Time for action browsing new PDI features by copying a dataset
        • What just happened?
        • Copying rows
        • Have a go hero recalculating statistics
        • Distributing rows
      • Time for action assigning tasks by distributing
        • What just happened?
        • Pop quiz data movement (copying and distributing)
      • Splitting the stream based on conditions
      • Time for action assigning tasks by filtering priorities with the Filter rows step
        • What just happened?
        • PDI steps for splitting the stream based on conditions
      • Time for action assigning tasks by filtering priorities with the Switch / Case step
        • What just happened?
        • Have a go hero listing languages and countries
        • Pop quiz splitting a stream
      • Merging streams
      • Time for action gathering progress and merging all together
        • What just happened?
        • PDI options for merging streams
      • Time for action giving priority to Bouchard by using Append Stream
        • What just happened?
        • Have a go hero sorting and merging all tasks
        • Have a go hero trying to find missing countries
      • Summary
    • 5. Transforming Your Data with JavaScript Code and the JavaScript Step
      • Doing simple tasks with the JavaScript step
      • Time for action calculating scores with JavaScript
        • What just happened?
        • Using the JavaScript language in PDI
        • Inserting JavaScript code using the Modified Java Script Value step
          • Adding fields
          • Modifying fields
          • Turning on the compatibility switch
        • Have a go hero adding and modifying fields to the contest data
        • Testing your code
      • Time for action testing the calculation of averages
        • What just happened?
          • Testing the script using the Test script button
        • Have a go hero testing the new calculation of the average
      • Enriching the code
      • Time for action calculating flexible scores by using variables
        • What just happened?
        • Using named parameters
        • Using the special Start, Main, and End scripts
        • Using transformation predefined constants
        • Pop quiz finding the 7 errors
        • Have a go hero keeping the top 10 performances
        • Have a go hero calculating scores with Java code
      • Reading and parsing unstructured files
      • Time for action changing a list of house descriptions with JavaScript
        • What just happened?
        • Looking at previous rows
        • Have a go hero enhancing the houses file
        • Have a go hero fill gaps in the contest file
      • Avoiding coding by using purpose-built steps
        • Have a go hero creating alternative solutions
      • Summary
    • 6. Transforming the Row Set
      • Converting rows to columns
      • Time for action enhancing a films file by converting rows to columns
        • What just happened?
        • Converting row data to column data by using the Row denormalizer step
        • Have a go hero houses revisited
        • Aggregating data with a Row denormalizer step
      • Time for action calculating total scores by performances by country
        • What just happened?
        • Using Row denormalizer for aggregating data
        • Have a go hero calculating scores by skill by continent
      • Normalizing data
      • Time for action enhancing the matches file by normalizing the dataset
        • What just happened?
        • Modifying the dataset with a Row Normalizer step
        • Summarizing the PDI steps that operate on sets of rows
        • Have a go hero verifying the benefits of normalization
        • Have a go hero normalizing the Films file
        • Have a go hero calculating scores by judge
      • Generating a custom time dimension dataset by using Kettle variables
      • Time for action creating the time dimension dataset
        • What just happened?
        • Getting variables
      • Time for action getting variables for setting the default starting date
        • What just happened?
          • Using the Get Variables step
        • Have a go hero enhancing the time dimension
        • Pop quiz using Kettle variables inside transformations
      • Summary
    • 7. Validating Data and Handling Errors
      • Capturing errors
      • Time for action capturing errors while calculating the age of a film
        • What just happened?
        • Using PDI error handling functionality
        • Aborting a transformation
      • Time for action aborting when there are too many errors
        • What just happened?
          • Aborting a transformation using the Abort step
        • Fixing captured errors
      • Time for action treating errors that may appear
        • What just happened?
          • Treating rows coming to the error stream
        • Pop quiz PDI error handling
        • Have a go hero capturing errors while seeing who wins
      • Avoiding unexpected errors by validating data
      • Time for action validating genres with a Regex Evaluation step
        • What just happened?
        • Validating data
      • Time for action checking films file with the Data Validator
        • What just happened?
          • Defining simple validation rules using the Data Validator
        • Have a go hero validating the football matches file
        • Cleansing data
        • Have a go hero cleansing films data
      • Summary
    • 8. Working with Databases
      • Introducing the Steel Wheels sample database
        • Connecting to the Steel Wheels database
      • Time for action creating a connection with the Steel Wheels database
        • What just happened?
          • Connecting with Relational Database Management Systems
        • Pop quiz defining database connections
        • Have a go hero connecting to your own databases
        • Exploring the Steel Wheels database
      • Time for action exploring the sample database
        • What just happened?
          • A brief word about SQL
          • Exploring any configured database with the PDI Database explorer
        • Have a go hero exploring the sample data in depth
        • Have a go hero exploring your own databases
      • Querying a database
      • Time for action getting data about shipped orders
        • What just happened?
        • Getting data from the database with the Table input step
        • Using the SELECT statement for generating a new dataset
          • Making flexible queries by using parameters
      • Time for action getting orders in a range of dates by using parameters
        • What just happened?
          • Adding parameters to your queries
          • Making flexible queries by using Kettle variables
      • Time for action getting orders in a range of dates by using variables
        • What just happened?
          • Using Kettle variables in your queries
        • Pop quiz database datatypes versus PDI datatypes
        • Have a go hero querying the sample data
      • Sending data to a database
      • Time for action loading a table with a list of manufacturers
        • What just happened?
        • Inserting new data into a database table with the Table output step
        • Inserting or updating data by using other PDI steps
      • Time for action inserting new products or updating existent ones
        • What just happened?
      • Time for action testing the update of existing products
        • What just happened?
          • Inserting or updating data with the Insert/Update step
        • Have a go hero populating a films database
        • Have a go hero creating the time dimension
        • Have a go hero populating the products table
        • Pop quiz Insert/Update step versus Table Output/Update steps
        • Pop quiz filtering the first 10 rows
      • Eliminating data from a database
      • Time for action deleting data about discontinued items
        • What just happened?
        • Deleting records of a database table with the Delete step
        • Have a go hero deleting old orders
      • Summary
    • 9. Performing Advanced Operations with Databases
      • Preparing the environment
      • Time for action populating the Jigsaw database
        • What just happened?
        • Exploring the Jigsaw database model
      • Looking up data in a database
        • Doing simple lookups
      • Time for action using a Database lookup step to create a list of products to buy
        • What just happened?
          • Looking up values in a database with the Database lookup step
        • Have a go hero preparing the delivery of the products
        • Have a go hero refining the transformation
        • Doing complex lookups
      • Time for action using a Database join step to create a list of suggested products to buy
        • What just happened?
          • Joining data from the database to the stream data by using a Database join step
        • Have a go hero rebuilding the list of customers
      • Introducing dimensional modeling
      • Loading dimensions with data
      • Time for action loading a region dimension with a Combination lookup/update step
        • What just happened?
      • Time for action testing the transformation that loads the region dimension
        • What just happened?
        • Describing data with dimensions
          • Loading Type I SCD with a Combination lookup/update step
        • Have a go hero adding regions to the Region Dimension
        • Have a go hero loading the manufacturers dimension
        • Have a go hero loading a mini-dimension
        • Keeping a history of changes
      • Time for action keeping a history of product changes with the Dimension lookup/update step
        • What just happened?
      • Time for action testing the transformation that keeps a history of product changes
        • What just happened?
          • Keeping an entire history of data with a Type II slowly changing dimension
        • Loading Type II SCDs with the Dimension lookup/update step
        • Have a go hero keeping a history just for the theme of a product
        • Have a go hero loading a Type II SCD dimension
        • Pop quiz loading slowly changing dimensions
        • Pop quiz loading type III slowly changing dimensions
      • Summary
    • 10. Creating Basic Task Flows
      • Introducing PDI jobs
      • Time for action creating a simple hello world job
        • What just happened?
        • Executing processes with PDI jobs
          • Using Spoon to design and run jobs
        • Using the transformation job entry
        • Pop quiz defining PDI jobs
        • Have a go hero loading the dimension tables
      • Receiving arguments and parameters in a job
      • Time for action customizing the hello world file with arguments and parameters
        • What just happened?
        • Using named parameters in jobs
        • Have a go hero backing up your work
      • Running jobs from a terminal window
      • Time for action executing the hello world job from a terminal window
        • What just happened?
        • Have a go hero experiencing Kitchen
      • Using named parameters and command-line arguments in transformations
      • Time for action calling the hello world transformation with fixed arguments and parameters
        • What just happened?
        • Have a go hero saying hello again and again
        • Have a go hero loading the time dimension from a job
      • Deciding between the use of a command-line argument and a named parameter
        • Have a go hero analysing the use of arguments and named parameters
      • Running job entries under conditions
      • Time for action sending a sales report and warning the administrator if something is wrong
        • What just happened?
        • Changing the flow of execution on the basis of conditions
        • Have a go hero refining the sales report
        • Creating and using a file results list
        • Have a go hero sharing your work
      • Summary
    • 11. Creating Advanced Transformations and Jobs
      • Enhancing your processes with the use of variables
      • Time for action updating a file with news about examinations by setting a variable with the name of the file
        • What just happened?
        • Setting variables inside a transformation
        • Have a go hero enhancing the examination tutorial even more
        • Have a go hero enhancing the jigsaw database update process
        • Have a go hero executing the proper jigsaw database update process
      • Enhancing the design of your processes
      • Time for action generating files with top scores
        • What just happened?
        • Pop quiz using the Add Sequence step
        • Reusing part of your transformations
      • Time for action calculating the top scores with a subtransformation
        • What just happened?
          • Creating and using subtransformations
        • Have a go hero refining the subtransformation
        • Have a go hero counting words more precisely (second version)
        • Creating a job as a process flow
      • Time for action splitting the generation of top scores by copying and getting rows
        • What just happened?
          • Transferring data between transformations by using the copy/get rows mechanism
        • Have a go hero modifying the flow
        • Nesting jobs
      • Time for action generating the files with top scores by nesting jobs
        • What just happened?
          • Running a job inside another job with a job entry
          • Understanding the scope of variables
        • Pop quiz deciding the scope of variables
      • Iterating jobs and transformations
      • Time for action generating custom files by executing a transformation for every input row
        • What just happened?
        • Executing for each row
        • Have a go hero processing several files at once
        • Have a go hero building lists of products to buy
        • Have a go hero e-mail students to let them know how they did
      • Summary
    • 12. Developing and Implementing a Simple Datamart
      • Exploring the sales datamart
        • Deciding the level of granularity
      • Loading the dimensions
      • Time for action loading dimensions for the sales datamart
        • What just happened?
      • Extending the sales datamart model
        • Have a go hero loading the dimensions for the puzzles star model
      • Loading a fact table with aggregated data
      • Time for action loading the sales fact table by looking up dimensions
        • What just happened?
        • Getting the information from the source with SQL queries
        • Translating the business keys into surrogate keys
          • Obtaining the surrogate key for a Type I SCD
          • Obtaining the surrogate key for a Type II SCD
          • Obtaining the surrogate key for the Junk dimension
          • Obtaining the surrogate key for the Time dimension
        • Pop quiz modifying a star model and loading the star with PDI
        • Have a go hero loading a puzzles fact table
      • Getting facts and dimensions together
      • Time for action loading the fact table using a range of dates obtained from the command line
        • What just happened?
      • Time for action loading the sales star
        • What just happened?
        • Have a go hero enhancing the loading process of the sales fact table
        • Have a go hero loading the puzzles sales star
        • Have a go hero loading the facts once a month
      • Getting rid of administrative tasks
      • Time for action automating the loading of the sales datamart
        • What just happened?
        • Have a go hero creating a backup of your work automatically
        • Have a go hero enhancing the automated process by sending an e-mail if an error occurs
      • Summary
    • 13. Taking it Further
      • PDI best practices
      • Getting the most out of PDI
        • Extending Kettle with plugins
        • Have a go hero listing the top 10 students by using the Head plugin step
        • Overcoming real world risks with some remote execution
        • Scaling out to overcome bigger risks
        • Pop quiz remote execution and clustering
      • Integrating PDI and the Pentaho BI suite
        • PDI as a process action
        • PDI as a datasource
        • More about the Pentaho suite
      • PDI Enterprise Edition and Kettle Developer Support
      • Summary
    • A. Working with Repositories
      • Creating a repository
      • Time for action creating a PDI repository
        • What just happened?
        • Creating repositories to store your transformations and jobs
      • Working with the repository storage system
      • Time for action logging into a repository
        • What just happened?
        • Logging into a repository by using credentials
          • Defining repository user accounts
        • Creating transformations and jobs in repository folders
        • Creating database connections, partitions, servers, and clusters
        • Backing up and restoring a repository
      • Examining and modifying the contents of a repository with the Repository explorer
      • Migrating from a file-based system to a repository-based system and vice-versa
      • Summary
    • B. Pan and Kitchen: Launching Transformations and Jobs from the Command Line
      • Running transformations and jobs stored in files
      • Running transformations and jobs from a repository
        • Specifying command line options
      • Checking the exit code
      • Providing options when running Pan and Kitchen
        • Log details
        • Named parameters
        • Arguments
        • Variables
    • C. Quick Reference: Steps and Job Entries
      • Transformation steps
      • Job entries
    • D. Spoon Shortcuts
      • General shortcuts
      • Designing transformations and jobs
      • Grids
      • Repositories
    • E. Introducing PDI 4 Features
      • Agile BI
      • Visual improvements for designing transformations and jobs
        • Experiencing the mouse-over assistance
      • Time for action creating a hop with the mouse-over assistance
        • What just happened?
          • Using the mouse-over assistance toolbar
        • Experiencing the sniff-testing feature
        • Experiencing the job drill-down feature
        • Experiencing even more visual changes
      • Enterprise features
      • Summary
    • F. Pop Quiz Answers
      • Chapter 1
        • PDI data sources
        • PDI prerequisites
        • PDI basics
      • Chapter 2
        • formatting data
      • Chapter 3
        • concatenating strings
      • Chapter 4
        • data movement (copying and distributing)
        • splitting a stream
      • Chapter 5
        • finding the seven errors
      • Chapter 6
        • using Kettle variables inside transformations
      • Chapter 7
        • PDI error handling
      • Chapter 8
        • defining database connections
        • database datatypes versus PDI datatypes
        • Insert/Update step versus Table Output/Update steps
        • filtering the first 10 rows
      • Chapter 9
        • loading slowly changing dimensions
        • loading type III slowly changing dimensions
      • Chapter 10
        • defining PDI jobs
      • Chapter 11
        • using the Add sequence step
        • deciding the scope of variables
      • Chapter 12
        • modifying a star model and loading the star with PDI
      • Chapter 13
        • remote execution and clustering
    • Index
