Amazon Redshift: The Definitive Guide - Helion

ebook

Authors: Rajesh Francis, Rajiv Gupta, Milind Oke
ISBN: 9781098135263
Pages: 458, Format: ebook
Publication date: 2023-10-03
Bookstore: Helion

Book price: 228,65 zł (previously: 265,87 zł)
You save: 14% (-37,22 zł)

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than going through the heavy lifting required to run a typical data warehouse.

Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the-box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift.

By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort. This book helps you:

Build a cloud data strategy around Amazon Redshift as the foundational data warehouse
Get started with Amazon Redshift with simple-to-use data models and design best practices
Understand how and when to use Redshift Serverless and Redshift provisioned clusters
Take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options
Transform your data platform for predictive analytics using Redshift ML and break silos using data sharing
Learn best practices for security, monitoring, resilience, and disaster recovery
Leverage Amazon Redshift integration with other AWS services to unlock additional value

Table of Contents

Amazon Redshift: The Definitive Guide eBook -- table of contents

Foreword
Preface
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
1. AWS for Data
- Data-Driven Organizations
  - Business Use Cases
  - New Business Use Cases with Generative AI
- Modern Data Strategy
  - Comprehensive Set of Capabilities
  - Integrated Set of Tools
  - End-to-End Data Governance
- Modern Data Architecture
  - Role of Amazon Redshift in a Modern Data Architecture
  - Real-World Benefits of Adopting a Modern Data Architecture
  - Reference Architecture for Modern Data Architecture
  - Data Sourcing
  - Extract, Transform, and Load
  - Storage
    - Storage in the data warehouse
    - Storage in the data lake
  - Analysis
    - Comparing transactional databases, data warehouses, and data lakes
- Data Mesh and Data Fabric
  - Data Mesh
  - Data Fabric
- Summary
2. Getting Started with Amazon Redshift
- Amazon Redshift Architecture Overview
- Get Started with Amazon Redshift Serverless
  - Creating an Amazon Redshift Serverless Data Warehouse
- Sample Data
  - Activate Sample Data Models and Query Using the Query Editor
- When to Use a Provisioned Cluster?
  - Creating an Amazon Redshift Provisioned Cluster
- Estimate Your Amazon Redshift Cost
  - Amazon Redshift Managed Storage
  - Amazon Redshift Serverless Compute Cost
    - Setting a different value for the base capacity
    - High/frequent usage
  - Amazon Redshift Provisioned Compute Cost
- AWS Account Management
- Connecting to Your Amazon Redshift Data Warehouse
  - Private/Public VPC and Secure Access
  - Stored Password
  - Temporary Credentials
  - Federated User
  - SAML-Based Authentication from an Identity Provider
  - Native IdP Integration
  - Amazon Redshift Data API
  - Querying a Database Using the Query Editor V2
    - Federated user
    - Temporary credentials
    - Database username and password
    - AWS Secrets Manager
  - Business Intelligence Using Amazon QuickSight
  - Connecting to Amazon Redshift Using JDBC/ODBC
- Summary
3. Setting Up Your Data Models and Ingesting Data
- Data Lake First Versus Data Warehouse First Strategy
  - Data Lake First Strategy
  - Data Warehouse First Strategy
  - Deciding On a Strategy
- Defining Your Data Model
  - Database Schemas, Users, and Groups
  - Star Schema, Denormalized, Normalized
- Student Information Learning Analytics Dataset
  - Create Data Models for Student Information Learning Analytics Dataset
- Load Batch Data into Amazon Redshift
  - Using the COPY Command
  - Ingest Data for the Student Learning Analytics Dataset
  - Building a Star Schema
  - Continuous File Ingestion from Amazon S3
  - Using AWS Glue for Transformations
  - Manual Loading Using SQL Commands
  - Using the Query Editor V2
- Load Real-Time and Near Real-Time Data
  - Near Real-Time Replication Using AWS Database Migration Service
  - Amazon Aurora Zero-ETL Integration with Amazon Redshift
  - Using Amazon AppFlow
  - Streaming Ingestion
    - Steps to get started with streaming ingestion
    - Important considerations and best practices
- Optimize Your Data Structures
  - Automatic Table Optimization and Autonomics
  - Distribution Style
  - Sort Key
  - Compression Encoding
- Summary
4. Data Transformation Strategies
- Comparing ELT and ETL Strategies
- In-Database Transformation
  - Semistructured Data
  - User-Defined Functions
  - Stored Procedures
- Scheduling and Orchestration
- Access All Your Data
  - External Amazon S3 Data
  - External Operational Data
  - External Amazon Redshift Data
- External Transformation
  - AWS Glue
    - Register Amazon Redshift target connection
    - Build and run your AWS Glue job
- Summary
5. Scaling and Performance Optimizations
- Scale Storage
- Autoscale Your Serverless Data Warehouse
- Scale Your Provisioned Data Warehouse
  - Evolving Compute Demand
    - Predictable workload changes
  - Unpredictable Workload Changes
- WLM, Queues, and QMR
  - Queue Assignment
  - Short Query Acceleration
  - Query Monitoring Rules
  - Automatic WLM
  - Manual WLM
  - Parameter Group
  - WLM Dynamic Memory Allocation
- Materialized Views
- Autonomics
  - Auto Table Optimizer and Smart Defaults
  - Auto Vacuum
  - Auto Vacuum Sort
  - Auto Analyze
  - Auto Materialized Views (AutoMV)
  - Amazon Redshift Advisor
- Workload Isolation
- Additional Optimizations for Achieving the Best Price and Performance
  - Database Versus Data Warehouse
  - Amazon Redshift Serverless
  - Multi-Warehouse Environment
  - AWS Data Exchange
  - Table Design
  - Indexes Versus Zone Maps
  - Drivers
  - Simplify ETL
  - Query Editor V2
- Query Tuning
  - Query Processing
    - Query planning and execution workflow
    - Query stages and system tables
    - Understanding the query plan
    - Factors affecting query performance
  - Analyzing Queries
    - Reviewing query alerts
    - Analyzing the query plan
  - Identifying Queries for Performance Tuning
- Summary
6. Amazon Redshift Machine Learning
- Machine Learning Cycle
- Amazon Redshift ML
  - Amazon Redshift ML Flexibility
  - Getting Started with Amazon Redshift ML
- Machine Learning Techniques
  - Supervised Learning Techniques
  - Unsupervised Learning Techniques
- Machine Learning Algorithms
- Integration with Amazon SageMaker Autopilot
  - Create Model
  - Label Probability
  - Explain Model
- Using Amazon Redshift ML to Predict Student Outcomes
- Amazon SageMaker Integration with Amazon Redshift
- Integration with Amazon SageMaker: Bring Your Own Model (BYOM)
  - BYOM Local
  - BYOM Remote
- Amazon Redshift ML Costs
- Summary
7. Collaboration with Data Sharing
- Amazon Redshift Data Sharing Overview
- Data Sharing Use Cases
- Key Concepts of Data Sharing
- How to Use Data Sharing
  - Sharing Data Within the Same Account
  - Sharing Data Across Accounts Using Cross-Account Data Sharing
- Analytics as a Service Use Case with Multi-Tenant Storage Patterns
  - Scaling Your Multi-tenant Architecture Using Data Sharing
  - Multi-tenant Storage Patterns Using Data Sharing
    - Pool model
      - Creating database views in the producer
      - Creating datashares in producer and granting usage to the consumer
      - Using Role-Level Security
    - Bridge model
      - Creating database schemas and tables in the producer
      - Creating datashares in the producer and granting usage to the consumer
    - Silo model
      - Creating databases and datashares in the producer
      - Creating datashares in the producer and granting usage to the consumer
- External Data Sharing with AWS ADX Integration
  - Publishing a Data Product
  - Subscribing to a Published Data Product
  - Considerations When Using AWS Data Exchange for Amazon Redshift
- Query from the Data Lake and Unload to the Data Lake
- Amazon DataZone to Discover and Share Data
  - Use Cases for a Data Mesh Architecture with Amazon DataZone
  - Key Capabilities and Use Cases for Amazon DataZone
  - Amazon DataZone Integrations with Amazon Redshift and Other AWS Services
  - Components and Capabilities of Amazon DataZone
    - Business data catalog
    - Projects
    - Data governance and access control
    - Data portal
  - Getting Started with Amazon DataZone
    - Step 1: Create the domain and data portal
    - Step 2: Create a producer project
    - Step 3: Produce data for publishing in Amazon DataZone
    - Step 4: Publish a data product to the catalog
    - Step 5: Create a consumer project
    - Step 6: Discovering and consuming data in Amazon DataZone
    - Step 7: Approve access to a published data asset as a producer
    - Step 8: Analyze a published data asset as a consumer
  - Security in Amazon DataZone
    - Using Lake Formation-based authorization
    - Encryption
    - Implement least privilege access
    - Use IAM roles
- Summary
8. Securing and Governing Data
- Object-Level Access Controls
  - Object Ownership
  - Default Privileges
  - Public Schema and Search Path
  - Access Controls in Action
- Database Roles
  - Database Roles in Action
- Row-Level Security
  - Row-Level Security in Action
  - Row-Level Security Considerations
- Dynamic Data Masking
  - Dynamic Data Masking in Action
  - Dynamic Data Masking Considerations
- External Data Access Control
  - Associate IAM Roles
  - Authorize Assume Role Privileges
  - Establish External Schemas
  - Lake Formation for Fine-Grained Access Control
- Summary
9. Migrating to Amazon Redshift
- Migration Considerations
  - Retire Versus Retain
  - Migration Data Size
  - Platform-Specific Transformations Required
  - Data Volatility and Availability Requirements
  - Selection of Migration and ETL Tools
  - Data Movement Considerations
  - Domain Name System (DNS)
- Migration Strategies
  - One-Step Migration
  - Two-Step Migration
    - Initial data migration
    - Changed data migration
  - Iterative Migration
- Migration Tools and Services
  - AWS Schema Conversion Tool
    - SCT overview
    - SCT migration assessment report
    - SCT data extraction agents
    - Migrating BLOBs to Amazon Redshift
  - Data Warehouse Migration Service
    - How AWS DMS works
    - DMS replication instances
    - DMS replication validation
  - AWS Snow Family
    - AWS Snow Family key features
    - AWS Snow Family devices
  - AWS Snowball Edge Client
- Database Migration Process
  - Step 1: Convert Schema and Subject Area
  - Step 2: Initial Data Extraction and Load
  - Step 3: Incremental Load Through Data Capture
- Amazon Redshift Migration Tools Considerations
- Accelerate Your Migration to Amazon Redshift
  - Macro Conversion
  - Case-Insensitive String Comparison
  - Recursive Common Table Expressions
  - Proprietary Data Types
- Summary
10. Monitoring and Administration
- Amazon Redshift Monitoring Overview
  - Monitoring
  - Troubleshooting
  - Optimization
- Monitoring Using Console
  - Monitoring and Administering Serverless
    - Query and database monitoring serverless
      - Serverless query and database monitoring
      - Serverless query monitoring drill-down query
      - Serverless query monitoring drill-down query plan
      - Serverless query monitoring drill-down related metrics
    - Resource monitoring
  - Monitoring Provisioned Data Warehouse Using Console
    - Data warehouse performance and resource utilization metrics
      - View Performance Data
      - CPU utilization
      - Percentage disk space used
      - Database connections
      - Query duration
      - Query throughput
    - Query and data ingestion performance metrics: Query Monitoring tab
      - Query history at data warehouse level
      - Database performance for queries
    - Workload concurrency
  - Monitoring Queries and Loads Across Clusters
    - Monitoring queries and loads
    - Monitoring top queries
  - Identifying Systemic Query Performance Problems
- Monitoring Using Amazon CloudWatch
  - Amazon Redshift CloudWatch Metrics
- Monitoring Using System Tables and Views
  - Monitoring Serverless Using System Views
- High Availability and Disaster Recovery
  - Recovery Time Objective and Recovery Point Objective Considerations
  - Multi-AZ Compared to Single-AZ Deployment
  - Creating or Converting a Provisioned Data Warehouse with Multi-AZ Configuration
    - Creating a new data warehouse with Multi-AZ option
    - Migrating an existing data warehouse from Single-AZ to Multi-AZ
  - Auto Recovery of Multi-AZ Deployment
- Snapshots, Backup, and Restore
  - Snapshots for Backup
  - Automated Snapshots
  - Manual Snapshots
  - Disaster Recovery Using Cross-Region Snapshots
  - Using Snapshots for Simple-Replay
- Monitoring Amazon Redshift Using CloudTrail
- Bring Your Own Visualization Tool to Monitor Amazon Redshift
  - Monitor Operational Metrics Using System Tables and Amazon QuickSight
  - Monitor Operational Metrics Using Grafana Plug-in for Amazon Redshift
- Summary
Index