AWS Certified Data Engineer Associate Study Guide. In-Depth Guidance and Practice - Helion

ebook
Authors: Sakti Mishra, Dylan Qu, Anusha Challa
ISBN: 9781098170035
Pages: 476, Format: ebook
Publication date: 2025-08-25
Bookstore: Helion

Price: 169.14 zł (previously: 198.99 zł)
You save: 15% (-29.85 zł)

There's no better time to become a data engineer. And acing the AWS Certified Data Engineer Associate (DEA-C01) exam will help you tackle the demands of modern data engineering and secure your place in the technology-driven future.

Authors Sakti Mishra, Dylan Qu, and Anusha Challa equip you with the knowledge and sought-after skills necessary to effectively manage data and excel in your career. Whether you're a data engineer, data analyst, or machine learning engineer, you'll find the in-depth guidance, practical exercises, sample questions, and expert advice you need to use AWS services effectively and achieve certification. You'll learn how to:

  • Ingest, transform, and orchestrate data pipelines effectively
  • Select the ideal data store, design efficient data models, and manage data lifecycles
  • Analyze data rigorously and maintain high data quality standards
  • Implement robust authentication, authorization, and data governance protocols
  • Prepare thoroughly for the DEA-C01 exam with targeted strategies and practices



Table of contents

  • Preface
    • What This Book Isn't
    • What This Book Is About
    • Who Should Read This Book
    • How This Book Is Organized
    • Accessing the Book's Images Online
    • Conventions Used in This Book
    • O'Reilly Online Learning
    • How to Contact Us
    • Acknowledgments
  • 1. Certification Essentials
    • Who Is a Data Engineer?
    • Becoming an AWS Data Engineer Associate
    • Exam Topics
    • Exam Format
    • Registering for the Exam
    • Exam-Style Questions
    • Think Like an AWS Solutions Architect: Translating a Real-World Problem-Solving Framework into Certification
      • The Solutions Architect's Problem-Solving Framework
      • Real-World Example: Designing a Serverless Stream Analytics Platform to Detect Fraud
      • How This Thought Process Applies to Certification Questions
    • Study Plan
    • Conclusion
  • 2. Prerequisite Knowledge for Aspiring Data Engineers
    • Databases and Types of Databases
      • What Is a Database?
      • What Is a Database Management System?
    • Types of Databases
      • Hierarchical Databases
      • Relational Databases
      • NoSQL Databases
    • OLTP Versus OLAP
    • Overview of Big Data
    • Distributed Processing Frameworks for Big Data
      • MapReduce
      • Spark
      • Flink
      • Hive
      • Presto
      • Trino
    • What Is a Data Lake?
    • What Is a Data Warehouse?
    • Data Warehouse Versus Data Lake
    • ETL Versus ELT
    • Different Ways to Process Data
      • Batch Processing Pipeline
      • Real-Time Stream Processing
      • Event-Driven Processing
    • High-Level Architecture Overview of Data Processing Pipelines
    • Working with Code Repositories
      • What Is a Code Repository?
      • How to Work with Code Repositories
    • CI/CD
    • Cloud Computing and AWS
    • What Is Cloud Computing?
    • An Overview of Amazon Web Services
    • Getting Started with AWS
      • How to Set Up an AWS Account
      • Configure Access with AWS IAM
      • Create an IAM User for Authentication
      • Add Permissions to Authorize the User
      • What Is an IAM Policy?
      • What Is an IAM Role?
      • Best Practices to Follow with AWS IAM
    • Conclusion
    • Resources
  • 3. Overview of AWS Analytics and Auxiliary Services
    • AWS Analytics Services
      • Amazon Kinesis Data Streams
      • Amazon Data Firehose
      • Amazon Managed Service for Apache Flink
      • Amazon Managed Streaming for Apache Kafka
      • Reference Architecture: Streaming Analytics Pattern with Apache Flink and MSK
      • AWS Glue
      • AWS Glue DataBrew
      • Amazon Athena
      • Amazon EMR
      • Amazon Redshift
      • Amazon QuickSight
      • Reference Architecture: Lakehouse with Glue, Redshift, and Athena
      • Amazon OpenSearch Service
      • Amazon DataZone
      • AWS Lake Formation
    • Auxiliary Services for Analytics
      • Application Integration
      • Compute and Containers
      • Database
      • Storage
      • Machine Learning
      • Migration and Transfer
      • Networking and Content Delivery
      • Security, Identity, and Compliance
      • Management and Governance
      • Developer Tools
      • Cloud Financial Management
      • AWS Well-Architected Tool
    • Conclusion
    • Additional Resources
  • 4. Data Ingestion and Transformation
    • Data Ingestion
    • Real-Time Streaming Data Ingestion
      • Kinesis Data Streams Versus Amazon MSK
      • Sample Streaming Ingestion Use Cases
        • Ingesting streaming data from IoT devices into a data lake
        • Ingesting click streams into a data warehouse for real-time reporting
        • Streaming Amazon DynamoDB data into a centralized data lake
        • Ingesting AWS logs into log analytics solutions
    • Ingesting Data Using Zero-ETL Integrations
    • Ingesting Data from Databases with CDC Using AWS Data Migration Service
      • Supported Sources for AWS DMS
      • Supported Targets for AWS DMS
      • Sample Use Cases
        • Ingesting data into an Amazon S3 data lake using DMS
        • Ingesting data into Amazon Redshift using DMS
        • Converting schema using DMS Schema Conversion
        • Ingesting files from on premises
        • Ingesting third-party datasets
    • Best Practices for Data Ingestion
      • Best Practices for Streaming Ingestion
      • Best Practices for Choosing Data Stream Capacity Mode
      • Best Practices for Sharding
      • Best Practices for Consuming Data from KDS
      • Best Practices for Amazon MSK
        • Amazon MSK provisioned cluster versus serverless
        • Amazon MSK serverless cluster
        • General practices when using Amazon MSK
      • Best Practices for Amazon Data Firehose
      • Best Practices for AWS DMS Replication Instances and Tasks
      • Best Practices for AWS DMS Tasks with Amazon Redshift Target
    • Data Transformation
      • Batch Data Transformation
      • Streaming Data Transformation
    • Data Transformation Using AWS Glue
      • Glue Connectors
      • Glue Bookmarks
      • Data Processing Units
      • Worker Type
      • Glue Jobs
      • Data Sources and Destinations
        • Glue Studio
        • Glue Studio notebooks
        • AWS Glue interactive sessions
      • Best Practices for AWS Glue
    • Data Transformation Using Amazon EMR
      • Storage
      • Deployment Options
      • Instance Types
      • Best Practices for Amazon EMR
    • AWS Glue Versus Amazon EMR Options
    • SQL-Based Data Transformation Using Amazon Redshift
      • Amazon Redshift Compute
      • Amazon Redshift Storage
      • SQL Data Transformations
        • Amazon Redshift materialized views
        • Amazon Redshift stored procedures
    • Amazon Managed Service for Apache Flink
    • Amazon Data Firehose for Transformation
    • AWS Lambda for Transformation
    • Choosing the Right Streaming Transformation Service
    • Choosing the Right Batch Transformation Service
    • Data Preparation for Nontechnical Personas
      • Fill Missing Values
      • Identify Duplicate Records
      • Formatting Functions
      • Integrating Data from Multiple Sources
      • Nesting and Unnesting Data Structures
      • Protecting Sensitive Data
      • Other Data Preparation Transformations
    • Orchestrating Data Pipelines
      • AWS Step Functions
      • Managed Workflows for Apache Airflow
      • Sample Use Case
      • AWS Glue Workflows
      • Sample Use Case
      • Amazon Redshift Scheduler
      • Amazon EventBridge
      • Sample Use Case
      • Choosing the Right Orchestration Service
    • Conclusion
    • Practice Questions
    • Additional Resources
  • 5. Data Store Management
    • Choosing a Data Store
      • AWS Core Storage Services
      • AWS Cloud Databases
    • Data Storage Formats for Data Lakes
      • Row-Based File Formats
      • Column-Based File Formats
      • Table Formats
    • Building a Data Strategy with Multiple Data Stores
    • Data Cataloging Systems
      • Components of Metadata and Data Catalogs
      • Populating an AWS Glue Data Catalog
        • Using Glue crawlers
        • Defining metadata manually
        • Integrating with other AWS services
        • Migrating from an existing Hive catalog
      • Data Catalog Best Practices
        • Establish a consistent naming convention
        • Secure the Data Catalog
        • Manage schema changes effectively
        • Monitor schema changes
        • Use crawlers effectively
        • Optimize performance with Glue Data Catalog
      • Enriching Data Catalogs with Data Classification
    • Managing the Lifecycle of Data
      • Selecting Storage Solutions for Hot and Cold Data
      • Example: Building a Petabyte-Scale Log Analytics Solution on AWS
      • Storage Tier Decisions for Different Access Patterns
      • Defining Data Retention Policy and Archiving Strategies
      • Performing COPY and UNLOAD Operations to Move Data Between Amazon S3 and Amazon Redshift
    • Optimizing Data Management with Amazon S3
      • Overview of S3 Storage Classes
        • Frequently accessed storage classes
        • Infrequently accessed storage classes
        • Rarely accessed storage classes
        • Storage class for changing or unknown access patterns
      • Choosing the Right Storage Class
      • S3 Intelligent-Tiering
      • Managing the Data Lifecycle with Amazon S3 Lifecycle
      • Monitoring the Amazon S3 Data Lifecycle
        • S3 Storage Lens
        • Storage Class Analysis
        • AWS Cost Explorer
      • Expiring Snapshots from Open Table Formats
      • Archiving Data from Amazon DynamoDB to Amazon S3
      • Ensuring S3 Data Resiliency with S3 Versioning
      • Enabling Versioning on an S3 Bucket
      • S3 Versioning and Object Lifecycle Management
    • Designing Data Models and Schema
      • Introduction to Data Modeling
      • Data Modeling Strategies for Amazon Redshift
        • Common schema design patterns
        • Logical data modeling in Amazon Redshift
        • Physical data modeling in Amazon Redshift: Choosing the best distribution style
        • Physical data modeling in Amazon Redshift: Choosing the best sort key
        • Additional best practices for data modeling with Amazon Redshift
      • Data Modeling Strategies for Amazon DynamoDB
        • NoSQL versus relational data modeling
        • Example use case: Ecommerce website
        • Core concepts of DynamoDB
        • Selecting the right partition key
        • Selecting the right sort key
        • Utilizing global secondary indexes and local secondary indexes
        • Common use cases and considerations
      • Data Modeling Strategies for Data Lakes
        • Raw data layer: The landing zone for raw data
        • Stage data layer: Cleansed and conformed data
        • Analytics data layer: Curated and aggregated data
      • Amazon S3 Data Lake Best Practices
        • Partition your data
        • Bucket your data
        • Use compression
        • Optimize file size
        • Use columnar file formats
        • Use open table formats
    • Conclusion
    • Practice Questions
    • Additional Resources
  • 6. Data Operations and Support
    • Amazon QuickSight
      • Data Sources
      • Datasets
      • Refreshing SPICE Datasets
      • Visualizations
      • Presentation Formats
      • QuickSight GenBI Capabilities (QuickSight Q)
        • Generate stories
        • Create executive summaries
        • Enhanced dashboard Q&A
    • SQL Analytics Using Amazon Athena
      • Choice of Querying Engine
        • Trino SQL
        • Spark SQL/PySpark
      • Workgroups
      • Capacity Reservations
      • Athena Federated SQL
      • Use Cases
      • DDL Capabilities
      • Best Practices When Using Amazon Athena
    • SQL Analytics Using Amazon Redshift
      • SQL Functions
      • Semi-Structured Data Analysis
      • Geospatial Data Analysis
      • Query Data from Data Lake
      • Analyzing Data from Operational Data Stores Using Amazon Redshift
      • Redshift ML and Generative AI
      • User-Defined Functions
    • Analyzing Data Using Notebooks
      • AWS Glue Interactive Sessions
      • Amazon EMR Notebooks
    • Data Pipeline Resiliency
      • Monitoring
        • Monitoring metrics using CloudWatch
        • CloudWatch dashboards
        • Monitoring API calls with CloudTrail
        • Monitoring logs and traces
        • Monitoring using system tables
      • Alerting
        • CloudWatch Alarms
        • Alarm state
        • Notifications
      • Event-Driven Pipeline Maintenance with EventBridge
      • Ensuring Data Quality and Reliability: Deequ and DQDL
        • AWS Glue Data Quality
        • AWS Glue Data Quality DQDL syntax
        • Composite rules
        • Using Deequ with Amazon EMR
      • Automated Data Quality Checks and Error Handling
      • Troubleshooting and Performance Tuning
        • Connection timed out errors
        • Access denied exceptions
        • Throttling errors
        • Resource constraints
      • CI/CD Pipelines
        • Continuous integration (CI)
        • Continuous deployment (CD)
      • Version Control and Collaboration
      • Infrastructure as Code
        • AWS CloudFormation
        • AWS Serverless Application Model
        • AWS Cloud Development Kit (AWS CDK)
        • Choosing the right IaC solution
      • Disaster Recovery and High Availability
        • HA for Amazon EMR clusters on EC2
        • HA for Amazon Redshift provisioned clusters
        • Availability Zone (AZ) failure recovery
        • Backup and restore
        • Region failure recovery
        • HA for Amazon MSK
        • HA for Amazon OpenSearch
    • Cost Optimization for Data Pipelines
      • Leveraging Serverless Services
      • Autoscaling
      • Tiered Storage
      • Columnar Formats
      • Monitor and Control Data Transfer Costs
      • Follow Cost Optimization Best Practices
    • Conclusion
    • Practice Questions
    • Additional Resources
  • 7. Data Security and Governance
    • Network Security
      • Amazon VPC Overview
      • Security Groups Overview
      • Best Practices for Configuring Security Groups for Your Workloads
      • Configuring a VPC and Security Group for an Amazon EMR Cluster
      • Managed Services Versus Unmanaged Services
      • VPC Endpoints Overview
        • Redshift-managed VPC endpoints
        • OpenSearch Service-managed VPC endpoints
    • User Authentication and Authorization
      • Authenticating Users with IAM Credentials
      • IAM Role-Based Authentication and Authorization
      • Service-Linked Roles
      • Managed Versus Self-Managed Policies
      • Enable Single Sign-on with AWS IAM Identity Center
        • IAM Identity Center integration with AWS Lake Formation
        • IAM Identity Center integration with Amazon DataZone
    • Data Security and Privacy
      • Secure Data in Amazon S3
      • Manage Database Credentials
      • Data Encryption and Decryption and Managing the Encryption Keys
      • Managing Encryption Keys with AWS KMS
        • Enabling encryption and managing keys in AWS
        • Best practices for managing keys with AWS KMS
      • Enabling Encryption in AWS Analytics Services
        • AWS Glue
        • Amazon EMR
        • Amazon Redshift
      • Sensitive Data Detection and Redaction
        • Integrating Amazon Macie for data at rest
        • Integrating AWS Glue sensitive data detection
      • Fine-Grained Access Control with AWS Lake Formation
        • Register the data lake location
        • Granting permission to Glue Data Catalog databases, tables, and views
        • Name-based access control
        • Tag-based access control
        • Row- and column-based data filtering
        • Best practices to integrate AWS Lake Formation
        • Best practices for cross-account sharing
        • Best practices for tag-based access control
      • Database Security in Amazon Redshift
        • Manage permissions with GRANT and REVOKE
        • Role-based access control
        • Row-level security
        • Dynamic data masking
      • Fine-Grained Access Control in Amazon QuickSight
        • Access control with IAM policies
        • Access control with Lake Formation
    • Data Governance
      • Metadata Management and Technical Catalog
        • AWS Glue Data Catalog
        • AWS Glue crawler
        • Amazon DataZone business glossary
      • Data Sharing
        • Share within a single AWS account
        • Multiaccount, hub-and-spoke model for data sharing
        • Data mesh with centralized governance
        • Cross-organization or business-to-business data sharing
        • Exposing data as a product in a data marketplace
      • Data Quality
      • Data Profiling
      • Data Lifecycle Management
      • Data Lineage
        • Amazon DataZone
        • Building lineage solutions with AWS Glue, Amazon Neptune, and Spline
        • Amazon SageMaker ML Lineage Tracking
      • Logging and Auditing
        • Amazon CloudWatch
        • Amazon OpenSearch Service
        • Amazon S3
        • Logging and auditing in Amazon Redshift
        • Amazon Managed Service for Prometheus and Grafana
        • AWS CloudTrail to audit actions or API invocations
        • Analyzing CloudTrail logs using CloudTrail Lake
      • Analyzing Logs Using AWS Services
        • Amazon Athena
        • Amazon CloudWatch Log Insights
        • AWS CloudTrail Insights
        • Amazon OpenSearch Dashboards
        • Processing logs with Amazon EMR or AWS Glue
        • Auditing AWS configuration changes with AWS Config
    • Conclusion
    • Practice Questions
    • Additional Resources
  • 8. Implementing Batch and Streaming Pipelines
    • Data Processing Pipeline
    • Implementing a Batch Processing Pipeline
      • Use Case and Architecture Overview
      • Overview of Input Dataset
      • Step-by-Step Implementation Guide
        • Create Amazon S3 buckets
        • Create Amazon Redshift cluster
        • Create Glue data connection for the Redshift cluster
        • Create AWS Glue PySpark ETL job
        • Create Amazon QuickSight execution role using AWS IAM
        • Sign up for and manage Amazon QuickSight
        • Create Amazon QuickSight visualization
      • Best Practices and Optimization Techniques
    • Implementing a Real-Time Streaming Pipeline
      • Use Case and Architecture Overview
      • Step-by-Step Implementation Guide
        • Creating a Kinesis data stream
        • Setting up Amazon Kinesis Data Generator
        • Create Amazon S3 buckets for an Iceberg data lake and a streaming checkpoint
        • Creating an EMR Studio and EMR Serverless application
        • Creating VPC endpoints for Kinesis Data Streams, Amazon S3, and EMR Serverless
        • Submitting the Spark Streaming job to the EMR Serverless application
    • Conclusion
    • Resources
  • 9. Practice Exam
  • 10. What's New in AWS for Data Engineers
    • Amazon SageMaker Unified Studio
    • Amazon SageMaker Catalog
    • Amazon SageMaker Lakehouse
    • Amazon SageMaker AI
    • Amazon S3 Tables
    • Amazon S3 Metadata
    • Improving the Developer Experience with Generative AI
      • Generative AI-Powered Code Generation with Amazon Q Developer
      • Automated Script Upgrade in AWS Glue
      • GenAI-Powered Troubleshooting for Spark in AWS Glue
    • Conclusion
    • Resources
  • A. Solutions to the Practice Questions
    • Chapter 4
    • Chapter 5
    • Chapter 6
    • Chapter 7
    • Chapter 9
  • Index
