Deciphering Data Architectures - Helion
ISBN: 9781098150723
Pages: 278, Format: ebook
Publication date: 2024-02-06
Bookstore: Helion
Book price: 29,90 zł (previously: 299,00 zł)
You save: 90% (-269,10 zł)
Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of these architectures to help data professionals understand the pros and cons of each.
James Serra, big data and data warehousing solution architect at Microsoft, examines common data architecture concepts, including how data warehouses have had to evolve to work with data lake features. You'll learn what data lakehouses can help you achieve, as well as how to distinguish data mesh hype from reality. Best of all, you'll be able to determine the most appropriate data architecture for your needs. With this book, you'll:
- Gain a working understanding of several data architectures
- Learn the strengths and weaknesses of each approach
- Distinguish data architecture theory from reality
- Pick the best architecture for your use case
- Understand the differences between data warehouses and data lakes
- Learn common data architecture concepts to help you build better solutions
- Explore the historical evolution and characteristics of data architectures
- Learn essentials of running an architecture design session, team organization, and project success factors
Free from product discussions, this book will serve as a timeless resource for years to come.
Table of contents
- Foreword
- Preface
- Conventions Used in This Book
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. Foundation
- 1. Big Data
- What Is Big Data, and How Can It Help You?
- Data Maturity
- Stage 1: Reactive
- Stage 2: Informative
- Stage 3: Predictive
- Stage 4: Transformative
- Self-Service Business Intelligence
- Summary
- 2. Types of Data Architectures
- Evolution of Data Architectures
- Relational Data Warehouse
- Data Lake
- Modern Data Warehouse
- Data Fabric
- Data Lakehouse
- Data Mesh
- Summary
- 3. The Architecture Design Session
- What Is an ADS?
- Why Hold an ADS?
- Before the ADS
- Preparing
- Inviting Participants
- Conducting the ADS
- Introductions
- Discovery
- Whiteboarding
- After the ADS
- Tips for Conducting an ADS
- Summary
- II. Common Data Architecture Concepts
- 4. The Relational Data Warehouse
- What Is a Relational Data Warehouse?
- What a Data Warehouse Is Not
- The Top-Down Approach
- Why Use a Relational Data Warehouse?
- Drawbacks to Using a Relational Data Warehouse
- Populating a Data Warehouse
- How Often to Extract the Data
- Extraction Methods
- How to Determine What Data Has Changed Since the Last Extraction
- The Death of the Relational Data Warehouse Has Been Greatly Exaggerated
- Summary
- 5. Data Lake
- What Is a Data Lake?
- Why Use a Data Lake?
- Bottom-Up Approach
- Best Practices for Data Lake Design
- Multiple Data Lakes
- Advantages
- Organizational structure and ownership
- Compliance, governance, and security
- Cloud subscription, service limits, and policies
- Performance, availability, and disaster recovery
- Data retention and environment management
- Disadvantages
- Summary
- 6. Data Storage Solutions and Processes
- Data Storage Solutions
- Data Marts
- Operational Data Stores
- Use case
- Data Hubs
- Data Processes
- Master Data Management
- Use case
- Data Virtualization and Data Federation
- Virtualization as a replacement for the data warehouse
- Virtualization as a replacement for ETL or data movement
- Data Catalogs
- Data Marketplaces
- Summary
- 7. Approaches to Design
- Online Transaction Processing Versus Online Analytical Processing
- Operational and Analytical Data
- Symmetric Multiprocessing and Massively Parallel Processing
- Lambda Architecture
- Kappa Architecture
- Polyglot Persistence and Polyglot Data Stores
- Summary
- 8. Approaches to Data Modeling
- Relational Modeling
- Keys
- Entity-Relationship Diagrams
- Normalization Rules and Forms
- Tracking Changes
- Dimensional Modeling
- Facts, Dimensions, and Keys
- Tracking Changes
- Denormalization
- Common Data Model
- Data Vault
- The Kimball and Inmon Data Warehousing Methodologies
- Inmon's Top-Down Methodology
- Kimball's Bottom-Up Methodology
- Choosing a Methodology
- Hybrid Models
- Methodology Myths
- Summary
- 9. Approaches to Data Ingestion
- ETL Versus ELT
- Reverse ETL
- Batch Processing Versus Real-Time Processing
- Batch Processing Pros and Cons
- Real-Time Processing Pros and Cons
- Data Governance
- Summary
- III. Data Architectures
- 10. The Modern Data Warehouse
- The MDW Architecture
- Pros and Cons of the MDW Architecture
- Combining the RDW and Data Lake
- Data Lake
- Relational Data Warehouse
- Stepping Stones to the MDW
- EDW Augmentation
- How it works
- Benefits
- Challenges
- Migration
- Temporary Data Lake Plus EDW
- How it works
- Benefits
- Challenges
- Migration
- All-in-One
- How it works
- Benefits
- Challenges
- Migration
- Case Study: Wilson & Gunkerk's Strategic Shift to an MDW
- Challenge
- Solution
- Outcome
- Summary
- 11. Data Fabric
- The Data Fabric Architecture
- Data Access Policies
- Metadata Catalog
- Master Data Management
- Data Virtualization
- Real-Time Processing
- APIs
- Services
- Products
- Why Transition from an MDW to a Data Fabric Architecture?
- Potential Drawbacks
- Summary
- 12. Data Lakehouse
- Delta Lake Features
- Performance Improvements
- The Data Lakehouse Architecture
- What If You Skip the Relational Data Warehouse?
- Relational Serving Layer
- Summary
- 13. Data Mesh Foundation
- A Decentralized Data Architecture
- Data Mesh Hype
- Dehghani's Four Principles of Data Mesh
- Principle #1: Domain Ownership
- Principle #2: Data as a Product
- Principle #3: Self-Serve Data Infrastructure as a Platform
- Principle #4: Federated Computational Governance
- The Pure Data Mesh
- Data Domains
- Data Mesh Logical Architecture
- Different Topologies
- Data Mesh Versus Data Fabric
- Use Cases
- Summary
- 14. Should You Adopt Data Mesh? Myths, Concerns, and the Future
- Myths
- Myth: Using Data Mesh Is a Silver Bullet That Solves All Data Challenges Quickly
- Myth: A Data Mesh Will Replace Your Data Lake and Data Warehouse
- Myth: Data Warehouse Projects Are All Failing, and a Data Mesh Will Solve That Problem
- Myth: Building a Data Mesh Means Decentralizing Absolutely Everything
- Myth: You Can Use Data Virtualization to Create a Data Mesh
- Concerns
- Philosophical and Conceptual Matters
- Combining Data in a Decentralized Environment
- Other Issues of Decentralization
- Complexity
- Duplication
- Feasibility
- People
- Domain-Level Barriers
- Organizational Assessment: Should You Adopt a Data Mesh?
- Recommendations for Implementing a Successful Data Mesh
- The Future of Data Mesh
- Zooming Out: Understanding Data Architectures and Their Applications
- Summary
- IV. People, Processes, and Technology
- 15. People and Processes
- Team Organization: Roles and Responsibilities
- Roles for MDW, Data Fabric, or Data Lakehouse
- Roles for Data Mesh
- Domain teams
- Self-service data infrastructure platform team
- Federated computational governance platform team
- Why Projects Fail: Pitfalls and Prevention
- Pitfall: Allowing Executives to Think That BI Is Easy
- Pitfall: Using the Wrong Technologies
- Pitfall: Gathering Too Many Business Requirements
- Pitfall: Gathering Too Few Business Requirements
- Pitfall: Presenting Reports Without Validating Their Contents First
- Pitfall: Hiring an Inexperienced Consulting Company
- Pitfall: Hiring a Consulting Company That Outsources Development to Offshore Workers
- Pitfall: Passing Project Ownership Off to Consultants
- Pitfall: Neglecting the Need to Transfer Knowledge Back into the Organization
- Pitfall: Slashing the Budget Midway Through the Project
- Pitfall: Starting with an End Date and Working Backward
- Pitfall: Structuring the Data Warehouse to Reflect the Source Data Rather Than the Business's Needs
- Pitfall: Presenting End Users with a Solution with Slow Response Times or Other Performance Issues
- Pitfall: Overdesigning (or Underdesigning) Your Data Architecture
- Pitfall: Poor Communication Between IT and the Business Domains
- Tips for Success
- Don't Skimp on Your Investment
- Involve Users, Show Them Results, and Get Them Excited
- Add Value to New Reports and Dashboards
- Ask End Users to Build a Prototype
- Find a Project Champion/Sponsor
- Make a Project Plan That Aims for 80% Efficiency
- Summary
- 16. Technologies
- Choosing a Platform
- Open Source Solutions
- On-Premises Solutions
- Cloud Provider Solutions
- Cloud Service Models
- Major Cloud Providers
- Multi-Cloud Solutions
- Software Frameworks
- Hadoop
- Databricks
- Snowflake
- Summary
- Index