Building Medallion Architectures - Helion

ISBN: 9781098178796
Pages: 396, Format: ebook
Publication date: 2025-03-28
Bookstore: Helion
Price: 177.65 zł (previously: 216.65 zł)
You save: 18% (-39.00 zł)
In today's data-driven world, organizations must manage and analyze vast amounts of information to deliver the insights that give them a competitive advantage. Many turn to the medallion architecture because it's a proven and well-known design. Yet implementing a robust data pipeline can be difficult, particularly when it comes to using the medallion architecture's bronze, silver, and gold layers. Done wrong, the pipeline can hamper your ability to make data-driven decisions. This practical guide helps you build a medallion architecture the right way with Azure Databricks and Microsoft Fabric.
Drawing on hands-on experience from the field, Piethein Strengholt demystifies common assumptions and complex problems you'll face when embarking on a new data architecture. Architects and engineers of all stripes will find answers to the most typical questions along with insights from real organizations about what's worked, what hasn't, and why.
You'll learn:
- Lakehouse and medallion architecture fundamentals and key concepts
- Design considerations for Azure Databricks and Microsoft Fabric
- Scaling considerations, including governance, security, automation, and more
- How to make informed decisions when designing or implementing new data architectures
- Proven patterns for success that align with broader organizational objectives
Table of Contents
- Foreword
- Preface
- Who Should Read This Book
- Navigating This Book
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. Understanding the Medallion Framework
- 1. The Evolution of Data Architecture
- What Is a Medallion Architecture?
- A Brief History of Data Warehouse Architecture
- OLTP Systems
- Data Warehouses
- The Staging Area
- Inmon Methodology
- Kimball Methodology
- Key Takeaways from Traditional Data Warehouses
- A Brief History of Data Lakes
- Hadoop's Distributed File System
- MapReduce
- Apache Hive
- External and internal tables
- Hive Metastore
- Spark Project
- Moving Forward with Data Lakes
- A Brief History of Lakehouse Architecture
- Founders of Spark
- Emergence of Open Table Formats
- The Rise of Lakehouse Architectures
- Medallion Architecture and Its Practical Challenges
- Conclusion
- 2. Laying the Groundwork
- Foundational Preconditions
- Extra Landing Zones
- Raw Data
- Batch Processing
- Real-Time Data Processing
- Spark Structured Streaming
- Change Data Feed
- Change Data Capture
- Considerations and Learning Resources
- ETL and Orchestration Tools
- Managing Delta Tables
- Z-Ordering
- V-Ordering
- Table Partitioning
- Liquid Clustering
- Compaction and Optimized Writes
- DeltaLog
- Conclusion
- 3. Demystifying the Medallion Architecture
- The Three-Layered Design
- Bronze Layer
- Processing Hierarchy
- Processing Full Data Loads
- Processing Incremental Data Loads
- Data Historization Within the Bronze Layer
- Schema Evolution and Management
- MergeSchema and Schema Enforcement
- Technical Validation Checks
- Usage and Governance
- The Bronze Layer in Practice
- Silver Layer
- Cleaning Data Activities
- Designing the Silver Layer's Data Model
- Conforming and renaming columns
- Denormalization
- Slowly changing dimensions
- Surrogate keys
- Harmonization with Other Sources
- 3NF and Data Vault
- Operational Querying and Machine Learning
- Managing Overlapping Requirements
- Automation Tasks
- The Silver Layer in Practice
- Gold Layer
- Star Schema
- Loading the dimension tables
- Loading the fact tables
- Optimizing loads
- Star Schema Design Nuances
- Curated, Semantic, and Platinum Layers
- One-Big-Table Design
- Serving Layer
- The Gold Layer in Practice
- Conclusion
- II. Crafting the Medallion Layers
- 4. Building a Medallion Foundation with Microsoft Fabric
- Our Case Study: Oceanic Airlines
- Introducing Microsoft Fabric
- Domains
- Workspaces and Capacities
- OneLake
- Data Engineering with Spark
- Data Warehousing with T-SQL
- Other Fabric Workload Types
- Setting Up the Foundation
- Setting up Capacities
- Setting up Domains
- Setting up Workspaces
- Creating Lakehouses
- Capacity Considerations
- Domain Considerations
- Workspace Considerations
- Lakehouse Entities Considerations
- Storage Account Considerations
- Conclusion
- 5. Construct the Bronze Layer
- Building the Data Pipeline
- Deploying the AdventureWorks Sample Database
- Set Up an Azure SQL Database Connection
- Creating a New Data Pipeline
- Building the ForEach loop
- Configuring the CopyTable activity
- Additional Considerations
- Implementation of Lakehouse Tables
- Traverse Parquet Files to Managed Delta Tables
- Using External Tables
- Updating Tables with MERGE Operations
- Spark Structured Streaming
- Example with Azure Event Hubs
- Using Change Data Capture
- Navigating Data Handling Techniques
- Schema Management
- Create Tables Without Defining Schemas
- Define Schemas with the DataFrame API
- SQL DDL Statements
- YAML or JSON Configurations
- Metadata-Driven Approach
- Databricks Auto Loader
- Third-Party Tools
- Handling Schema Evolution
- Conclusion
- 6. Build the Silver Layer
- Quick Recap
- Implementation of a Metadata-Driven Approach
- Implementation of the Metadata Store
- Implementation of Dynamic Data Validations
- Improvement Areas
- Data Cleansing
- Implementation of Data Cleansing Tasks
- Data Cleansing Considerations
- Data Transformation Frameworks and Data Quality Tools
- Optimization of Query Performance with Denormalization
- Lightweight Enrichments
- Data Historization
- Optimization Jobs
- Orchestration with Apache Airflow
- Final Recommendations
- Silver-Layer Data as a Product
- Conclusion
- 7. Streamline the Gold Layer
- Design of the Gold Layer
- Transform Data Using a Star Schema
- Creation of the Gold-layer tables
- Creation of the dimensional table for address
- Creation of the dimensional table for customer
- Creation of the dimensional table for date
- Creation of the dimensional table for product
- Creation of the fact table for sales
- Creation of the Semantic Model
- Creation of the First Power BI Report
- Creation of Task Flows
- Enhancements for Gold-Layer Design
- Microsoft Fabric in Practice
- Data Products
- Introduction to data product guidelines
- Types of data products
- Data modeling guidance
- Governance guidance
- Data Governance with Microsoft Purview
- Microsoft Purview Design Considerations
- Governance domains
- Collections
- Microsoft Purview data products
- Guidance for Medallion Architectures
- Conclusion
- III. Real-World Case Studies
- 8. Case Study: Data, Analytics and Business Strategy at AP Pension
- Medallion Architecture
- Other Considerations
- Final Recommendations
- 9. Case Study: Amadeus, a Tech Leader in the Travel Industry
- Medallion Architecture
- FinOps
- Data Models
- Data Contracts
- Data Governance
- 10. Case Study: Strategic Data Transformation at ZEISS
- Data Platform Evolution
- Medallion Architecture
- Data Products and Sharing
- Recommendations and Best Practices
- IV. Scaling, Governance, and the Future of Medallion Architectures
- 11. Scaling the Medallion Architecture
- Decentralization of Data Management
- Flexibility in Federation
- Medallion Mesh
- Number of Medallion Architectures
- Medallion Inner Architecture Variations
- Separate Data Product Layers
- Tailored Medallion Architectures
- Adaptability of the Bronze Layer
- Silver Layer Variations
- Gold Layer Variations
- Enterprise Data Models
- Master Data Management
- Reference Data Management
- Conclusion
- 12. Medallion Governance and Security
- Data Governance
- Governance Within a Medallion Architecture
- Unity Catalog
- Medallion Architecture with Unity Catalog
- Data Contracts
- Contracts Within a Catalog
- Contracts Within a Metastore
- Data Contracts Using YAML Files and GitOps
- Other Data Contract Specifications
- Data Security and Access Management
- Conclusion
- 13. Future Medallion Architectures with Generative AI
- Unstructured Data Processing
- Retrieval-Augmented Generation
- Bronze Layer
- Silver Layer
- Gold Layer
- Integration of LLMs and Medallion Architectures
- Role of Agents
- Training and Fine-Tuning LLMs
- Future of Medallion Architectures
- Conclusion
- Index