Building Real-Time Analytics Systems - Helion
ISBN: 9781098138752
Pages: 220, Format: ebook
Publication date: 2023-09-14
Bookstore: Helion
Price: 211,65 zł (previously: 246,10 zł)
You save: 14% (-34,45 zł)
Gain deep insight into real-time analytics, including the features of these systems and the problems they solve. With this practical book, data engineers at organizations that use event-processing systems such as Kafka, Google Pub/Sub, and AWS Kinesis will learn how to analyze data streams in real time. The faster you derive insights, the quicker you can spot changes in your business and act accordingly.
Author Mark Needham from StarTree provides an overview of the real-time analytics space and an understanding of what goes into building real-time applications. The book's second part offers a series of hands-on tutorials that show you how to combine multiple software products to build real-time analytics applications for an imaginary pizza delivery service.
You will:
- Learn common architectures for real-time analytics
- Discover how event processing differs from real-time analytics
- Ingest event data from Apache Kafka into Apache Pinot
- Combine event streams with OLTP data using Debezium and Kafka Streams
- Write real-time queries against event data stored in Apache Pinot
- Build a real-time dashboard and order tracking app
- Learn how Uber, Stripe, and Just Eat use real-time analytics
Table of contents
Building Real-Time Analytics Systems eBook - table of contents
- Foreword
- Preface
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- 1. Introduction to Real-Time Analytics
- What Is an Event Stream?
- Making Sense of Streaming Data
- What Is Real-Time Analytics?
- Benefits of Real-Time Analytics
- New Revenue Streams
- Timely Access to Insights
- Reduced Infrastructure Cost
- Improved Overall Customer Experience
- Real-Time Analytics Use Cases
- User-Facing Analytics
- Personalization
- Metrics
- Anomaly Detection and Root Cause Analysis
- Visualization
- Ad Hoc Analytics
- Log Analytics/Text Search
- Classifying Real-Time Analytics Applications
- Internal Versus External Facing
- Machine Versus Human Facing
- Summary
- 2. The Real-Time Analytics Ecosystem
- Defining the Real-Time Analytics Ecosystem
- The Classic Streaming Stack
- Complex Event Processing
- The Big Data Era
- The Modern Streaming Stack
- Event Producers
- Streaming Data Platform
- Stream Processing Layer
- Serving Layer
- Frontend
- Summary
- 3. Introducing All About That Dough: Real-Time Analytics on Pizza
- Existing Architecture
- Setup
- MySQL
- Apache Kafka
- ZooKeeper
- Orders Service
- Spinning Up the Components
- Inspecting the Data
- Applications of Real-Time Analytics
- Summary
- 4. Querying Kafka with Kafka Streams
- What Is Kafka Streams?
- What Is Quarkus?
- Quarkus Application
- Installing the Quarkus CLI
- Creating a Quarkus Application
- Creating a Topology
- Querying the Key-Value Store
- Creating an HTTP Endpoint
- Running the Application
- Querying the HTTP Endpoint
- Limitations of Kafka Streams
- Summary
- 5. The Serving Layer: Apache Pinot
- Why Can't We Use Another Stream Processor?
- Why Can't We Use a Data Warehouse?
- What Is Apache Pinot?
- How Does Pinot Model and Store Data?
- Schema
- Table
- Setup
- Data Ingestion
- Pinot Data Explorer
- Indexes
- Updating the Web App
- Summary
- 6. Building a Real-Time Analytics Dashboard
- Dashboard Architecture
- What Is Streamlit?
- Setup
- Building the Dashboard
- Summary
- 7. Product Changes Captured with Change Data Capture
- Capturing Changes from Operational Databases
- Change Data Capture
- Why Do We Need CDC?
- What Is CDC?
- What Are the Strategies for Implementing CDC?
- Log-Based Data Capture
- Requirements for a CDC System
- Debezium
- Applying CDC to AATD
- Setup
- Connecting Debezium to MySQL
- Querying the Products Stream
- Updating Products
- Summary
- 8. Joining Streams with Kafka Streams
- Enriching Orders with Kafka Streams
- Adding Order Items to Pinot
- Updating the Orders Service
- Refreshing the Streamlit Dashboard
- Summary
- 9. Upserts in the Serving Layer
- Order Statuses
- Enriched Orders Stream
- Upserts in Apache Pinot
- Updating the Orders Service
- Creating UsersResource
- Adding an allUsers Endpoint
- Adding an Orders for User Endpoint
- Adding an Individual Order Endpoint
- Configuring Cross-Origin Resource Sharing
- Frontend App
- Order Statuses on the Dashboard
- Time Spent in Each Order Status
- Orders That Might Be Stuck
- Summary
- 10. Geospatial Querying
- Delivery Statuses
- Updating Apache Pinot
- Orders
- Delivery Statuses
- Updating the Orders Service
- Individual Orders
- Delayed Orders by Area
- Consuming the New API Endpoints
- Summary
- 11. Production Considerations
- Preproduction
- Capacity Planning
- Data Partitioning
- Throughput
- Data Retention
- Data Granularity
- Total Data Size
- Replication Factor
- Deployment Platform
- In-House Skills
- Data Privacy and Security
- Cost
- Control
- Postproduction
- Monitoring and Alerting
- Streaming data platform
- Serving layer
- Data Governance
- Summary
- 12. Real-Time Analytics in the Real World
- Content Recommendation (Professional Social Network)
- The Problem
- The Solution
- Benefits
- Operational Analytics (Streaming Service)
- The Problem
- The Solution
- Benefits
- Real-Time Ad Analytics (Online Marketplace)
- The Problem
- The Solution
- Benefits
- User-Facing Analytics (Collaboration Platform)
- The Problem
- The Solution
- Benefits
- Summary
- 13. The Future of Real-Time Analytics
- Edge Analytics
- Compute-Storage Separation
- Data Lakehouses
- Real-Time Data Visualization
- Streaming Databases
- Streaming Data Platform as a Service
- Reverse ETL
- Summary
- Index