Learning and Operating Presto - Helion
ISBN: 9781098141813
Pages: 194, Format: ebook
Publication date: 2023-09-20
Bookstore: Helion
Book price: 211,65 zł (previously: 246,10 zł)
You save: 14% (-34,45 zł)
The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to operate Presto at their organizations to derive insights from datasets wherever they reside.
Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto and how you can quickly deploy Presto in production.
With this book, you will:
- Learn how to install and configure Presto
- Use Presto with business intelligence tools
- Understand how to connect Presto to a variety of data sources
- Extend Presto for real-time business insight
- Learn how to apply best practices and tuning
- Get troubleshooting tips for logs, error messages, and more
- Explore Presto's architectural concepts and usage patterns
- Understand Presto security and administration
Table of Contents
Learning and Operating Presto eBook -- table of contents
- Preface
- Why We Wrote This Book
- Who This Book Is For
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Angelica Lo Duca
- Tim Meehan
- Vivek Bharathan
- Ying Su
- 1. Introduction to Presto
- Data Warehouses and Data Lakes
- The Role of Presto in a Data Lake
- Presto Origins and Design Considerations
- High Performance
- High Scalability
- Compliance with the ANSI SQL Standard
- Federation of Data Sources
- Running in the Cloud
- Presto Architecture and Core Components
- Alternatives to Presto
- Apache Impala
- Apache Hive
- Spark SQL
- Trino
- Presto Use Cases
- Reporting and Dashboarding
- Ad Hoc Querying
- ETL Using SQL
- Data Lakehouse
- Real-Time Analytics with Real-Time Databases
- Introducing Our Case Study
- Conclusion
- 2. Getting Started with Presto
- Presto Manual Installation
- Running Presto on Docker
- Installing Docker
- Presto Docker Image
- Dockerfile
- The etc/ directory
- node.properties
- jvm.config
- config.properties
- log.properties
- catalog/<connector>.properties
- Building and Running Presto on Docker
- The Presto Sandbox
- Deploying Presto on Kubernetes
- Introducing Kubernetes
- Configuring Presto on Kubernetes
- presto-coordinator.yaml
- presto-workers.yaml
- presto-config-map.yaml
- presto-secrets.yaml
- Adding a New Catalog
- Running the Deployment on Kubernetes
- Querying Your Presto Instance
- Listing Catalogs
- Listing Schemas
- Listing Tables
- Querying a Table
- Conclusion
- 3. Connectors
- Service Provider Interface
- Connector Architecture
- Popular Connectors
- Thrift
- Writing a Custom Connector
- Prerequisites
- Plugin and Module
- ExamplePlugin
- ExampleConnectorFactory
- ExampleModule
- ExampleConnector
- ExampleHandleResolver
- Configuration
- ExampleConfig
- SessionProperties
- TableProperties
- Metadata
- Data model
- Handles
- ExampleMetadata
- ExampleClient
- Input/Output
- ExampleSplitManager
- ExampleSplit
- ExampleRecordSetProvider and ExampleRecordSet
- ExampleRecordCursor
- Deploying Your Connector
- Apache Pinot
- Setting Up and Configuring Presto
- Setting up Pinot
- Configuring Pinot
- Configuring Presto with Pinot
- Presto-Pinot Querying in Action
- Conclusion
- 4. Client Connectivity
- Setting Up the Environment
- Presto Client
- Docker Image
- Kubernetes Node
- Connectivity to Presto
- REST API
- Python
- R
- JDBC
- Node.js
- ODBC
- Other Presto Client Libraries
- Building a Client Dashboard in Python
- Setting Up the Client
- Building the Dashboard
- Connecting to and querying Presto
- Preparing the results of the query
- Building the first graph
- Building the second graph
- Conclusion
- 5. Open Data Lakehouse Analytics
- The Emergence of the Lakehouse
- Data Lakehouse Architecture
- Data Lake
- File Store
- File Format
- Table Format
- Query Engine
- Metadata Management
- Data Governance
- Data Access Control
- Building a Data Lakehouse
- Configuring MinIO
- Populating MinIO
- Configuring HMS
- Configuring Spark
- Registering Hudi Tables with HMS
- Connecting and Querying Presto
- Conclusion
- 6. Presto Administration
- Introducing Presto Administration
- Configuration
- Properties
- How to configure a cluster
- Sessions
- Using sessions
- JVM
- Memory
- Out-of-memory errors
- Garbage collection
- Monitoring
- Console
- Using the console for monitoring
- Using the console for debugging
- Using the console for going over the interactive plan
- REST API
- Metrics
- JMX connector
- REST API
- JMX exporters
- Management
- Resource Groups
- Configuring resource groups
- Resource groups properties
- Example
- Verifiers
- Setting up the system
- Configuring the MySQL database
- Configuring the Presto verifier
- Running a test
- Session Properties Managers
- Configuring a session property manager
- Namespace Functions
- Setting up the system
- Configuring a function
- Running a test
- Conclusion
- 7. Understanding Security in Presto
- Introducing Presto Security
- Building Secure Communication in Presto
- Encryption
- Keystore Management
- Configuring HTTPS/TLS
- Running a Presto client
- Running the Presto console
- Authentication
- File-Based Authentication
- Running a Presto client
- Running the Presto console
- LDAP
- Kerberos
- Prerequisites
- Configuring the Presto coordinator and workers
- Configuring the Presto client
- Creating a Custom Authenticator
- Authorization
- Authorizing Access to the Presto REST API
- Configuring System Access Control
- Authorization Through Apache Ranger
- Building a custom audit function
- Conclusion
- 8. Performance Tuning
- Introducing Performance Tuning
- Reasons for Performance Tuning
- The Performance Tuning Life Cycle
- Query Execution Model
- Approaches for Performance Tuning in Presto
- Resource Allocation
- Storage
- Query Optimization
- Aria Scan
- Table Scanning
- Repartitioning
- Implementing Performance Tuning
- Building and Importing the Sample CSV Table in MinIO
- Converting the CSV Table in ORC
- Defining the Tuning Parameters
- Running Tests
- Default parameters
- Reducing CPU usage
- Query optimization
- Aria scan
- Conclusion
- 9. Operating Presto at Scale
- Introducing Scalability
- Reasons to Scale Presto
- Common Issues
- Design Considerations
- Availability
- Manageability
- Performance
- Protection
- Configuration
- How to Scale Presto
- Multiple Coordinators
- Presto on Spark
- Spilling
- Using a Cloud Service
- Conclusion
- Index