Trino: The Definitive Guide. 2nd Edition - Helion

ISBN: 9781098137199
Pages: 322, Format: ebook
Publication date: 2022-10-03
Bookstore: Helion
Price: 271,15 zł (previously: 319,00 zł)
You save: 15% (-47,85 zł)
Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. In the second edition of this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's a data lake using Hive, a modern lakehouse with Iceberg or Delta Lake, a different system like Cassandra, Kafka, or SingleStore, or a relational database like PostgreSQL or Oracle.
Analysts, software engineers, and production engineers learn how to manage, use, and even develop with Trino and make it a critical part of their data platform. Authors Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization.
- Explore Trino's use cases, and learn about tools that help you connect to Trino for querying and processing huge amounts of data
- Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
- Deploy and secure Trino at scale, monitor workloads, tune queries, and connect more applications
- Learn how other organizations apply Trino successfully
Table of contents
Trino: The Definitive Guide. 2nd Edition eBook -- table of contents
- Foreword
- Preface
- Conventions Used in This Book
- Code Examples, Permissions, and Attribution
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. Getting Started with Trino
- 1. Introducing Trino
- The Problems with Big Data
- Trino to the Rescue
- Designed for Performance and Scale
- SQL-on-Anything
- Separation of Data Storage and Query Compute Resources
- Trino Use Cases
- One SQL Analytics Access Point
- Access Point to Data Warehouse and Source Systems
- Provide SQL-Based Access to Anything
- Federated Queries
- Semantic Layer for a Virtual Data Warehouse
- Data Lake Query Engine
- SQL Conversions and ETL
- Better Insights Due to Faster Response Times
- Big Data, Machine Learning, and Artificial Intelligence
- Other Use Cases
- Trino Resources
- Website
- Documentation
- Community Chat
- Source Code, License, and Version
- Contributing
- Book Repository
- Iris Data Set
- Flight Data Set
- A Brief History of Trino
- Conclusion
- 2. Installing and Configuring Trino
- Trying Trino with the Docker Container
- Installing from the Archive File
- Java Virtual Machine
- Python
- Installation
- Configuration
- Adding a Data Source
- Running Trino
- Conclusion
- 3. Using Trino
- Trino Command-Line Interface
- Getting Started
- Pagination
- History and Completion
- Additional Diagnostics
- Executing Queries
- Output Formats
- Ignoring Errors
- Trino JDBC Driver
- Downloading and Registering the Driver
- Establishing a Connection to Trino
- Trino and ODBC
- Client Libraries
- Trino Web UI
- SQL with Trino
- Concepts
- First Examples
- Conclusion
- II. Diving Deeper into Trino
- 4. Trino Architecture
- Coordinator and Workers in a Cluster
- Coordinator
- Discovery Service
- Workers
- Connector-Based Architecture
- Catalogs, Schemas, and Tables
- Query Execution Model
- Query Planning
- Parsing and Analysis
- Initial Query Planning
- Optimization Rules
- Predicate Pushdown
- Cross Join Elimination
- TopN
- Partial Aggregations
- Implementation Rules
- Lateral Join Decorrelation
- Semi-Join (IN) Decorrelation
- Cost-Based Optimizer
- The Cost Concept
- Cost of the Join
- Table Statistics
- Filter Statistics
- Table Statistics for Partitioned Tables
- Join Enumeration
- Broadcast Versus Distributed Joins
- Broadcast join strategy
- Distributed join strategy
- Working with Table Statistics
- Trino ANALYZE
- Gathering Statistics When Writing to Disk
- Hive ANALYZE
- Displaying Table Statistics
- Conclusion
- 5. Production-Ready Deployment
- Configuration Details
- Server Configuration
- Logging
- Node Configuration
- JVM Configuration
- Launcher
- Cluster Installation
- RPM Installation
- Installation Directory Structure
- Configuration
- Uninstall Trino
- Installation in the Cloud
- Helm Chart for Kubernetes Deployment
- Cluster Sizing Considerations
- Conclusion
- 6. Connectors
- Configuration
- RDBMS Connector Example: PostgreSQL
- Query Pushdown
- Parallelism and Concurrency
- Other RDBMS Connectors
- Security
- Query Pass-Through
- Trino TPC-H and TPC-DS Connectors
- Hive Connector for Distributed Storage Data Sources
- Apache Hadoop and Hive
- Hive Connector
- Hive-Style Table Format
- Managed and External Tables
- Partitioned Data
- Loading Data
- File Formats and Compression
- MinIO Example
- Modern Distributed Storage Management and Analytics
- Non-Relational Data Sources
- Trino JMX Connector
- Black Hole Connector
- Memory Connector
- Other Connectors
- Conclusion
- 7. Advanced Connector Examples
- Connecting to HBase with Phoenix
- Key-Value Store Connector Example: Accumulo
- Using the Trino Accumulo Connector
- Predicate Pushdown in Accumulo
- Apache Cassandra Connector
- Streaming System Connector Example: Kafka
- Document Store Connector Example: Elasticsearch
- Overview
- Configuration and Usage
- Query Processing
- Full-Text Search
- Summary
- Query Federation in Trino
- Extract, Transform, Load and Federated Queries
- Conclusion
- 8. Using SQL in Trino
- Trino Statements
- Trino System Tables
- Catalogs
- Schemas
- Information Schema
- Tables
- Table and Column Properties
- Copying an Existing Table
- Creating a New Table from Query Results
- Modifying a Table
- Deleting a Table
- Table Limitations from Connectors
- Views
- Session Information and Configuration
- Data Types
- Collection Data Types
- Temporal Data Types
- Time zones
- Intervals
- Type Casting
- SELECT Statement Basics
- WHERE Clause
- GROUP BY and HAVING Clauses
- ORDER BY and LIMIT Clauses
- JOIN Statements
- UNION, INTERSECT, and EXCEPT Clauses
- Grouping Operations
- WITH Clause
- Subqueries
- Scalar Subquery
- EXISTS Subquery
- Quantified Subquery
- Deleting Data from a Table
- Conclusion
- 9. Advanced SQL
- Functions and Operators Introduction
- Scalar Functions and Operators
- Boolean Operators
- Logical Operators
- Range Selection with the BETWEEN Statement
- Value Detection with IS (NOT) NULL
- Mathematical Functions and Operators
- Trigonometric Functions
- Constant and Random Functions
- String Functions and Operators
- Strings and Maps
- Unicode
- Regular Expressions
- Unnesting Complex Data Types
- JSON Functions
- Date and Time Functions and Operators
- Histograms
- Aggregate Functions
- Map Aggregate Functions
- Approximate Aggregate Functions
- Window Functions
- Lambda Expressions
- Geospatial Functions
- Prepared Statements
- Conclusion
- III. Trino in Real-World Uses
- 10. Security
- Authentication
- Password and LDAP Authentication
- Other Authentication Types
- Authorization
- System Access Control
- Connector Access Control
- Encryption
- Encrypting Trino Client-to-Coordinator Communication
- Creating Java Keystores and Java Truststores
- Encrypting Communication Within the Trino Cluster
- Certificate Authority Versus Self-Signed Certificates
- Certificate Authentication
- Kerberos
- Prerequisites
- Kerberos Client Authentication
- Data Source Access and Configuration for Security
- Kerberos Authentication with the Hive Connector
- Hive Metastore Service Authentication
- HDFS Authentication
- Cluster Separation
- Conclusion
- 11. Integrating Trino with Other Tools
- Queries, Visualizations, and More with Apache Superset
- Performance Improvements with RubiX
- Workflows with Apache Airflow
- Embedded Trino Example: Amazon Athena
- Convenient Commercial Distributions: Starburst Enterprise and Starburst Galaxy
- Other Integration Examples
- Custom Integrations
- Conclusion
- 12. Trino in Production
- Monitoring with the Trino Web UI
- Cluster-Level Details
- Query List
- Query Details View
- Overview
- Live Plan
- Stage Performance
- Splits
- JSON
- Tuning Trino SQL Queries
- Memory Management
- Task Concurrency
- Worker Scheduling
- Network Data Exchange
- Concurrency
- Buffer Sizes
- Tuning Java Virtual Machine
- Resource Groups
- Resource Group Definition
- Scheduling Policy
- Selector Rules Definition
- Conclusion
- 13. Real-World Examples
- Deployment and Runtime Platforms
- Cluster Sizing
- Hadoop/Hive Migration Use Case
- Other Data Sources
- Users and Traffic
- Conclusion
- Conclusion
- Index





