Trino: The Definitive Guide - Helion
ISBN: 9781098107666
Pages: 310, Format: ebook
Publication date: 2021-04-14
Bookstore: Helion
Price: 211,65 zł (previously: 246,10 zł)
You save: 14% (-34,45 zł)
Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. With this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's Hive, Cassandra, a relational database, or a proprietary data store. Analysts, software engineers, and production engineers will learn how to manage, use, and even develop with Trino.
Initially developed by Facebook, open source Trino is now used by Netflix, Airbnb, LinkedIn, Twitter, Uber, and many other companies. Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization.
- Get started: Explore Trino's use cases and learn about tools that will help you connect to Trino and query data
- Go deeper: Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
- Put Trino in production: Secure Trino, monitor workloads, tune queries, and connect more applications; learn how other organizations apply Trino
Table of contents
- Foreword
- Preface
- About the Book
- Conventions Used in This Book
- Code Examples, Permissions, and Attribution
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- I. Getting Started with Trino
- 1. Introducing Trino
- The Problems with Big Data
- Trino to the Rescue
- Designed for Performance and Scale
- SQL-on-Anything
- Separation of Data Storage and Query Compute Resources
- Trino Use Cases
- One SQL Analytics Access Point
- Access Point to Data Warehouse and Source Systems
- Provide SQL-Based Access to Anything
- Federated Queries
- Semantic Layer for a Virtual Data Warehouse
- Data Lake Query Engine
- SQL Conversions and ETL
- Better Insights Due to Faster Response Times
- Big Data, Machine Learning, and Artificial Intelligence
- Other Use Cases
- Trino Resources
- Website
- Documentation
- Community Chat
- Source Code, License, and Version
- Contributing
- Book Repository
- Iris Data Set
- Flight Data Set
- A Brief History of Trino
- Conclusion
- 2. Installing and Configuring Trino
- Trying Trino with the Docker Container
- Installing from Archive File
- Java Virtual Machine
- Python
- Installation
- Configuration
- Adding a Data Source
- Running Trino
- Conclusion
- 3. Using Trino
- Trino Command-Line Interface
- Getting Started
- Pagination
- History
- Additional Diagnostics
- Executing Queries
- Output Formats
- Ignoring Errors
- Trino JDBC Driver
- Downloading and Registering the Driver
- Establishing a Connection to Trino
- Trino and ODBC
- Client Libraries
- Trino Web UI
- SQL with Trino
- Concepts
- First Examples
- Conclusion
- II. Diving Deeper into Trino
- 4. Trino Architecture
- Coordinator and Workers in a Cluster
- Coordinator
- Discovery Service
- Workers
- Connector-Based Architecture
- Catalogs, Schemas, and Tables
- Query Execution Model
- Query Planning
- Parsing and Analysis
- Initial Query Planning
- Optimization Rules
- Predicate Pushdown
- Cross Join Elimination
- TopN
- Partial Aggregations
- Implementation Rules
- Lateral Join Decorrelation
- Semi-Join (IN) Decorrelation
- Cost-Based Optimizer
- The Cost Concept
- Cost of the Join
- Table Statistics
- Filter Statistics
- Table Statistics for Partitioned Tables
- Join Enumeration
- Broadcast Versus Distributed Joins
- Broadcast join strategy
- Distributed join strategy
- Working with Table Statistics
- Trino ANALYZE
- Gathering Statistics When Writing to Disk
- Hive ANALYZE
- Displaying Table Statistics
- Conclusion
- 5. Production-Ready Deployment
- Configuration Details
- Server Configuration
- Logging
- Node Configuration
- JVM Configuration
- Launcher
- Cluster Installation
- RPM Installation
- Installation Directory Structure
- Configuration
- Uninstall Trino
- Installation in the Cloud
- Cluster Sizing Considerations
- Conclusion
- 6. Connectors
- Configuration
- RDBMS Connector Example: PostgreSQL
- Query Pushdown
- Parallelism and Concurrency
- Other RDBMS Connectors
- Security
- Trino TPC-H and TPC-DS Connectors
- Hive Connector for Distributed Storage Data Sources
- Apache Hadoop and Hive
- Hive Connector
- Hive-Style Table Format
- Managed and External Tables
- Partitioned Data
- Loading Data
- File Formats and Compression
- MinIO Example
- Non-Relational Data Sources
- Trino JMX Connector
- Black Hole Connector
- Memory Connector
- Other Connectors
- Conclusion
- 7. Advanced Connector Examples
- Connecting to HBase with Phoenix
- Key-Value Store Connector Example: Accumulo
- Using the Trino Accumulo Connector
- Predicate Pushdown in Accumulo
- Apache Cassandra Connector
- Streaming System Connector Example: Kafka
- Document Store Connector Example: Elasticsearch
- Overview
- Configuration and Usage
- Query Processing
- Full-Text Search
- Summary
- Query Federation in Trino
- Extract, Transform, Load and Federated Queries
- Conclusion
- 8. Using SQL in Trino
- Trino Statements
- Trino System Tables
- Catalogs
- Schemas
- Information Schema
- Tables
- Table and Column Properties
- Copying an Existing Table
- Creating a New Table from Query Results
- Modifying a Table
- Deleting a Table
- Table Limitations from Connectors
- Views
- Session Information and Configuration
- Data Types
- Collection Data Types
- Temporal Data Types
- Time Zones
- Intervals
- Type Casting
- SELECT Statement Basics
- WHERE Clause
- GROUP BY and HAVING Clauses
- ORDER BY and LIMIT Clauses
- JOIN Statements
- UNION, INTERSECT, and EXCEPT Clauses
- Grouping Operations
- WITH Clause
- Subqueries
- Scalar Subquery
- EXISTS Subquery
- Quantified Subquery
- Deleting Data from a Table
- Conclusion
- 9. Advanced SQL
- Functions and Operators Introduction
- Scalar Functions and Operators
- Boolean Operators
- Logical Operators
- Range Selection with the BETWEEN Statement
- Value Detection with IS (NOT) NULL
- Mathematical Functions and Operators
- Trigonometric Functions
- Constant and Random Functions
- String Functions and Operators
- Strings and Maps
- Unicode
- Regular Expressions
- Unnesting Complex Data Types
- JSON Functions
- Date and Time Functions and Operators
- Histograms
- Aggregate Functions
- Map Aggregate Functions
- Approximate Aggregate Functions
- Window Functions
- Lambda Expressions
- Geospatial Functions
- Prepared Statements
- Conclusion
- III. Trino in Real-World Uses
- 10. Security
- Authentication
- Password and LDAP Authentication
- Authorization
- System Access Control
- Connector Access Control
- Encryption
- Encrypting Trino Client-to-Coordinator Communication
- Creating Java Keystores and Java Truststores
- Encrypting Communication Within the Trino Cluster
- Certificate Authority Versus Self-Signed Certificates
- Certificate Authentication
- Kerberos
- Prerequisites
- Kerberos Client Authentication
- Cluster Internal Kerberos
- Data Source Access and Configuration for Security
- Kerberos Authentication with the Hive Connector
- Hive Metastore Thrift Service Authentication
- HDFS Authentication
- Cluster Separation
- Conclusion
- 11. Integrating Trino with Other Tools
- Queries, Visualizations, and More with Apache Superset
- Performance Improvements with RubiX
- Workflows with Apache Airflow
- Embedded Trino Example: Amazon Athena
- Starburst Enterprise
- Other Integration Examples
- Custom Integrations
- Conclusion
- 12. Trino in Production
- Monitoring with the Trino Web UI
- Cluster-Level Details
- Query List
- Query Details View
- Overview
- Live Plan
- Stage Performance
- Splits
- JSON
- Tuning Trino SQL Queries
- Memory Management
- Task Concurrency
- Worker Scheduling
- Scheduling Splits per Task and per Node
- Local Scheduling
- Network Data Exchange
- Concurrency
- Buffer Sizes
- Tuning Java Virtual Machine
- Resource Groups
- Resource Group Definition
- Scheduling Policy
- Selector Rules Definition
- Conclusion
- 13. Real-World Examples
- Deployment and Runtime Platforms
- Cluster Sizing
- Hadoop/Hive Migration Use Case
- Other Data Sources
- Users and Traffic
- Conclusion
- 14. Conclusion
- Index