Mastering Spark with R. The Complete Guide to Large-Scale Analysis and Modeling - Helion
ISBN: 978-14-920-4632-5
Pages: 296, Format: ebook
Publication date: 2019-10-07
Bookstore: Helion
Book price: 160,65 zł (previously: 186,80 zł)
You save: 14% (-26,15 zł)
If you’re like most R users, you have deep knowledge and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.
Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.
- Analyze, explore, transform, and visualize data in Apache Spark with R
- Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
- Perform analysis and modeling across many machines using distributed computing techniques
- Use large-scale data from multiple sources and different formats with ease from within Spark
- Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
- Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions
Customers who bought "Mastering Spark with R. The Complete Guide to Large-Scale Analysis and Modeling" also chose:
- R i pakiet shiny. Kurs video. Interaktywne aplikacje w analizie danych 149,00 zł, (74,50 zł -50%)
- Zaawansowana analiza danych. Jak przej 59,90 zł, (29,95 zł -50%)
- Statystyka praktyczna w data science. 50 kluczowych zagadnień w językach R i Python. Wydanie II 69,00 zł, (34,50 zł -50%)
- Język R. Receptury. Analiza danych, statystyka i przetwarzanie grafiki. Wydanie II 89,00 zł, (44,50 zł -50%)
- Deep Learning. Praca z językiem R i biblioteką Keras 77,00 zł, (38,50 zł -50%)
Table of Contents
Mastering Spark with R. The Complete Guide to Large-Scale Analysis and Modeling eBook: table of contents
- Foreword
- Preface
- Formatting
- Acknowledgments
- Conventions Used in This Book
- Using Code Examples
- O’Reilly Online Learning
- How to Contact Us
- 1. Introduction
- Overview
- Hadoop
- Spark
- R
- sparklyr
- Recap
- 2. Getting Started
- Overview
- Prerequisites
- Installing sparklyr
- Installing Spark
- Connecting
- Using Spark
- Web Interface
- Analysis
- Modeling
- Data
- Extensions
- Distributed R
- Streaming
- Logs
- Disconnecting
- Using RStudio
- Resources
- Recap
- 3. Analysis
- Overview
- Import
- Wrangle
- Built-in Functions
- Correlations
- Visualize
- Using ggplot2
- Using dbplot
- Model
- Caching
- Communicate
- Recap
- 4. Modeling
- Overview
- Exploratory Data Analysis
- Feature Engineering
- Supervised Learning
- Generalized Linear Regression
- Other Models
- Unsupervised Learning
- Data Preparation
- Topic Modeling
- Recap
- 5. Pipelines
- Overview
- Creation
- Use Cases
- Hyperparameter Tuning
- Operating Modes
- Interoperability
- Deployment
- Batch Scoring
- Real-Time Scoring
- Recap
- 6. Clusters
- Overview
- On-Premises
- Managers
- Standalone
- YARN
- Apache Mesos
- Distributions
- Managers
- Cloud
- Amazon
- Databricks
- IBM
- Microsoft
- Qubole
- Kubernetes
- Tools
- RStudio
- Jupyter
- Livy
- Recap
- 7. Connections
- Overview
- Edge Nodes
- Spark Home
- Local
- Standalone
- YARN
- YARN Client
- YARN Cluster
- Livy
- Mesos
- Kubernetes
- Cloud
- Batches
- Tools
- Multiple Connections
- Troubleshooting
- Logging
- Spark Submit
- Detailed troubleshooting
- Windows
- Recap
- 8. Data
- Overview
- Reading Data
- Paths
- Schema
- Memory
- Columns
- Writing Data
- Copying Data
- File Formats
- CSV
- JSON
- Parquet
- Others
- File Systems
- Storage Systems
- Hive
- Cassandra
- JDBC
- Recap
- 9. Tuning
- Overview
- Graph
- Timeline
- Configuring
- Connect Settings
- Submit Settings
- Runtime Settings
- sparklyr Settings
- Partitioning
- Implicit Partitions
- Explicit Partitions
- Caching
- Checkpointing
- Memory
- Shuffling
- Serialization
- Configuration Files
- Recap
- 10. Extensions
- Overview
- H2O
- Graphs
- XGBoost
- Deep Learning
- Genomics
- Spatial
- Troubleshooting
- Recap
- 11. Distributed R
- Overview
- Use Cases
- Custom Parsers
- Partitioned Modeling
- Grid Search
- Web APIs
- Simulations
- Partitions
- Grouping
- Columns
- Context
- Functions
- Packages
- Cluster Requirements
- Installing R
- Apache Arrow
- Troubleshooting
- Worker Logs
- Resolving Timeouts
- Inspecting Partitions
- Debugging Workers
- Recap
- 12. Streaming
- Overview
- Transformations
- Analysis
- Modeling
- Pipelines
- Distributed R
- Kafka
- Shiny
- Recap
- 13. Contributing
- Overview
- The Spark API
- Spark Extensions
- Using Scala Code
- Recap
- A. Supplemental Code References
- Preface
- Formatting
- Chapter 1
- The World’s Capacity to Store Information
- Daily Downloads of CRAN Packages
- Chapter 2
- Prerequisites
- Installing R
- Installing Java
- Installing RStudio
- Using RStudio
- Prerequisites
- Chapter 3
- Hive Functions
- Chapter 4
- MLlib Functions
- Classification
- Regression
- Clustering
- Recommendation
- Frequent Pattern Mining
- Feature Transformers
- MLlib Functions
- Chapter 6
- Google Trends for On-Premises (Mainframes), Cloud Computing, and Kubernetes
- Chapter 12
- Stream Generator
- Installing Kafka
- Preface
- Index