Mastering Spark with R. The Complete Guide to Large-Scale Analysis and Modeling - Helion

Authors: Javier Luraschi, Kevin Kuo, Edgar Ruiz
ISBN: 978-14-920-4632-5
Pages: 296, Format: ebook
Publication date: 2019-10-07
Bookstore: Helion

Price: 143.65 zł (previously: 167.03 zł)
You save: 14% (-23.38 zł)

Tags: R - Programming

If you’re like most R users, you have a deep knowledge of and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.

Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.

- Analyze, explore, transform, and visualize data in Apache Spark with R
- Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
- Perform analysis and modeling across many machines using distributed computing techniques
- Use large-scale data from multiple sources and different formats with ease from within Spark
- Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
- Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions

Table of Contents

Foreword
Preface
- Formatting
- Acknowledgments
- Conventions Used in This Book
- Using Code Examples
- O’Reilly Online Learning
- How to Contact Us
1. Introduction
- Overview
- Hadoop
- Spark
- R
- sparklyr
- Recap
2. Getting Started
- Overview
- Prerequisites
  - Installing sparklyr
  - Installing Spark
- Connecting
- Using Spark
  - Web Interface
  - Analysis
  - Modeling
  - Data
  - Extensions
  - Distributed R
  - Streaming
  - Logs
- Disconnecting
- Using RStudio
- Resources
- Recap
3. Analysis
- Overview
- Import
- Wrangle
  - Built-in Functions
  - Correlations
- Visualize
  - Using ggplot2
  - Using dbplot
- Model
  - Caching
- Communicate
- Recap
4. Modeling
- Overview
- Exploratory Data Analysis
- Feature Engineering
- Supervised Learning
  - Generalized Linear Regression
  - Other Models
- Unsupervised Learning
  - Data Preparation
  - Topic Modeling
- Recap
5. Pipelines
- Overview
- Creation
- Use Cases
  - Hyperparameter Tuning
- Operating Modes
- Interoperability
- Deployment
  - Batch Scoring
  - Real-Time Scoring
- Recap
6. Clusters
- Overview
- On-Premises
  - Managers
    - Standalone
    - YARN
    - Apache Mesos
  - Distributions
- Cloud
  - Amazon
  - Databricks
  - Google
  - IBM
  - Microsoft
  - Qubole
- Kubernetes
- Tools
  - RStudio
  - Jupyter
  - Livy
- Recap
7. Connections
- Overview
  - Edge Nodes
  - Spark Home
- Local
- Standalone
- YARN
  - YARN Client
  - YARN Cluster
- Livy
- Mesos
- Kubernetes
- Cloud
- Batches
- Tools
- Multiple Connections
- Troubleshooting
  - Logging
  - Spark Submit
    - Detailed troubleshooting
  - Windows
- Recap
8. Data
- Overview
- Reading Data
  - Paths
  - Schema
  - Memory
  - Columns
- Writing Data
- Copying Data
- File Formats
  - CSV
  - JSON
  - Parquet
  - Others
- File Systems
- Storage Systems
  - Hive
  - Cassandra
  - JDBC
- Recap
9. Tuning
- Overview
  - Graph
  - Timeline
- Configuring
  - Connect Settings
  - Submit Settings
  - Runtime Settings
  - sparklyr Settings
- Partitioning
  - Implicit Partitions
  - Explicit Partitions
- Caching
  - Checkpointing
  - Memory
- Shuffling
- Serialization
- Configuration Files
- Recap
10. Extensions
- Overview
- H2O
- Graphs
- XGBoost
- Deep Learning
- Genomics
- Spatial
- Troubleshooting
- Recap
11. Distributed R
- Overview
- Use Cases
  - Custom Parsers
  - Partitioned Modeling
  - Grid Search
  - Web APIs
  - Simulations
- Partitions
- Grouping
- Columns
- Context
- Functions
- Packages
- Cluster Requirements
  - Installing R
  - Apache Arrow
- Troubleshooting
  - Worker Logs
  - Resolving Timeouts
  - Inspecting Partitions
  - Debugging Workers
- Recap
12. Streaming
- Overview
- Transformations
  - Analysis
  - Modeling
  - Pipelines
  - Distributed R
- Kafka
- Shiny
- Recap
13. Contributing
- Overview
- The Spark API
- Spark Extensions
- Using Scala Code
- Recap
A. Supplemental Code References
- Preface
  - Formatting
- Chapter 1
  - The World’s Capacity to Store Information
  - Daily Downloads of CRAN Packages
- Chapter 2
  - Prerequisites
    - Installing R
    - Installing Java
    - Installing RStudio
    - Using RStudio
- Chapter 3
  - Hive Functions
- Chapter 4
  - MLlib Functions
    - Classification
    - Regression
    - Clustering
    - Recommendation
    - Frequent Pattern Mining
    - Feature Transformers
- Chapter 6
  - Google Trends for On-Premises (Mainframes), Cloud Computing, and Kubernetes
- Chapter 12
  - Stream Generator
  - Installing Kafka
Index