Advanced Analytics with PySpark - Helion

Authors: Akash Tandon, Sandy Ryza, Uri Laserson
ISBN: 9781098103606
Pages: 236, Format: ebook
Publication date: 2022-06-14
Bookstore: Helion

Price: 186.15 zł (previously: 216.45 zł)
You save: 14% (-30.30 zł)

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming.

Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing.

If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis.

  • Familiarize yourself with Spark's programming model and ecosystem
  • Learn general approaches in data science
  • Examine complete implementations that analyze large public datasets
  • Discover which machine learning tools make sense for particular problems
  • Explore code that can be adapted to many uses

Table of Contents

  • Preface
    • Why Did We Write This Book Now?
    • How This Book Is Organized
    • Conventions Used in This Book
    • Using Code Examples
    • O'Reilly Online Learning
    • How to Contact Us
    • Acknowledgments
  • 1. Analyzing Big Data
    • Working with Big Data
    • Introducing Apache Spark and PySpark
      • Components
      • PySpark
      • Ecosystem
    • Spark 3.0
    • PySpark Addresses Challenges of Data Science
    • Where to Go from Here
  • 2. Introduction to Data Analysis with PySpark
    • Spark Architecture
    • Installing PySpark
    • Setting Up Our Data
    • Analyzing Data with the DataFrame API
    • Fast Summary Statistics for DataFrames
    • Pivoting and Reshaping DataFrames
    • Joining DataFrames and Selecting Features
    • Scoring and Model Evaluation
    • Where to Go from Here
  • 3. Recommending Music and the Audioscrobbler Dataset
    • Setting Up the Data
    • Our Requirements for a Recommender System
      • Alternating Least Squares Algorithm
    • Preparing the Data
    • Building a First Model
    • Spot Checking Recommendations
    • Evaluating Recommendation Quality
    • Computing AUC
    • Hyperparameter Selection
    • Making Recommendations
    • Where to Go from Here
  • 4. Making Predictions with Decision Trees and Decision Forests
    • Decision Trees and Forests
    • Preparing the Data
    • Our First Decision Tree
    • Decision Tree Hyperparameters
    • Tuning Decision Trees
    • Categorical Features Revisited
    • Random Forests
    • Making Predictions
    • Where to Go from Here
  • 5. Anomaly Detection with K-means Clustering
    • K-means Clustering
    • Identifying Anomalous Network Traffic
      • KDD Cup 1999 Dataset
    • A First Take on Clustering
    • Choosing k
    • Visualization with SparkR
    • Feature Normalization
    • Categorical Variables
    • Using Labels with Entropy
    • Clustering in Action
    • Where to Go from Here
  • 6. Understanding Wikipedia with LDA and Spark NLP
    • Latent Dirichlet Allocation
      • LDA in PySpark
    • Getting the Data
    • Spark NLP
      • Setting Up Your Environment
    • Parsing the Data
    • Preparing the Data Using Spark NLP
    • TF-IDF
    • Computing the TF-IDFs
    • Creating Our LDA Model
    • Where to Go from Here
  • 7. Geospatial and Temporal Data Analysis on Taxi Trip Data
    • Preparing the Data
      • Converting Datetime Strings to Timestamps
      • Handling Invalid Records
    • Geospatial Analysis
      • Intro to GeoJSON
      • GeoPandas
    • Sessionization in PySpark
      • Building Sessions: Secondary Sorts in PySpark
    • Where to Go from Here
  • 8. Estimating Financial Risk
    • Terminology
    • Methods for Calculating VaR
      • Variance-Covariance
      • Historical Simulation
      • Monte Carlo Simulation
    • Our Model
    • Getting the Data
    • Preparing the Data
    • Determining the Factor Weights
    • Sampling
      • The Multivariate Normal Distribution
    • Running the Trials
    • Visualizing the Distribution of Returns
    • Where to Go from Here
  • 9. Analyzing Genomics Data and the BDG Project
    • Decoupling Storage from Modeling
    • Setting Up ADAM
    • Introduction to Working with Genomics Data Using ADAM
      • File Format Conversion with the ADAM CLI
      • Ingesting Genomics Data Using PySpark and ADAM
    • Predicting Transcription Factor Binding Sites from ENCODE Data
    • Where to Go from Here
  • 10. Image Similarity Detection with Deep Learning and PySpark LSH
    • PyTorch
      • Installation
    • Preparing the Data
      • Resizing Images Using PyTorch
    • Deep Learning Model for Vector Representation of Images
      • Image Embeddings
      • Import Image Embeddings into PySpark
    • Image Similarity Search Using PySpark LSH
      • Nearest Neighbor Search
    • Where to Go from Here
  • 11. Managing the Machine Learning Lifecycle with MLflow
    • Machine Learning Lifecycle
    • MLflow
    • Experiment Tracking
    • Managing and Serving ML Models
    • Creating and Using MLflow Projects
    • Where to Go from Here
  • Index
