Hands-On Unsupervised Learning Using Python. How to Build Applied Machine Learning Solutions from Unlabeled Data - Helion

Author: Ankur A. Patel
ISBN: 9781492035596
Pages: 360, Format: ebook
Publication date: 2018-07-31
Bookstore: Helion

Book price: 211.65 zł (previously: 246.10 zł)
You save: 14% (-34.45 zł)

Many industry experts consider unsupervised learning the next frontier in artificial intelligence, one that may hold the key to general artificial intelligence. Since the majority of the world's data is unlabeled, conventional supervised learning cannot be applied. Unsupervised learning, on the other hand, can be applied to unlabeled datasets to discover meaningful patterns buried deep in the data, patterns that may be near impossible for humans to uncover.

Author Ankur Patel shows you how to apply unsupervised learning using two simple production-ready Python frameworks: scikit-learn and TensorFlow using Keras. With code and hands-on examples, data scientists will identify difficult-to-find patterns in data and gain deeper business insight, detect anomalies, perform automatic feature engineering and selection, and generate synthetic datasets. All you need is programming and some machine learning experience to get started.

- Compare the strengths and weaknesses of the different machine learning approaches: supervised, unsupervised, and reinforcement learning
- Set up and manage machine learning projects end-to-end
- Build an anomaly detection system to catch credit card fraud
- Cluster users into distinct and homogeneous groups
- Perform semisupervised learning
- Develop movie recommender systems using restricted Boltzmann machines
- Generate synthetic images using generative adversarial networks

Table of Contents

Preface
- A Brief History of Machine Learning
- AI Is Back, but Why Now?
- The Emergence of Applied AI
- Major Milestones in Applied AI over the Past 20 Years
- From Narrow AI to AGI
- Objective and Approach
- Prerequisites
- Roadmap
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
I. Fundamentals of Unsupervised Learning
1. Unsupervised Learning in the Machine Learning Ecosystem
- Basic Machine Learning Terminology
- Rules-Based vs. Machine Learning
- Supervised vs. Unsupervised
  - The Strengths and Weaknesses of Supervised Learning
  - The Strengths and Weaknesses of Unsupervised Learning
- Using Unsupervised Learning to Improve Machine Learning Solutions
  - Insufficient labeled data
  - Overfitting
  - Curse of dimensionality
  - Feature engineering
  - Outliers
  - Data drift
- A Closer Look at Supervised Algorithms
  - Linear Methods
    - Linear regression
    - Logistic regression
  - Neighborhood-Based Methods
    - k-nearest neighbors
  - Tree-Based Methods
    - Single decision tree
    - Bagging
    - Random forests
    - Boosting
  - Support Vector Machines
  - Neural Networks
- A Closer Look at Unsupervised Algorithms
  - Dimensionality Reduction
    - Linear projection
      - Principal component analysis (PCA)
      - Singular value decomposition (SVD)
      - Random projection
    - Manifold learning
      - Isomap
      - t-distributed stochastic neighbor embedding (t-SNE)
      - Dictionary learning
    - Independent component analysis
    - Latent Dirichlet allocation
  - Clustering
    - k-means
    - Hierarchical clustering
    - DBSCAN
  - Feature Extraction
    - Autoencoders
    - Feature extraction using supervised training of feedforward networks
  - Unsupervised Deep Learning
    - Unsupervised pretraining
    - Restricted Boltzmann machines
    - Deep belief networks
    - Generative adversarial networks
  - Sequential Data Problems Using Unsupervised Learning
- Reinforcement Learning Using Unsupervised Learning
- Semisupervised Learning
- Successful Applications of Unsupervised Learning
  - Anomaly Detection
    - Group segmentation
- Conclusion
2. End-to-End Machine Learning Project
- Environment Setup
  - Version Control: Git
  - Clone the Hands-On Unsupervised Learning Git Repository
  - Scientific Libraries: Anaconda Distribution of Python
  - Neural Networks: TensorFlow and Keras
  - Gradient Boosting, Version One: XGBoost
  - Gradient Boosting, Version Two: LightGBM
  - Clustering Algorithms
  - Interactive Computing Environment: Jupyter Notebook
- Overview of the Data
- Data Preparation
  - Data Acquisition
    - Download the data
    - Import the necessary libraries
    - Read the data
    - Preview the data
  - Data Exploration
    - Generate summary statistics
    - Identify nonnumerical values by feature
    - Identify distinct values by feature
  - Generate Feature Matrix and Labels Array
    - Create the feature matrix X and the labels array Y
    - Standardize the feature matrix X
  - Feature Engineering and Feature Selection
    - Check correlation of features
  - Data Visualization
- Model Preparation
  - Split into Training and Test Sets
  - Select Cost Function
  - Create k-Fold Cross-Validation Sets
- Machine Learning Models (Part I)
  - Model #1: Logistic Regression
    - Set hyperparameters
    - Train the model
    - Evaluate the results
- Evaluation Metrics
  - Confusion Matrix
  - Precision-Recall Curve
  - Receiver Operating Characteristic
    - Evaluating the logistic regression model
- Machine Learning Models (Part II)
  - Model #2: Random Forests
    - Set the hyperparameters
    - Train the model
    - Evaluate the results
  - Model #3: Gradient Boosting Machine (XGBoost)
    - Set the hyperparameters
    - Train the model
    - Evaluate the results
  - Model #4: Gradient Boosting Machine (LightGBM)
    - Set the hyperparameters
    - Train the model
    - Evaluate the results
- Evaluation of the Four Models Using the Test Set
  - Logistic regression
  - Random forests
  - XGBoost gradient boosting
  - LightGBM gradient boosting
- Ensembles
  - Stacking
    - Combine layer one predictions with the original training dataset
    - Set the hyperparameters
    - Train the model
    - Evaluate the results
- Final Model Selection
- Production Pipeline
- Conclusion
II. Unsupervised Learning Using Scikit-Learn
3. Dimensionality Reduction
- The Motivation for Dimensionality Reduction
  - The MNIST Digits Database
    - Data acquisition and exploration
    - Load the MNIST datasets
    - Verify shape of datasets
    - Create Pandas DataFrames from the datasets
    - Explore the data
    - Display the images
- Dimensionality Reduction Algorithms
  - Linear Projection vs. Manifold Learning
- Principal Component Analysis
  - PCA, the Concept
  - PCA in Practice
    - Set the hyperparameters
    - Apply PCA
    - Evaluate PCA
    - Visualize the separation of points in space
  - Incremental PCA
  - Sparse PCA
  - Kernel PCA
- Singular Value Decomposition
- Random Projection
  - Gaussian Random Projection
  - Sparse Random Projection
- Isomap
- Multidimensional Scaling
- Locally Linear Embedding
- t-Distributed Stochastic Neighbor Embedding
- Other Dimensionality Reduction Methods
- Dictionary Learning
- Independent Component Analysis
- Conclusion
4. Anomaly Detection
- Credit Card Fraud Detection
  - Prepare the Data
  - Define Anomaly Score Function
  - Define Evaluation Metrics
  - Define Plotting Function
- Normal PCA Anomaly Detection
  - PCA Components Equal Number of Original Dimensions
  - Search for the Optimal Number of Principal Components
- Sparse PCA Anomaly Detection
- Kernel PCA Anomaly Detection
- Gaussian Random Projection Anomaly Detection
- Sparse Random Projection Anomaly Detection
- Nonlinear Anomaly Detection
- Dictionary Learning Anomaly Detection
- ICA Anomaly Detection
- Fraud Detection on the Test Set
  - Normal PCA Anomaly Detection on the Test Set
  - ICA Anomaly Detection on the Test Set
  - Dictionary Learning Anomaly Detection on the Test Set
- Conclusion
5. Clustering
- MNIST Digits Dataset
  - Data Preparation
- Clustering Algorithms
- k-Means
  - k-Means Inertia
  - Evaluating the Clustering Results
  - k-Means Accuracy
  - k-Means and the Number of Principal Components
  - k-Means on the Original Dataset
- Hierarchical Clustering
  - Agglomerative Hierarchical Clustering
  - The Dendrogram
  - Evaluating the Clustering Results
- DBSCAN
  - DBSCAN Algorithm
  - Applying DBSCAN to Our Dataset
  - HDBSCAN
- Conclusion
6. Group Segmentation
- Lending Club Data
  - Data Preparation
    - Load libraries
    - Explore the data
  - Transform String Format to Numerical Format
  - Impute Missing Values
  - Engineer Features
  - Select Final Set of Features and Perform Scaling
  - Designate Labels for Evaluation
- Goodness of the Clusters
- k-Means Application
- Hierarchical Clustering Application
- HDBSCAN Application
- Conclusion
III. Unsupervised Learning Using TensorFlow and Keras
7. Autoencoders
- Neural Networks
  - TensorFlow
    - TensorFlow example
  - Keras
- Autoencoder: The Encoder and the Decoder
- Undercomplete Autoencoders
- Overcomplete Autoencoders
- Dense vs. Sparse Autoencoders
- Denoising Autoencoder
- Variational Autoencoder
- Conclusion
8. Hands-On Autoencoder
- Data Preparation
- The Components of an Autoencoder
- Activation Functions
- Our First Autoencoder
  - Loss Function
  - Optimizer
  - Training the Model
  - Evaluating on the Test Set
- Two-Layer Undercomplete Autoencoder with Linear Activation Function
  - Increasing the Number of Nodes
  - Adding More Hidden Layers
- Nonlinear Autoencoder
- Overcomplete Autoencoder with Linear Activation
- Overcomplete Autoencoder with Linear Activation and Dropout
- Sparse Overcomplete Autoencoder with Linear Activation
- Sparse Overcomplete Autoencoder with Linear Activation and Dropout
- Working with Noisy Datasets
- Denoising Autoencoder
  - Two-Layer Denoising Undercomplete Autoencoder with Linear Activation
  - Two-Layer Denoising Overcomplete Autoencoder with Linear Activation
  - Two-Layer Denoising Overcomplete Autoencoder with ReLU Activation
- Conclusion
9. Semisupervised Learning
- Data Preparation
- Supervised Model
- Unsupervised Model
- Semisupervised Model
- The Power of Supervised and Unsupervised
- Conclusion
IV. Deep Unsupervised Learning Using TensorFlow and Keras
10. Recommender Systems Using Restricted Boltzmann Machines
- Boltzmann Machines
  - Restricted Boltzmann Machines
- Recommender Systems
  - Collaborative Filtering
  - The Netflix Prize
- MovieLens Dataset
  - Data Preparation
  - Define the Cost Function: Mean Squared Error
  - Perform Baseline Experiments
- Matrix Factorization
  - One Latent Factor
  - Three Latent Factors
  - Five Latent Factors
- Collaborative Filtering Using RBMs
  - RBM Neural Network Architecture
  - Build the Components of the RBM Class
  - Train RBM Recommender System
- Conclusion
11. Feature Detection Using Deep Belief Networks
- Deep Belief Networks in Detail
- MNIST Image Classification
- Restricted Boltzmann Machines
  - Build the Components of the RBM Class
  - Generate Images Using the RBM Model
  - View the Intermediate Feature Detectors
- Train the Three RBMs for the DBN
  - Examine Feature Detectors
  - View Generated Images
- The Full DBN
  - How Training of a DBN Works
  - Train the DBN
- How Unsupervised Learning Helps Supervised Learning
  - Generate Images to Build a Better Image Classifier
- Image Classifier Using LightGBM
  - Supervised Only
  - Unsupervised and Supervised Solution
- Conclusion
12. Generative Adversarial Networks
- GANs, the Concept
  - The Power of GANs
- Deep Convolutional GANs
- Convolutional Neural Networks
- DCGANs Revisited
  - Generator of the DCGAN
  - Discriminator of the DCGAN
  - Discriminator and Adversarial Models
  - DCGAN for the MNIST Dataset
- MNIST DCGAN in Action
  - Synthetic Image Generation
- Conclusion
13. Time Series Clustering
- ECG Data
- Approach to Time Series Clustering
  - k-Shape
- Time Series Clustering Using k-Shape on ECGFiveDays
  - Data Preparation
  - Training and Evaluation
- Time Series Clustering Using k-Shape on ECG5000
  - Data Preparation
  - Training and Evaluation
- Time Series Clustering Using k-Means on ECG5000
- Time Series Clustering Using Hierarchical DBSCAN on ECG5000
- Comparing the Time Series Clustering Algorithms
  - Full Run with k-Shape
  - Full Run with k-Means
  - Full Run with HDBSCAN
  - Comparing All Three Time Series Clustering Approaches
- Conclusion
14. Conclusion
- Supervised Learning
- Unsupervised Learning
  - Scikit-Learn
  - TensorFlow and Keras
- Reinforcement Learning
- Most Promising Areas of Unsupervised Learning Today
- The Future of Unsupervised Learning
- Final Words
Index