
Introduction to Machine Learning with Python. A Guide for Data Scientists
ebook
Author: Andreas C. Müller, Sarah Guido
ISBN: 978-14-493-6989-7
Pages: 394, Format: ebook
Publication date: 2016-09-26
Bookstore: Helion

Book price: 152,15 zł (previously: 176,92 zł)
You save: 14% (-24,77 zł)


Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination.

You’ll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Müller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the math behind them. Familiarity with the NumPy and matplotlib libraries will help you get even more from this book.
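
As a taste of that workflow, here is a minimal sketch (not taken from the book, but in the spirit of its opening Iris example) of loading a built-in dataset, splitting it into training and test sets, fitting a k-nearest-neighbors classifier, and checking its accuracy with scikit-learn:

    # Minimal scikit-learn workflow: load data, split, fit, evaluate.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, random_state=0)

    # A single neighbor keeps the model as simple as possible
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X_train, y_train)

    print("Test set accuracy: {:.2f}".format(knn.score(X_test, y_test)))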

With this book, you’ll learn:

  • Fundamental concepts and applications of machine learning
  • Advantages and shortcomings of widely used machine learning algorithms
  • How to represent data processed by machine learning, including which data aspects to focus on
  • Advanced methods for model evaluation and parameter tuning
  • The concept of pipelines for chaining models and encapsulating your workflow (a short sketch follows this list)
  • Methods for working with text data, including text-specific processing techniques
  • Suggestions for improving your machine learning and data science skills
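
To illustrate the pipeline idea mentioned above, the following sketch (my own, not from the book) chains a scaler and a support vector classifier with make_pipeline and then grid-searches the classifier's parameters on the whole chain, assuming scikit-learn is installed:

    # Chain preprocessing and a model, then tune parameters on the full chain.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC

    cancer = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, random_state=0)

    # make_pipeline names the steps automatically: "minmaxscaler", "svc"
    pipe = make_pipeline(MinMaxScaler(), SVC())

    # Parameters of a step are addressed as <stepname>__<parameter>
    param_grid = {"svc__C": [0.01, 0.1, 1, 10],
                  "svc__gamma": [0.01, 0.1, 1, 10]}

    grid = GridSearchCV(pipe, param_grid=param_grid, cv=5)
    grid.fit(X_train, y_train)

    print("Best cross-validation accuracy:", grid.best_score_)
    print("Test set score:", grid.score(X_test, y_test))

Because scaling happens inside the pipeline, each cross-validation fold is rescaled using only its own training portion, which avoids leaking test information into the search.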


Customers who bought "Introduction to Machine Learning with Python. A Guide for Data Scientists" also chose:

  • Windows Media Center. Domowe centrum rozrywki
  • Ruby on Rails. Ćwiczenia
  • DevOps w praktyce. Kurs video. Jenkins, Ansible, Terraform i Docker
  • Przywództwo w świecie VUCA. Jak być skutecznym liderem w niepewnym środowisku
  • Scrum. O zwinnym zarządzaniu projektami. Wydanie II rozszerzone


Table of Contents

Introduction to Machine Learning with Python. A Guide for Data Scientists eBook -- table of contents

  • Preface
    • Who Should Read This Book
    • Why We Wrote This Book
    • Navigating This Book
    • Online Resources
    • Conventions Used in This Book
    • Using Code Examples
    • Safari Books Online
    • How to Contact Us
    • Acknowledgments
      • From Andreas
      • From Sarah
  • 1. Introduction
    • Why Machine Learning?
      • Problems Machine Learning Can Solve
      • Knowing Your Task and Knowing Your Data
    • Why Python?
    • scikit-learn
      • Installing scikit-learn
    • Essential Libraries and Tools
      • Jupyter Notebook
      • NumPy
      • SciPy
      • matplotlib
      • pandas
      • mglearn
    • Python 2 Versus Python 3
    • Versions Used in this Book
    • A First Application: Classifying Iris Species
      • Meet the Data
      • Measuring Success: Training and Testing Data
      • First Things First: Look at Your Data
      • Building Your First Model: k-Nearest Neighbors
      • Making Predictions
      • Evaluating the Model
    • Summary and Outlook
  • 2. Supervised Learning
    • Classification and Regression
    • Generalization, Overfitting, and Underfitting
      • Relation of Model Complexity to Dataset Size
    • Supervised Machine Learning Algorithms
      • Some Sample Datasets
      • k-Nearest Neighbors
        • k-Neighbors classification
        • Analyzing KNeighborsClassifier
        • k-neighbors regression
        • Analyzing KNeighborsRegressor
        • Strengths, weaknesses, and parameters
      • Linear Models
        • Linear models for regression
        • Linear regression (aka ordinary least squares)
        • Ridge regression
        • Lasso
        • Linear models for classification
        • Linear models for multiclass classification
        • Strengths, weaknesses, and parameters
      • Naive Bayes Classifiers
        • Strengths, weaknesses, and parameters
      • Decision Trees
        • Building decision trees
        • Controlling complexity of decision trees
        • Analyzing decision trees
        • Feature importance in trees
        • Strengths, weaknesses, and parameters
      • Ensembles of Decision Trees
        • Random forests
          • Building random forests
          • Analyzing random forests
          • Strengths, weaknesses, and parameters
        • Gradient boosted regression trees (gradient boosting machines)
          • Strengths, weaknesses, and parameters
      • Kernelized Support Vector Machines
        • Linear models and nonlinear features
        • The kernel trick
        • Understanding SVMs
        • Tuning SVM parameters
        • Preprocessing data for SVMs
        • Strengths, weaknesses, and parameters
      • Neural Networks (Deep Learning)
        • The neural network model
        • Tuning neural networks
        • Strengths, weaknesses, and parameters
          • Estimating complexity in neural networks
    • Uncertainty Estimates from Classifiers
      • The Decision Function
      • Predicting Probabilities
      • Uncertainty in Multiclass Classification
    • Summary and Outlook
  • 3. Unsupervised Learning and Preprocessing
    • Types of Unsupervised Learning
    • Challenges in Unsupervised Learning
    • Preprocessing and Scaling
      • Different Kinds of Preprocessing
      • Applying Data Transformations
      • Scaling Training and Test Data the Same Way
      • The Effect of Preprocessing on Supervised Learning
    • Dimensionality Reduction, Feature Extraction, and Manifold Learning
      • Principal Component Analysis (PCA)
        • Applying PCA to the cancer dataset for visualization
        • Eigenfaces for feature extraction
      • Non-Negative Matrix Factorization (NMF)
        • Applying NMF to synthetic data
        • Applying NMF to face images
      • Manifold Learning with t-SNE
    • Clustering
      • k-Means Clustering
        • Failure cases of k-means
        • Vector quantization, or seeing k-means as decomposition
      • Agglomerative Clustering
        • Hierarchical clustering and dendrograms
      • DBSCAN
      • Comparing and Evaluating Clustering Algorithms
        • Evaluating clustering with ground truth
        • Evaluating clustering without ground truth
        • Comparing algorithms on the faces dataset
          • Analyzing the faces dataset with DBSCAN
          • Analyzing the faces dataset with k-means
          • Analyzing the faces dataset with agglomerative clustering
      • Summary of Clustering Methods
    • Summary and Outlook
  • 4. Representing Data and Engineering Features
    • Categorical Variables
      • One-Hot-Encoding (Dummy Variables)
        • Checking string-encoded categorical data
      • Numbers Can Encode Categoricals
    • Binning, Discretization, Linear Models, and Trees
    • Interactions and Polynomials
    • Univariate Nonlinear Transformations
    • Automatic Feature Selection
      • Univariate Statistics
      • Model-Based Feature Selection
      • Iterative Feature Selection
    • Utilizing Expert Knowledge
    • Summary and Outlook
  • 5. Model Evaluation and Improvement
    • Cross-Validation
      • Cross-Validation in scikit-learn
      • Benefits of Cross-Validation
      • Stratified k-Fold Cross-Validation and Other Strategies
        • More control over cross-validation
        • Leave-one-out cross-validation
        • Shuffle-split cross-validation
        • Cross-validation with groups
    • Grid Search
      • Simple Grid Search
      • The Danger of Overfitting the Parameters and the Validation Set
      • Grid Search with Cross-Validation
        • Analyzing the result of cross-validation
        • Search over spaces that are not grids
        • Using different cross-validation strategies with grid search
        • Nested cross-validation
        • Parallelizing cross-validation and grid search
    • Evaluation Metrics and Scoring
      • Keep the End Goal in Mind
      • Metrics for Binary Classification
        • Kinds of errors
        • Imbalanced datasets
        • Confusion matrices
          • Relation to accuracy
          • Precision, recall, and f-score
        • Taking uncertainty into account
        • Precision-recall curves and ROC curves
        • Receiver operating characteristics (ROC) and AUC
      • Metrics for Multiclass Classification
      • Regression Metrics
      • Using Evaluation Metrics in Model Selection
    • Summary and Outlook
  • 6. Algorithm Chains and Pipelines
    • Parameter Selection with Preprocessing
    • Building Pipelines
    • Using Pipelines in Grid Searches
    • The General Pipeline Interface
      • Convenient Pipeline Creation with make_pipeline
      • Accessing Step Attributes
      • Accessing Attributes in a Grid-Searched Pipeline
    • Grid-Searching Preprocessing Steps and Model Parameters
    • Grid-Searching Which Model To Use
    • Summary and Outlook
  • 7. Working with Text Data
    • Types of Data Represented as Strings
    • Example Application: Sentiment Analysis of Movie Reviews
    • Representing Text Data as a Bag of Words
      • Applying Bag-of-Words to a Toy Dataset
      • Bag-of-Words for Movie Reviews
    • Stopwords
    • Rescaling the Data with tfidf
    • Investigating Model Coefficients
    • Bag-of-Words with More Than One Word (n-Grams)
    • Advanced Tokenization, Stemming, and Lemmatization
    • Topic Modeling and Document Clustering
      • Latent Dirichlet Allocation
    • Summary and Outlook
  • 8. Wrapping Up
    • Approaching a Machine Learning Problem
      • Humans in the Loop
    • From Prototype to Production
    • Testing Production Systems
    • Building Your Own Estimator
    • Where to Go from Here
      • Theory
      • Other Machine Learning Frameworks and Packages
      • Ranking, Recommender Systems, and Other Kinds of Learning
      • Probabilistic Modeling, Inference, and Probabilistic Programming
      • Neural Networks
      • Scaling to Larger Datasets
      • Honing Your Skills
    • Conclusion
  • Index

