Python Data Science Handbook. Essential Tools for Working with Data - Helion

ebook

Author: Jake VanderPlas
ISBN: 978-14-919-1213-3
Pages: 548, Format: ebook
Publication date: 2016-11-21
Bookstore: Helion

Price: 239,00 zł

Tags: Python - Programming

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all—IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.

Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.

With this handbook, you’ll learn how to use:

- IPython and Jupyter: provide computational environments for data scientists using Python
- NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python
- Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
- Matplotlib: includes capabilities for a flexible range of data visualizations in Python
- Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms

Table of Contents

Preface
- What Is Data Science?
- Who Is This Book For?
- Why Python?
  - Python 2 Versus Python 3
- Outline of This Book
- Using Code Examples
- Installation Considerations
- Conventions Used in This Book
- O'Reilly Safari
- How to Contact Us
1. IPython: Beyond Normal Python
- Shell or Notebook?
  - Launching the IPython Shell
  - Launching the Jupyter Notebook
- Help and Documentation in IPython
  - Accessing Documentation with ?
  - Accessing Source Code with ??
  - Exploring Modules with Tab Completion
    - Tab completion of object contents
    - Tab completion when importing
    - Beyond tab completion: Wildcard matching
- Keyboard Shortcuts in the IPython Shell
  - Navigation Shortcuts
  - Text Entry Shortcuts
  - Command History Shortcuts
  - Miscellaneous Shortcuts
- IPython Magic Commands
  - Pasting Code Blocks: %paste and %cpaste
  - Running External Code: %run
  - Timing Code Execution: %timeit
  - Help on Magic Functions: ?, %magic, and %lsmagic
- Input and Output History
  - IPython's In and Out Objects
  - Underscore Shortcuts and Previous Outputs
  - Suppressing Output
  - Related Magic Commands
- IPython and Shell Commands
  - Quick Introduction to the Shell
  - Shell Commands in IPython
  - Passing Values to and from the Shell
- Shell-Related Magic Commands
- Errors and Debugging
  - Controlling Exceptions: %xmode
  - Debugging: When Reading Tracebacks Is Not Enough
    - Partial list of debugging commands
- Profiling and Timing Code
  - Timing Code Snippets: %timeit and %time
  - Profiling Full Scripts: %prun
  - Line-by-Line Profiling with %lprun
  - Profiling Memory Use: %memit and %mprun
- More IPython Resources
  - Web Resources
  - Books
2. Introduction to NumPy
- Understanding Data Types in Python
  - A Python Integer Is More Than Just an Integer
  - A Python List Is More Than Just a List
  - Fixed-Type Arrays in Python
  - Creating Arrays from Python Lists
  - Creating Arrays from Scratch
  - NumPy Standard Data Types
- The Basics of NumPy Arrays
  - NumPy Array Attributes
  - Array Indexing: Accessing Single Elements
  - Array Slicing: Accessing Subarrays
    - One-dimensional subarrays
    - Multidimensional subarrays
      - Accessing array rows and columns
    - Subarrays as no-copy views
    - Creating copies of arrays
  - Reshaping of Arrays
  - Array Concatenation and Splitting
    - Concatenation of arrays
    - Splitting of arrays
- Computation on NumPy Arrays: Universal Functions
  - The Slowness of Loops
  - Introducing UFuncs
  - Exploring NumPy's UFuncs
    - Array arithmetic
    - Absolute value
    - Trigonometric functions
    - Exponents and logarithms
    - Specialized ufuncs
  - Advanced Ufunc Features
    - Specifying output
    - Aggregates
    - Outer products
  - Ufuncs: Learning More
- Aggregations: Min, Max, and Everything in Between
  - Summing the Values in an Array
  - Minimum and Maximum
    - Multidimensional aggregates
    - Other aggregation functions
  - Example: What Is the Average Height of US Presidents?
- Computation on Arrays: Broadcasting
  - Introducing Broadcasting
  - Rules of Broadcasting
    - Broadcasting example 1
    - Broadcasting example 2
    - Broadcasting example 3
  - Broadcasting in Practice
    - Centering an array
    - Plotting a two-dimensional function
- Comparisons, Masks, and Boolean Logic
  - Example: Counting Rainy Days
    - Digging into the data
  - Comparison Operators as ufuncs
  - Working with Boolean Arrays
    - Counting entries
    - Boolean operators
  - Boolean Arrays as Masks
- Fancy Indexing
  - Exploring Fancy Indexing
  - Combined Indexing
  - Example: Selecting Random Points
  - Modifying Values with Fancy Indexing
  - Example: Binning Data
- Sorting Arrays
  - Fast Sorting in NumPy: np.sort and np.argsort
    - Sorting along rows or columns
  - Partial Sorts: Partitioning
  - Example: k-Nearest Neighbors
- Structured Data: NumPy's Structured Arrays
  - Creating Structured Arrays
  - More Advanced Compound Types
  - RecordArrays: Structured Arrays with a Twist
  - On to Pandas
3. Data Manipulation with Pandas
- Installing and Using Pandas
- Introducing Pandas Objects
  - The Pandas Series Object
    - Series as generalized NumPy array
    - Series as specialized dictionary
    - Constructing Series objects
  - The Pandas DataFrame Object
    - DataFrame as a generalized NumPy array
    - DataFrame as specialized dictionary
    - Constructing DataFrame objects
      - From a single Series object
      - From a list of dicts
      - From a dictionary of Series objects
      - From a two-dimensional NumPy array
      - From a NumPy structured array
  - The Pandas Index Object
    - Index as immutable array
    - Index as ordered set
- Data Indexing and Selection
  - Data Selection in Series
    - Series as dictionary
    - Series as one-dimensional array
    - Indexers: loc, iloc, and ix
  - Data Selection in DataFrame
    - DataFrame as a dictionary
    - DataFrame as two-dimensional array
    - Additional indexing conventions
- Operating on Data in Pandas
  - Ufuncs: Index Preservation
  - UFuncs: Index Alignment
    - Index alignment in Series
    - Index alignment in DataFrame
  - Ufuncs: Operations Between DataFrame and Series
- Handling Missing Data
  - Trade-Offs in Missing Data Conventions
  - Missing Data in Pandas
    - None: Pythonic missing data
    - NaN: Missing numerical data
    - NaN and None in Pandas
  - Operating on Null Values
    - Detecting null values
    - Dropping null values
    - Filling null values
- Hierarchical Indexing
  - A Multiply Indexed Series
    - The bad way
    - The better way: Pandas MultiIndex
    - MultiIndex as extra dimension
  - Methods of MultiIndex Creation
    - Explicit MultiIndex constructors
    - MultiIndex level names
    - MultiIndex for columns
  - Indexing and Slicing a MultiIndex
    - Multiply indexed Series
    - Multiply indexed DataFrames
  - Rearranging Multi-Indices
    - Sorted and unsorted indices
    - Stacking and unstacking indices
    - Index setting and resetting
  - Data Aggregations on Multi-Indices
- Combining Datasets: Concat and Append
  - Recall: Concatenation of NumPy Arrays
  - Simple Concatenation with pd.concat
    - Duplicate indices
      - Catching the repeats as an error
      - Ignoring the index
      - Adding MultiIndex keys
    - Concatenation with joins
    - The append() method
- Combining Datasets: Merge and Join
  - Relational Algebra
  - Categories of Joins
    - One-to-one joins
    - Many-to-one joins
    - Many-to-many joins
  - Specification of the Merge Key
    - The on keyword
    - The left_on and right_on keywords
    - The left_index and right_index keywords
  - Specifying Set Arithmetic for Joins
  - Overlapping Column Names: The suffixes Keyword
  - Example: US States Data
- Aggregation and Grouping
  - Planets Data
  - Simple Aggregation in Pandas
  - GroupBy: Split, Apply, Combine
    - Split, apply, combine
    - The GroupBy object
      - Column indexing
      - Iteration over groups
      - Dispatch methods
    - Aggregate, filter, transform, apply
      - Aggregation
      - Filtering
      - Transformation
      - The apply() method
    - Specifying the split key
      - A list, array, series, or index providing the grouping keys
      - A dictionary or series mapping index to group
      - Any Python function
      - A list of valid keys
    - Grouping example
- Pivot Tables
  - Motivating Pivot Tables
  - Pivot Tables by Hand
  - Pivot Table Syntax
    - Multilevel pivot tables
    - Additional pivot table options
  - Example: Birthrate Data
    - Further data exploration
- Vectorized String Operations
  - Introducing Pandas String Operations
  - Tables of Pandas String Methods
    - Methods similar to Python string methods
    - Methods using regular expressions
    - Miscellaneous methods
      - Vectorized item access and slicing
      - Indicator variables
  - Example: Recipe Database
    - A simple recipe recommender
    - Going further with recipes
- Working with Time Series
  - Dates and Times in Python
    - Native Python dates and times: datetime and dateutil
    - Typed arrays of times: NumPy's datetime64
    - Dates and times in Pandas: Best of both worlds
  - Pandas Time Series: Indexing by Time
  - Pandas Time Series Data Structures
    - Regular sequences: pd.date_range()
  - Frequencies and Offsets
  - Resampling, Shifting, and Windowing
    - Resampling and converting frequencies
    - Time-shifts
    - Rolling windows
  - Where to Learn More
  - Example: Visualizing Seattle Bicycle Counts
    - Visualizing the data
    - Digging into the data
- High-Performance Pandas: eval() and query()
  - Motivating query() and eval(): Compound Expressions
  - pandas.eval() for Efficient Operations
    - Operations supported by pd.eval()
      - Arithmetic operators
      - Comparison operators
      - Bitwise operators
      - Object attributes and indices
      - Other operations
  - DataFrame.eval() for Column-Wise Operations
    - Assignment in DataFrame.eval()
    - Local variables in DataFrame.eval()
  - DataFrame.query() Method
  - Performance: When to Use These Functions
- Further Resources
4. Visualization with Matplotlib
- General Matplotlib Tips
  - Importing matplotlib
  - Setting Styles
  - show() or No show()? How to Display Your Plots
    - Plotting from a script
    - Plotting from an IPython shell
    - Plotting from an IPython notebook
  - Saving Figures to File
- Two Interfaces for the Price of One
  - MATLAB-style interface
  - Object-oriented interface
- Simple Line Plots
  - Adjusting the Plot: Line Colors and Styles
  - Adjusting the Plot: Axes Limits
  - Labeling Plots
- Simple Scatter Plots
  - Scatter Plots with plt.plot
  - Scatter Plots with plt.scatter
  - plot Versus scatter: A Note on Efficiency
- Visualizing Errors
  - Basic Errorbars
  - Continuous Errors
- Density and Contour Plots
  - Visualizing a Three-Dimensional Function
- Histograms, Binnings, and Density
  - Two-Dimensional Histograms and Binnings
    - plt.hist2d: Two-dimensional histogram
    - plt.hexbin: Hexagonal binnings
    - Kernel density estimation
- Customizing Plot Legends
  - Choosing Elements for the Legend
  - Legend for Size of Points
  - Multiple Legends
- Customizing Colorbars
  - Customizing Colorbars
    - Choosing the colormap
    - Color limits and extensions
    - Discrete colorbars
  - Example: Handwritten Digits
- Multiple Subplots
  - plt.axes: Subplots by Hand
  - plt.subplot: Simple Grids of Subplots
  - plt.subplots: The Whole Grid in One Go
  - plt.GridSpec: More Complicated Arrangements
- Text and Annotation
  - Example: Effect of Holidays on US Births
  - Transforms and Text Position
  - Arrows and Annotation
- Customizing Ticks
  - Major and Minor Ticks
  - Hiding Ticks or Labels
  - Reducing or Increasing the Number of Ticks
  - Fancy Tick Formats
  - Summary of Formatters and Locators
- Customizing Matplotlib: Configurations and Stylesheets
  - Plot Customization by Hand
  - Changing the Defaults: rcParams
  - Stylesheets
    - Default style
    - FiveThirtyEight style
    - ggplot
    - Bayesian Methods for Hackers style
    - Dark background
    - Grayscale
    - Seaborn style
- Three-Dimensional Plotting in Matplotlib
  - Three-Dimensional Points and Lines
  - Three-Dimensional Contour Plots
  - Wireframes and Surface Plots
  - Surface Triangulations
    - Example: Visualizing a Möbius strip
- Geographic Data with Basemap
  - Map Projections
    - Cylindrical projections
    - Pseudo-cylindrical projections
    - Perspective projections
    - Conic projections
    - Other projections
  - Drawing a Map Background
  - Plotting Data on Maps
  - Example: California Cities
  - Example: Surface Temperature Data
- Visualization with Seaborn
  - Seaborn Versus Matplotlib
  - Exploring Seaborn Plots
    - Histograms, KDE, and densities
    - Pair plots
    - Faceted histograms
    - Factor plots
    - Joint distributions
    - Bar plots
  - Example: Exploring Marathon Finishing Times
- Further Resources
  - Matplotlib Resources
  - Other Python Graphics Libraries
5. Machine Learning
- What Is Machine Learning?
  - Categories of Machine Learning
  - Qualitative Examples of Machine Learning Applications
    - Classification: Predicting discrete labels
    - Regression: Predicting continuous labels
    - Clustering: Inferring labels on unlabeled data
    - Dimensionality reduction: Inferring structure of unlabeled data
  - Summary
- Introducing Scikit-Learn
  - Data Representation in Scikit-Learn
    - Data as table
    - Features matrix
    - Target array
  - Scikit-Learn's Estimator API
    - Basics of the API
    - Supervised learning example: Simple linear regression
    - Supervised learning example: Iris classification
    - Unsupervised learning example: Iris dimensionality
    - Unsupervised learning: Iris clustering
  - Application: Exploring Handwritten Digits
    - Loading and visualizing the digits data
    - Unsupervised learning: Dimensionality reduction
    - Classification on digits
  - Summary
- Hyperparameters and Model Validation
  - Thinking About Model Validation
    - Model validation the wrong way
    - Model validation the right way: Holdout sets
    - Model validation via cross-validation
  - Selecting the Best Model
    - The bias-variance trade-off
    - Validation curves in Scikit-Learn
  - Learning Curves
    - Learning curves in Scikit-Learn
  - Validation in Practice: Grid Search
  - Summary
- Feature Engineering
  - Categorical Features
  - Text Features
  - Image Features
  - Derived Features
  - Imputation of Missing Data
  - Feature Pipelines
- In Depth: Naive Bayes Classification
  - Bayesian Classification
  - Gaussian Naive Bayes
  - Multinomial Naive Bayes
    - Example: Classifying text
  - When to Use Naive Bayes
- In Depth: Linear Regression
  - Simple Linear Regression
  - Basis Function Regression
    - Polynomial basis functions
    - Gaussian basis functions
  - Regularization
    - Ridge regression (L2 regularization)
    - Lasso regularization (L1)
  - Example: Predicting Bicycle Traffic
- In-Depth: Support Vector Machines
  - Motivating Support Vector Machines
  - Support Vector Machines: Maximizing the Margin
    - Fitting a support vector machine
    - Beyond linear boundaries: Kernel SVM
    - Tuning the SVM: Softening margins
  - Example: Face Recognition
  - Support Vector Machine Summary
- In-Depth: Decision Trees and Random Forests
  - Motivating Random Forests: Decision Trees
    - Creating a decision tree
    - Decision trees and overfitting
  - Ensembles of Estimators: Random Forests
  - Random Forest Regression
  - Example: Random Forest for Classifying Digits
  - Summary of Random Forests
- In Depth: Principal Component Analysis
  - Introducing Principal Component Analysis
    - PCA as dimensionality reduction
    - PCA for visualization: Handwritten digits
    - What do the components mean?
    - Choosing the number of components
  - PCA as Noise Filtering
  - Example: Eigenfaces
  - Principal Component Analysis Summary
- In-Depth: Manifold Learning
  - Manifold Learning: HELLO
  - Multidimensional Scaling (MDS)
  - MDS as Manifold Learning
  - Nonlinear Embeddings: Where MDS Fails
  - Nonlinear Manifolds: Locally Linear Embedding
  - Some Thoughts on Manifold Methods
  - Example: Isomap on Faces
  - Example: Visualizing Structure in Digits
- In Depth: k-Means Clustering
  - Introducing k-Means
  - k-Means Algorithm: Expectation-Maximization
    - Caveats of expectation-maximization
  - Examples
    - Example 1: k-Means on digits
    - Example 2: k-means for color compression
- In Depth: Gaussian Mixture Models
  - Motivating GMM: Weaknesses of k-Means
  - Generalizing EM: Gaussian Mixture Models
    - Choosing the covariance type
  - GMM as Density Estimation
    - How many components?
  - Example: GMM for Generating New Data
- In-Depth: Kernel Density Estimation
  - Motivating KDE: Histograms
  - Kernel Density Estimation in Practice
    - Selecting the bandwidth via cross-validation
  - Example: KDE on a Sphere
  - Example: Not-So-Naive Bayes
    - The anatomy of a custom estimator
    - Using our custom estimator
- Application: A Face Detection Pipeline
  - HOG Features
  - HOG in Action: A Simple Face Detector
  - Caveats and Improvements
- Further Machine Learning Resources
  - Machine Learning in Python
  - General Machine Learning
Index