Python Data Science Handbook. 2nd Edition - Helion
ISBN: 9781098121181
Pages: 590, Format: ebook
Publication date: 2022-12-06
Bookstore: Helion
Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all: IPython, NumPy, pandas, Matplotlib, scikit-learn, and other related tools.
Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.
With this handbook, you'll learn how:
- IPython and Jupyter provide computational environments for scientists using Python
- NumPy includes the ndarray for efficient storage and manipulation of dense data arrays
- Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data
- Matplotlib includes capabilities for a flexible range of data visualizations
- Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms
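As a rough illustration of the NumPy and pandas points above, here is a minimal sketch. It is not taken from the book, and the sample names and heights are made up for the example; it only assumes that NumPy and pandas are installed.

```python
import numpy as np
import pandas as pd

# Dense, fixed-type array: arithmetic applies element-wise without a Python loop.
# The values are illustrative, not data from the book.
heights_cm = np.array([170.0, 182.0, 165.0])
heights_m = heights_cm / 100

# Labeled, columnar data: columns are addressed by name rather than position.
people = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "height_cm": heights_cm})

# idxmax returns the row label of the largest value; loc looks up by label.
tallest = people.loc[people["height_cm"].idxmax(), "name"]

print(heights_m.mean())
print(tallest)
```

The array handles the numeric work, while the DataFrame attaches labels so that a result can be looked up by column name.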
Table of Contents
- Preface
- What Is Data Science?
- Who Is This Book For?
- Why Python?
- Outline of the Book
- Installation Considerations
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- I. Jupyter: Beyond Normal Python
- 1. Getting Started in IPython and Jupyter
- Launching the IPython Shell
- Launching the Jupyter Notebook
- Help and Documentation in IPython
- Accessing Documentation with ?
- Accessing Source Code with ??
- Exploring Modules with Tab Completion
- Tab completion of object contents
- Tab completion when importing
- Beyond tab completion: Wildcard matching
- Keyboard Shortcuts in the IPython Shell
- Navigation Shortcuts
- Text Entry Shortcuts
- Command History Shortcuts
- Miscellaneous Shortcuts
- 2. Enhanced Interactive Features
- IPython Magic Commands
- Running External Code: %run
- Timing Code Execution: %timeit
- Help on Magic Functions: ?, %magic, and %lsmagic
- Input and Output History
- IPython's In and Out Objects
- Underscore Shortcuts and Previous Outputs
- Suppressing Output
- Related Magic Commands
- IPython and Shell Commands
- Quick Introduction to the Shell
- Shell Commands in IPython
- Passing Values to and from the Shell
- Shell-Related Magic Commands
- 3. Debugging and Profiling
- Errors and Debugging
- Controlling Exceptions: %xmode
- Debugging: When Reading Tracebacks Is Not Enough
- Profiling and Timing Code
- Timing Code Snippets: %timeit and %time
- Profiling Full Scripts: %prun
- Line-by-Line Profiling with %lprun
- Profiling Memory Use: %memit and %mprun
- More IPython Resources
- Web Resources
- Books
- II. Introduction to NumPy
- 4. Understanding Data Types in Python
- A Python Integer Is More Than Just an Integer
- A Python List Is More Than Just a List
- Fixed-Type Arrays in Python
- Creating Arrays from Python Lists
- Creating Arrays from Scratch
- NumPy Standard Data Types
- 5. The Basics of NumPy Arrays
- NumPy Array Attributes
- Array Indexing: Accessing Single Elements
- Array Slicing: Accessing Subarrays
- One-Dimensional Subarrays
- Multidimensional Subarrays
- Subarrays as No-Copy Views
- Creating Copies of Arrays
- Reshaping of Arrays
- Array Concatenation and Splitting
- Concatenation of Arrays
- Splitting of Arrays
- 6. Computation on NumPy Arrays: Universal Functions
- The Slowness of Loops
- Introducing Ufuncs
- Exploring NumPy's Ufuncs
- Array Arithmetic
- Absolute Value
- Trigonometric Functions
- Exponents and Logarithms
- Specialized Ufuncs
- Advanced Ufunc Features
- Specifying Output
- Aggregations
- Outer Products
- Ufuncs: Learning More
- 7. Aggregations: min, max, and Everything in Between
- Summing the Values in an Array
- Minimum and Maximum
- Multidimensional Aggregates
- Other Aggregation Functions
- Example: What Is the Average Height of US Presidents?
- 8. Computation on Arrays: Broadcasting
- Introducing Broadcasting
- Rules of Broadcasting
- Broadcasting Example 1
- Broadcasting Example 2
- Broadcasting Example 3
- Broadcasting in Practice
- Centering an Array
- Plotting a Two-Dimensional Function
- 9. Comparisons, Masks, and Boolean Logic
- Example: Counting Rainy Days
- Comparison Operators as Ufuncs
- Working with Boolean Arrays
- Counting Entries
- Boolean Operators
- Boolean Arrays as Masks
- Using the Keywords and/or Versus the Operators &/|
- 10. Fancy Indexing
- Exploring Fancy Indexing
- Combined Indexing
- Example: Selecting Random Points
- Modifying Values with Fancy Indexing
- Example: Binning Data
- 11. Sorting Arrays
- Fast Sorting in NumPy: np.sort and np.argsort
- Sorting Along Rows or Columns
- Partial Sorts: Partitioning
- Example: k-Nearest Neighbors
- 12. Structured Data: NumPy's Structured Arrays
- Exploring Structured Array Creation
- More Advanced Compound Types
- Record Arrays: Structured Arrays with a Twist
- On to Pandas
- III. Data Manipulation with Pandas
- 13. Introducing Pandas Objects
- The Pandas Series Object
- Series as Generalized NumPy Array
- Series as Specialized Dictionary
- Constructing Series Objects
- The Pandas DataFrame Object
- DataFrame as Generalized NumPy Array
- DataFrame as Specialized Dictionary
- Constructing DataFrame Objects
- From a single Series object
- From a list of dicts
- From a dictionary of Series objects
- From a two-dimensional NumPy array
- From a NumPy structured array
- The Pandas Index Object
- Index as Immutable Array
- Index as Ordered Set
- 14. Data Indexing and Selection
- Data Selection in Series
- Series as Dictionary
- Series as One-Dimensional Array
- Indexers: loc and iloc
- Data Selection in DataFrames
- DataFrame as Dictionary
- DataFrame as Two-Dimensional Array
- Additional Indexing Conventions
- 15. Operating on Data in Pandas
- Ufuncs: Index Preservation
- Ufuncs: Index Alignment
- Index Alignment in Series
- Index Alignment in DataFrames
- Ufuncs: Operations Between DataFrames and Series
- 16. Handling Missing Data
- Trade-offs in Missing Data Conventions
- Missing Data in Pandas
- None as a Sentinel Value
- NaN: Missing Numerical Data
- NaN and None in Pandas
- Pandas Nullable Dtypes
- Operating on Null Values
- Detecting Null Values
- Dropping Null Values
- Filling Null Values
- 17. Hierarchical Indexing
- A Multiply Indexed Series
- The Bad Way
- The Better Way: The Pandas MultiIndex
- MultiIndex as Extra Dimension
- Methods of MultiIndex Creation
- Explicit MultiIndex Constructors
- MultiIndex Level Names
- MultiIndex for Columns
- Indexing and Slicing a MultiIndex
- Multiply Indexed Series
- Multiply Indexed DataFrames
- Rearranging Multi-Indexes
- Sorted and Unsorted Indices
- Stacking and Unstacking Indices
- Index Setting and Resetting
- 18. Combining Datasets: concat and append
- Recall: Concatenation of NumPy Arrays
- Simple Concatenation with pd.concat
- Duplicate Indices
- Treating repeated indices as an error
- Ignoring the index
- Adding MultiIndex keys
- Concatenation with Joins
- The append Method
- 19. Combining Datasets: merge and join
- Relational Algebra
- Categories of Joins
- One-to-One Joins
- Many-to-One Joins
- Many-to-Many Joins
- Specification of the Merge Key
- The on Keyword
- The left_on and right_on Keywords
- The left_index and right_index Keywords
- Specifying Set Arithmetic for Joins
- Overlapping Column Names: The suffixes Keyword
- Example: US States Data
- 20. Aggregation and Grouping
- Planets Data
- Simple Aggregation in Pandas
- groupby: Split, Apply, Combine
- Split, Apply, Combine
- The GroupBy Object
- Column indexing
- Iteration over groups
- Dispatch methods
- Aggregate, Filter, Transform, Apply
- Aggregation
- Filtering
- Transformation
- The apply method
- Specifying the Split Key
- A list, array, series, or index providing the grouping keys
- A dictionary or series mapping index to group
- Any Python function
- A list of valid keys
- Grouping Example
- 21. Pivot Tables
- Motivating Pivot Tables
- Pivot Tables by Hand
- Pivot Table Syntax
- Multilevel Pivot Tables
- Additional Pivot Table Options
- Example: Birthrate Data
- 22. Vectorized String Operations
- Introducing Pandas String Operations
- Tables of Pandas String Methods
- Methods Similar to Python String Methods
- Methods Using Regular Expressions
- Miscellaneous Methods
- Vectorized item access and slicing
- Indicator variables
- Example: Recipe Database
- A Simple Recipe Recommender
- Going Further with Recipes
- 23. Working with Time Series
- Dates and Times in Python
- Native Python Dates and Times: datetime and dateutil
- Typed Arrays of Times: NumPy's datetime64
- Dates and Times in Pandas: The Best of Both Worlds
- Pandas Time Series: Indexing by Time
- Pandas Time Series Data Structures
- Regular Sequences: pd.date_range
- Frequencies and Offsets
- Resampling, Shifting, and Windowing
- Resampling and Converting Frequencies
- Time Shifts
- Rolling Windows
- Example: Visualizing Seattle Bicycle Counts
- Visualizing the Data
- Digging into the Data
- 24. High-Performance Pandas: eval and query
- Motivating query and eval: Compound Expressions
- pandas.eval for Efficient Operations
- DataFrame.eval for Column-Wise Operations
- Assignment in DataFrame.eval
- Local Variables in DataFrame.eval
- The DataFrame.query Method
- Performance: When to Use These Functions
- Further Resources
- IV. Visualization with Matplotlib
- 25. General Matplotlib Tips
- Importing Matplotlib
- Setting Styles
- show or No show? How to Display Your Plots
- Plotting from a Script
- Plotting from an IPython Shell
- Plotting from a Jupyter Notebook
- Saving Figures to File
- Two Interfaces for the Price of One
- MATLAB-style Interface
- Object-oriented interface
- 26. Simple Line Plots
- Adjusting the Plot: Line Colors and Styles
- Adjusting the Plot: Axes Limits
- Labeling Plots
- Matplotlib Gotchas
- 27. Simple Scatter Plots
- Scatter Plots with plt.plot
- Scatter Plots with plt.scatter
- plot Versus scatter: A Note on Efficiency
- Visualizing Uncertainties
- Basic Errorbars
- Continuous Errors
- 28. Density and Contour Plots
- Visualizing a Three-Dimensional Function
- Histograms, Binnings, and Density
- Two-Dimensional Histograms and Binnings
- plt.hist2d: Two-Dimensional Histogram
- plt.hexbin: Hexagonal Binnings
- Kernel Density Estimation
- 29. Customizing Plot Legends
- Choosing Elements for the Legend
- Legend for Size of Points
- Multiple Legends
- 30. Customizing Colorbars
- Customizing Colorbars
- Choosing the Colormap
- Color Limits and Extensions
- Discrete Colorbars
- Example: Handwritten Digits
- 31. Multiple Subplots
- plt.axes: Subplots by Hand
- plt.subplot: Simple Grids of Subplots
- plt.subplots: The Whole Grid in One Go
- plt.GridSpec: More Complicated Arrangements
- 32. Text and Annotation
- Example: Effect of Holidays on US Births
- Transforms and Text Position
- Arrows and Annotation
- 33. Customizing Ticks
- Major and Minor Ticks
- Hiding Ticks or Labels
- Reducing or Increasing the Number of Ticks
- Fancy Tick Formats
- Summary of Formatters and Locators
- 34. Customizing Matplotlib: Configurations and Stylesheets
- Plot Customization by Hand
- Changing the Defaults: rcParams
- Stylesheets
- Default Style
- FiveThirtyEight Style
- ggplot Style
- Bayesian Methods for Hackers Style
- Dark Background Style
- Grayscale Style
- Seaborn Style
- 35. Three-Dimensional Plotting in Matplotlib
- Three-Dimensional Points and Lines
- Three-Dimensional Contour Plots
- Wireframes and Surface Plots
- Surface Triangulations
- Example: Visualizing a Möbius Strip
- 36. Visualization with Seaborn
- Exploring Seaborn Plots
- Histograms, KDE, and Densities
- Pair Plots
- Faceted Histograms
- Categorical Plots
- Joint Distributions
- Bar Plots
- Example: Exploring Marathon Finishing Times
- Further Resources
- Other Python Visualization Libraries
- V. Machine Learning
- 37. What Is Machine Learning?
- Categories of Machine Learning
- Qualitative Examples of Machine Learning Applications
- Classification: Predicting Discrete Labels
- Regression: Predicting Continuous Labels
- Clustering: Inferring Labels on Unlabeled Data
- Dimensionality Reduction: Inferring Structure of Unlabeled Data
- Summary
- 38. Introducing Scikit-Learn
- Data Representation in Scikit-Learn
- The Features Matrix
- The Target Array
- The Estimator API
- Basics of the API
- Supervised Learning Example: Simple Linear Regression
- 1. Choose a class of model
- 2. Choose model hyperparameters
- 3. Arrange data into a features matrix and target vector
- 4. Fit the model to the data
- 5. Predict labels for unknown data
- Supervised Learning Example: Iris Classification
- Unsupervised Learning Example: Iris Dimensionality
- Unsupervised Learning Example: Iris Clustering
- Application: Exploring Handwritten Digits
- Loading and Visualizing the Digits Data
- Unsupervised Learning Example: Dimensionality Reduction
- Classification on Digits
- Summary
- 39. Hyperparameters and Model Validation
- Thinking About Model Validation
- Model Validation the Wrong Way
- Model Validation the Right Way: Holdout Sets
- Model Validation via Cross-Validation
- Selecting the Best Model
- The Bias-Variance Trade-off
- Validation Curves in Scikit-Learn
- Learning Curves
- Validation in Practice: Grid Search
- Summary
- 40. Feature Engineering
- Categorical Features
- Text Features
- Image Features
- Derived Features
- Imputation of Missing Data
- Feature Pipelines
- 41. In Depth: Naive Bayes Classification
- Bayesian Classification
- Gaussian Naive Bayes
- Multinomial Naive Bayes
- Example: Classifying Text
- When to Use Naive Bayes
- 42. In Depth: Linear Regression
- Simple Linear Regression
- Basis Function Regression
- Polynomial Basis Functions
- Gaussian Basis Functions
- Regularization
- Ridge Regression (L2 Regularization)
- Lasso Regression (L1 Regularization)
- Example: Predicting Bicycle Traffic
- 43. In Depth: Support Vector Machines
- Motivating Support Vector Machines
- Support Vector Machines: Maximizing the Margin
- Fitting a Support Vector Machine
- Beyond Linear Boundaries: Kernel SVM
- Tuning the SVM: Softening Margins
- Example: Face Recognition
- Summary
- 44. In Depth: Decision Trees and Random Forests
- Motivating Random Forests: Decision Trees
- Creating a Decision Tree
- Decision Trees and Overfitting
- Ensembles of Estimators: Random Forests
- Random Forest Regression
- Example: Random Forest for Classifying Digits
- Summary
- 45. In Depth: Principal Component Analysis
- Introducing Principal Component Analysis
- PCA as Dimensionality Reduction
- PCA for Visualization: Handwritten Digits
- What Do the Components Mean?
- Choosing the Number of Components
- PCA as Noise Filtering
- Example: Eigenfaces
- Summary
- 46. In Depth: Manifold Learning
- Manifold Learning: HELLO
- Multidimensional Scaling
- MDS as Manifold Learning
- Nonlinear Embeddings: Where MDS Fails
- Nonlinear Manifolds: Locally Linear Embedding
- Some Thoughts on Manifold Methods
- Example: Isomap on Faces
- Example: Visualizing Structure in Digits
- 47. In Depth: k-Means Clustering
- Introducing k-Means
- Expectation-Maximization
- Examples
- Example 1: k-Means on Digits
- Example 2: k-Means for Color Compression
- 48. In Depth: Gaussian Mixture Models
- Motivating Gaussian Mixtures: Weaknesses of k-Means
- Generalizing EM: Gaussian Mixture Models
- Choosing the Covariance Type
- Gaussian Mixture Models as Density Estimation
- Example: GMMs for Generating New Data
- 49. In Depth: Kernel Density Estimation
- Motivating Kernel Density Estimation: Histograms
- Kernel Density Estimation in Practice
- Selecting the Bandwidth via Cross-Validation
- Example: Not-so-Naive Bayes
- Anatomy of a Custom Estimator
- Using Our Custom Estimator
- 50. Application: A Face Detection Pipeline
- HOG Features
- HOG in Action: A Simple Face Detector
- 1. Obtain a Set of Positive Training Samples
- 2. Obtain a Set of Negative Training Samples
- 3. Combine Sets and Extract HOG Features
- 4. Train a Support Vector Machine
- 5. Find Faces in a New Image
- Caveats and Improvements
- Further Machine Learning Resources
- Index