Python for Data Analysis. Data Wrangling with Pandas, NumPy, and IPython. 2nd Edition - Helion

ebook

Author: Wes McKinney
ISBN: 978-1-4919-5761-5
Pages: 550, Format: ebook
Release date: 2017-09-25
Bookstore: Helion

Book price: 169.00 zł


Tags: Data analysis | Python - Programming

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

Use the IPython shell and Jupyter notebook for exploratory computing
Learn basic and advanced features in NumPy (Numerical Python)
Get started with data analysis tools in the pandas library
Use flexible tools to load, clean, transform, merge, and reshape data
Create informative visualizations with matplotlib
Apply the pandas groupby facility to slice, dice, and summarize datasets
Analyze and manipulate regular and irregular time series data
Learn how to solve real-world data analysis problems with thorough, detailed examples


Table of Contents

Preface
- 1. New for the Second Edition
- 2. Conventions Used in This Book
- 3. Using Code Examples
- 4. O’Reilly Safari
- 5. How to Contact Us
- 6. Acknowledgments
  - In Memoriam: John D. Hunter (1968–2012)
  - Acknowledgments for the Second Edition (2017)
  - Acknowledgments for the First Edition (2012)
1. Preliminaries
- 1.1. What Is This Book About?
  - What Kinds of Data?
- 1.2. Why Python for Data Analysis?
  - Python as Glue
  - Solving the Two-Language Problem
  - Why Not Python?
- 1.3. Essential Python Libraries
  - NumPy
  - pandas
  - matplotlib
  - IPython and Jupyter
  - SciPy
  - scikit-learn
  - statsmodels
- 1.4. Installation and Setup
  - Windows
  - Apple (OS X, macOS)
  - GNU/Linux
  - Installing or Updating Python Packages
  - Python 2 and Python 3
  - Integrated Development Environments (IDEs) and Text Editors
- 1.5. Community and Conferences
- 1.6. Navigating This Book
  - Code Examples
  - Data for Examples
  - Import Conventions
  - Jargon
2. Python Language Basics, IPython, and Jupyter Notebooks
- 2.1. The Python Interpreter
- 2.2. IPython Basics
  - Running the IPython Shell
  - Running the Jupyter Notebook
  - Tab Completion
  - Introspection
  - The %run Command
    - Interrupting running code
  - Executing Code from the Clipboard
  - Terminal Keyboard Shortcuts
  - About Magic Commands
  - Matplotlib Integration
- 2.3. Python Language Basics
  - Language Semantics
    - Indentation, not braces
    - Everything is an object
    - Comments
    - Function and object method calls
    - Variables and argument passing
    - Dynamic references, strong types
    - Attributes and methods
    - Duck typing
    - Imports
    - Binary operators and comparisons
    - Mutable and immutable objects
  - Scalar Types
    - Numeric types
    - Strings
    - Bytes and Unicode
    - Booleans
    - Type casting
    - None
    - Dates and times
  - Control Flow
    - if, elif, and else
    - for loops
    - while loops
    - pass
    - range
    - Ternary expressions
3. Built-in Data Structures, Functions, and Files
- 3.1. Data Structures and Sequences
  - Tuple
    - Unpacking tuples
    - Tuple methods
  - List
    - Adding and removing elements
    - Concatenating and combining lists
    - Sorting
    - Binary search and maintaining a sorted list
    - Slicing
  - Built-in Sequence Functions
    - enumerate
    - sorted
    - zip
    - reversed
  - dict
    - Creating dicts from sequences
    - Default values
    - Valid dict key types
  - set
  - List, Set, and Dict Comprehensions
    - Nested list comprehensions
- 3.2. Functions
  - Namespaces, Scope, and Local Functions
  - Returning Multiple Values
  - Functions Are Objects
  - Anonymous (Lambda) Functions
  - Currying: Partial Argument Application
  - Generators
    - Generator expressions
    - itertools module
  - Errors and Exception Handling
    - Exceptions in IPython
- 3.3. Files and the Operating System
  - Bytes and Unicode with Files
- 3.4. Conclusion
4. NumPy Basics: Arrays and Vectorized Computation
- 4.1. The NumPy ndarray: A Multidimensional Array Object
  - Creating ndarrays
  - Data Types for ndarrays
  - Arithmetic with NumPy Arrays
  - Basic Indexing and Slicing
    - Indexing with slices
  - Boolean Indexing
  - Fancy Indexing
  - Transposing Arrays and Swapping Axes
- 4.2. Universal Functions: Fast Element-Wise Array Functions
- 4.3. Array-Oriented Programming with Arrays
  - Expressing Conditional Logic as Array Operations
  - Mathematical and Statistical Methods
  - Methods for Boolean Arrays
  - Sorting
  - Unique and Other Set Logic
- 4.4. File Input and Output with Arrays
- 4.5. Linear Algebra
- 4.6. Pseudorandom Number Generation
- 4.7. Example: Random Walks
  - Simulating Many Random Walks at Once
- 4.8. Conclusion
5. Getting Started with pandas
- 5.1. Introduction to pandas Data Structures
  - Series
  - DataFrame
  - Index Objects
- 5.2. Essential Functionality
  - Reindexing
  - Dropping Entries from an Axis
  - Indexing, Selection, and Filtering
    - Selection with loc and iloc
  - Integer Indexes
  - Arithmetic and Data Alignment
    - Arithmetic methods with fill values
    - Operations between DataFrame and Series
  - Function Application and Mapping
  - Sorting and Ranking
  - Axis Indexes with Duplicate Labels
- 5.3. Summarizing and Computing Descriptive Statistics
  - Correlation and Covariance
  - Unique Values, Value Counts, and Membership
- 5.4. Conclusion
6. Data Loading, Storage, and File Formats
- 6.1. Reading and Writing Data in Text Format
  - Reading Text Files in Pieces
  - Writing Data to Text Format
  - Working with Delimited Formats
  - JSON Data
  - XML and HTML: Web Scraping
    - Parsing XML with lxml.objectify
- 6.2. Binary Data Formats
  - Using HDF5 Format
  - Reading Microsoft Excel Files
- 6.3. Interacting with Web APIs
- 6.4. Interacting with Databases
- 6.5. Conclusion
7. Data Cleaning and Preparation
- 7.1. Handling Missing Data
  - Filtering Out Missing Data
  - Filling In Missing Data
- 7.2. Data Transformation
  - Removing Duplicates
  - Transforming Data Using a Function or Mapping
  - Replacing Values
  - Renaming Axis Indexes
  - Discretization and Binning
  - Detecting and Filtering Outliers
  - Permutation and Random Sampling
  - Computing Indicator/Dummy Variables
- 7.3. String Manipulation
  - String Object Methods
  - Regular Expressions
  - Vectorized String Functions in pandas
- 7.4. Conclusion
8. Data Wrangling: Join, Combine, and Reshape
- 8.1. Hierarchical Indexing
  - Reordering and Sorting Levels
  - Summary Statistics by Level
  - Indexing with a DataFrame’s columns
- 8.2. Combining and Merging Datasets
  - Database-Style DataFrame Joins
  - Merging on Index
  - Concatenating Along an Axis
  - Combining Data with Overlap
- 8.3. Reshaping and Pivoting
  - Reshaping with Hierarchical Indexing
  - Pivoting Long to Wide Format
  - Pivoting Wide to Long Format
- 8.4. Conclusion
9. Plotting and Visualization
- 9.1. A Brief matplotlib API Primer
  - Figures and Subplots
    - Adjusting the spacing around subplots
  - Colors, Markers, and Line Styles
  - Ticks, Labels, and Legends
    - Setting the title, axis labels, ticks, and ticklabels
    - Adding legends
  - Annotations and Drawing on a Subplot
  - Saving Plots to File
  - matplotlib Configuration
- 9.2. Plotting with pandas and seaborn
  - Line Plots
  - Bar Plots
  - Histograms and Density Plots
  - Scatter or Point Plots
  - Facet Grids and Categorical Data
- 9.3. Other Python Visualization Tools
- 9.4. Conclusion
10. Data Aggregation and Group Operations
- 10.1. GroupBy Mechanics
  - Iterating Over Groups
  - Selecting a Column or Subset of Columns
  - Grouping with Dicts and Series
  - Grouping with Functions
  - Grouping by Index Levels
- 10.2. Data Aggregation
  - Column-Wise and Multiple Function Application
  - Returning Aggregated Data Without Row Indexes
- 10.3. Apply: General split-apply-combine
  - Suppressing the Group Keys
  - Quantile and Bucket Analysis
  - Example: Filling Missing Values with Group-Specific Values
  - Example: Random Sampling and Permutation
  - Example: Group Weighted Average and Correlation
  - Example: Group-Wise Linear Regression
- 10.4. Pivot Tables and Cross-Tabulation
  - Cross-Tabulations: Crosstab
- 10.5. Conclusion
11. Time Series
- 11.1. Date and Time Data Types and Tools
  - Converting Between String and Datetime
- 11.2. Time Series Basics
  - Indexing, Selection, Subsetting
  - Time Series with Duplicate Indices
- 11.3. Date Ranges, Frequencies, and Shifting
  - Generating Date Ranges
  - Frequencies and Date Offsets
    - Week of month dates
  - Shifting (Leading and Lagging) Data
    - Shifting dates with offsets
- 11.4. Time Zone Handling
  - Time Zone Localization and Conversion
  - Operations with Time Zone-Aware Timestamp Objects
  - Operations Between Different Time Zones
- 11.5. Periods and Period Arithmetic
  - Period Frequency Conversion
  - Quarterly Period Frequencies
  - Converting Timestamps to Periods (and Back)
  - Creating a PeriodIndex from Arrays
- 11.6. Resampling and Frequency Conversion
  - Downsampling
    - Open-High-Low-Close (OHLC) resampling
  - Upsampling and Interpolation
  - Resampling with Periods
- 11.7. Moving Window Functions
  - Exponentially Weighted Functions
  - Binary Moving Window Functions
  - User-Defined Moving Window Functions
- 11.8. Conclusion
12. Advanced pandas
- 12.1. Categorical Data
  - Background and Motivation
  - Categorical Type in pandas
  - Computations with Categoricals
    - Better performance with categoricals
  - Categorical Methods
    - Creating dummy variables for modeling
- 12.2. Advanced GroupBy Use
  - Group Transforms and Unwrapped GroupBys
  - Grouped Time Resampling
- 12.3. Techniques for Method Chaining
  - The pipe Method
- 12.4. Conclusion
13. Introduction to Modeling Libraries in Python
- 13.1. Interfacing Between pandas and Model Code
- 13.2. Creating Model Descriptions with Patsy
  - Data Transformations in Patsy Formulas
  - Categorical Data and Patsy
- 13.3. Introduction to statsmodels
  - Estimating Linear Models
  - Estimating Time Series Processes
- 13.4. Introduction to scikit-learn
- 13.5. Continuing Your Education
14. Data Analysis Examples
- 14.1. 1.USA.gov Data from Bitly
  - Counting Time Zones in Pure Python
  - Counting Time Zones with pandas
- 14.2. MovieLens 1M Dataset
  - Measuring Rating Disagreement
- 14.3. US Baby Names 1880–2010
  - Analyzing Naming Trends
    - Measuring the increase in naming diversity
    - The last letter revolution
    - Boy names that became girl names (and vice versa)
- 14.4. USDA Food Database
- 14.5. 2012 Federal Election Commission Database
  - Donation Statistics by Occupation and Employer
  - Bucketing Donation Amounts
  - Donation Statistics by State
- 14.6. Conclusion
A. Advanced NumPy
- A.1. ndarray Object Internals
  - NumPy dtype Hierarchy
- A.2. Advanced Array Manipulation
  - Reshaping Arrays
  - C Versus Fortran Order
  - Concatenating and Splitting Arrays
    - Stacking helpers: r_ and c_
  - Repeating Elements: tile and repeat
  - Fancy Indexing Equivalents: take and put
- A.3. Broadcasting
  - Broadcasting Over Other Axes
  - Setting Array Values by Broadcasting
- A.4. Advanced ufunc Usage
  - ufunc Instance Methods
  - Writing New ufuncs in Python
- A.5. Structured and Record Arrays
  - Nested dtypes and Multidimensional Fields
  - Why Use Structured Arrays?
- A.6. More About Sorting
  - Indirect Sorts: argsort and lexsort
  - Alternative Sort Algorithms
  - Partially Sorting Arrays
  - numpy.searchsorted: Finding Elements in a Sorted Array
- A.7. Writing Fast NumPy Functions with Numba
  - Creating Custom numpy.ufunc Objects with Numba
- A.8. Advanced Array Input and Output
  - Memory-Mapped Files
  - HDF5 and Other Array Storage Options
- A.9. Performance Tips
  - The Importance of Contiguous Memory
B. More on the IPython System
- B.1. Using the Command History
  - Searching and Reusing the Command History
  - Input and Output Variables
- B.2. Interacting with the Operating System
  - Shell Commands and Aliases
  - Directory Bookmark System
- B.3. Software Development Tools
  - Interactive Debugger
    - Other ways to make use of the debugger
  - Timing Code: %time and %timeit
  - Basic Profiling: %prun and %run -p
  - Profiling a Function Line by Line
- B.4. Tips for Productive Code Development Using IPython
  - Reloading Module Dependencies
  - Code Design Tips
    - Keep relevant objects and data alive
    - Flat is better than nested
    - Overcome a fear of longer files
- B.5. Advanced IPython Features
  - Making Your Own Classes IPython-Friendly
  - Profiles and Configuration
- B.6. Conclusion
Index