Data Wrangling with Python. Tips and Tools to Make Your Life Easier - Helion
ISBN: 978-14-919-4877-4
Pages: 508, Format: ebook
Publication date: 2016-02-04
Bookstore: Helion
Price: 126.65 zł (previously: 147.27 zł)
You save: 14% (-20.62 zł)
How do you take your data analysis skills beyond Excel to the next level? By learning just enough Python to get stuff done. This hands-on guide shows non-programmers like you how to process information that’s initially too messy or difficult to access. You don't need to know a thing about the Python programming language to get started.
Through various step-by-step exercises, you’ll learn how to acquire, clean, analyze, and present data efficiently. You’ll also discover how to automate your data process, schedule file-editing and clean-up tasks, process larger datasets, and create compelling stories with data you obtain.
- Quickly learn basic Python syntax, data types, and language concepts
- Work with both machine-readable and human-consumable data
- Scrape websites and APIs to find a bounty of useful information
- Clean and format data to eliminate duplicates and errors in your datasets
- Learn when to standardize data and when to test and script data cleanup
- Explore and analyze your datasets with new Python libraries and techniques
- Use Python solutions to automate your entire data-wrangling process
Table of Contents
- Preface
- Who Should Read This Book
- Who Should Not Read This Book
- How This Book Is Organized
- What Is Data Wrangling?
- What to Do If You Get Stuck
- Conventions Used in This Book
- Using Code Examples
- Safari Books Online
- How to Contact Us
- Acknowledgments
- 1. Introduction to Python
- Why Python
- Getting Started with Python
- Which Python Version
- Setting Up Python on Your Machine
- Mac OS X
- Windows 8 and 10
- Test Driving Python
- Install pip
- Install a Code Editor
- Optional: Install IPython
- Summary
- 2. Python Basics
- Basic Data Types
- Strings
- Integers and Floats
- Integers
- Floats, decimals, and other nonwhole number types
- Data Containers
- Variables
- Lists
- Dictionaries
- What Can the Various Data Types Do?
- String Methods: Things Strings Can Do
- Numerical Methods: Things Numbers Can Do
- List Methods: Things Lists Can Do
- Dictionary Methods: Things Dictionaries Can Do
- Helpful Tools: type, dir, and help
- type
- dir
- help
- Putting It All Together
- What Does It All Mean?
- Summary
- 3. Data Meant to Be Read by Machines
- CSV Data
- How to Import CSV Data
- Saving the Code to a File; Running from Command Line
- JSON Data
- How to Import JSON Data
- XML Data
- How to Import XML Data
- Summary
- 4. Working with Excel Files
- Installing Python Packages
- Parsing Excel Files
- Getting Started with Parsing
- Summary
- 5. PDFs and Problem Solving in Python
- Avoid Using PDFs!
- Programmatic Approaches to PDF Parsing
- Opening and Reading Using slate
- Converting PDF to Text
- Parsing PDFs Using pdfminer
- Learning How to Solve Problems
- Exercise: Use Table Extraction, Try a Different Library
- Exercise: Clean the Data Manually
- Exercise: Try Another Tool
- Uncommon File Types
- Summary
- 6. Acquiring and Storing Data
- Not All Data Is Created Equal
- Fact Checking
- Readability, Cleanliness, and Longevity
- Where to Find Data
- Using a Telephone
- US Government Data
- Government and Civic Open Data Worldwide
- EU and UK
- Africa
- Asia
- Non-EU Europe, Central Asia, India, the Middle East, and Russia
- South America and Canada
- Organization and Non-Government Organization (NGO) Data
- Education and University Data
- Medical and Scientific Data
- Crowdsourced Data and APIs
- Case Studies: Example Data Investigation
- Ebola Crisis
- Train Safety
- Football Salaries
- Child Labor
- Storing Your Data: When, Why, and How?
- Databases: A Brief Introduction
- Relational Databases: MySQL and PostgreSQL
- MySQL and Python
- PostgreSQL and Python
- Non-Relational Databases: NoSQL
- MongoDB with Python
- Setting Up Your Local Database with Python
- When to Use a Simple File
- Cloud-Storage and Python
- Local Storage and Python
- Alternative Data Storage
- Summary
- 7. Data Cleanup: Investigation, Matching, and Formatting
- Why Clean Data?
- Data Cleanup Basics
- Identifying Values for Data Cleanup
- Replacing headers
- Zipping questions and answers
- Formatting Data
- Finding Outliers and Bad Data
- Finding Duplicates
- Fuzzy Matching
- RegEx Matching
- What to Do with Duplicate Records
- Summary
- 8. Data Cleanup: Standardizing and Scripting
- Normalizing and Standardizing Your Data
- Saving Your Data
- Determining What Data Cleanup Is Right for Your Project
- Scripting Your Cleanup
- Testing with New Data
- Summary
- 9. Data Exploration and Analysis
- Exploring Your Data
- Importing Data
- Exploring Table Functions
- Joining Numerous Datasets
- Identifying Correlations
- Identifying Outliers
- Creating Groupings
- Further Exploration
- Analyzing Your Data
- Separating and Focusing Your Data
- What Is Your Data Saying?
- Drawing Conclusions
- Documenting Your Conclusions
- Summary
- 10. Presenting Your Data
- Avoiding Storytelling Pitfalls
- How Will You Tell the Story?
- Know Your Audience
- Visualizing Your Data
- Charts
- Charting with matplotlib
- Charting with Bokeh
- Time-Related Data
- Time series data
- Timeline data
- Maps
- Interactives
- Words
- Images, Video, and Illustrations
- Presentation Tools
- Publishing Your Data
- Using Available Sites
- Medium
- Easy-to-start sites: WordPress, Squarespace
- Your own blog
- Open Source Platforms: Starting a New Site
- Ghost
- GitHub Pages and Jekyll
- One-click deploys
- Jupyter (Formerly Known as IPython Notebooks)
- Shared Jupyter notebooks
- Summary
- 11. Web Scraping: Acquiring and Storing Data from the Web
- What to Scrape and How
- Analyzing a Web Page
- Inspection: Markup Structure
- Network/Timeline: How the Page Loads
- Console: Interacting with JavaScript
- Style basics
- jQuery and JavaScript
- In-Depth Analysis of a Page
- Getting Pages: How to Request on the Internet
- Reading a Web Page with Beautiful Soup
- Reading a Web Page with LXML
- A Case for XPath
- Summary
- 12. Advanced Web Scraping: Screen Scrapers and Spiders
- Browser-Based Parsing
- Screen Reading with Selenium
- Selenium and headless browsers
- Screen Reading with Ghost.Py
- Spidering the Web
- Building a Spider with Scrapy
- Crawling Whole Websites with Scrapy
- Networks: How the Internet Works and Why It's Breaking Your Script
- The Changing Web (or Why Your Script Broke)
- A (Few) Word(s) of Caution
- Summary
- 13. APIs
- API Features
- REST Versus Streaming APIs
- Rate Limits
- Tiered Data Volumes
- API Keys and Tokens
- Creating a Twitter API key and access token
- A Simple Data Pull from Twitter's REST API
- Advanced Data Collection from Twitter's REST API
- Advanced Data Collection from Twitter's Streaming API
- Summary
- 14. Automation and Scaling
- Why Automate?
- Steps to Automate
- What Could Go Wrong?
- Where to Automate
- Special Tools for Automation
- Using Local Files, argv, and Config Files
- Local files
- Config files
- Command-line arguments
- Using the Cloud for Data Processing
- Using Git to deploy Python
- Using Parallel Processing
- Using Distributed Processing
- Simple Automation
- CronJobs
- Web Interfaces
- Jupyter Notebooks
- Large-Scale Automation
- Celery: Queue-Based Automation
- Ansible: Operations Automation
- Monitoring Your Automation
- Python Logging
- Adding Automated Messaging
- SMS and voice
- Chat integration
- Uploading and Other Reporting
- Logging and Monitoring as a Service
- Logging and exceptions
- Logging and monitoring
- No System Is Foolproof
- Summary
- 15. Conclusion
- Duties of a Data Wrangler
- Beyond Data Wrangling
- Become a Better Data Analyst
- Become a Better Developer
- Become a Better Visual Storyteller
- Become a Better Systems Architect
- Where Do You Go from Here?
- A. Comparison of Languages Mentioned
- C, C++, and Java Versus Python
- R or MATLAB Versus Python
- HTML Versus Python
- JavaScript Versus Python
- Node.js Versus Python
- Ruby and Ruby on Rails Versus Python
- B. Python Resources for Beginners
- Online Resources
- In-Person Groups
- C. Learning the Command Line
- Bash
- Navigation
- Modifying Files
- Executing Files
- Searching with the Command Line
- More Resources
- Windows CMD/Power Shell
- Navigation
- Modifying Files
- Executing Files
- Searching with the Command Line
- More Resources
- D. Advanced Python Setup
- Step 1: Install GCC
- Step 2: (Mac Only) Install Homebrew
- Step 3: (Mac Only) Tell Your System Where to Find Homebrew
- Step 4: Install Python 2.7
- Step 5: Install virtualenv (Windows, Mac, Linux)
- Step 6: Set Up a New Directory
- Step 7: Install virtualenvwrapper
- Installing virtualenvwrapper (Mac and Linux)
- Updating your .bashrc
- Installing virtualenvwrapper-win (Windows)
- Testing Your Virtual Environment (Windows, Mac, Linux)
- Learning About Our New Environment (Windows, Mac, Linux)
- Advanced Setup Review
- E. Python Gotchas
- Hail the Whitespace
- The Dreaded GIL
- = Versus == Versus is, and When to Just Copy
- Default Function Arguments
- Python Scope and Built-Ins: The Importance of Variable Names
- Defining Objects Versus Modifying Objects
- Changing Immutable Objects
- Type Checking
- Catching Multiple Exceptions
- The Power of Debugging
- F. IPython Hints
- Why Use IPython?
- Getting Started with IPython
- Magic Functions
- Final Thoughts: A Simpler Terminal
- G. Using Amazon Web Services
- Spinning Up an AWS Server
- AWS Step 1: Choose an Amazon Machine Image (AMI)
- AWS Step 2: Choose an Instance Type
- AWS Step 7: Review Instance Launch
- AWS Extra Question: Select an Existing Key Pair or Create a New One
- Logging into an AWS Server
- Get the Public DNS Name of the Instance
- Prepare Your Private Key
- Log into Your Server
- Summary
- Index