Data Science: The Hard Parts - Helion

Author: Daniel Vaughan
ISBN: 9781098146436
Pages: 256, Format: ebook
Publication date: 2023-11-01
Bookstore: Helion

This practical guide provides a collection of techniques and best practices that are generally overlooked in most data engineering and data science pedagogy. A common misconception is that great data scientists are experts in the "big themes" of the discipline—machine learning and programming. But most of the time, these tools can only take us so far. In practice, the smaller tools and skills really separate a great data scientist from a not-so-great one.

Taken as a whole, the lessons in this book make the difference between an average data scientist candidate and a qualified data scientist working in the field. Author Daniel Vaughan has collected, extended, and used these skills to create value and train data scientists from different companies and industries.

With this book, you will:

- Understand how data science creates value
- Deliver compelling narratives to sell your data science project
- Build a business case using unit economics principles
- Create new features for an ML model using storytelling
- Learn how to decompose KPIs
- Perform growth decompositions to find root causes for changes in a metric

Daniel Vaughan is head of data at Clip, the leading paytech company in Mexico. He's the author of Analytical Skills for AI and Data Science (O'Reilly).

Table of Contents

Preface
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
I. Data Analytics Techniques
1. So What? Creating Value with Data Science
- What Is Value?
- What: Understanding the Business
- So What: The Gist of Value Creation in DS
- Now What: Be a Go-Getter
- Measuring Value
- Key Takeaways
- Further Reading
2. Metrics Design
- Desirable Properties That Metrics Should Have
  - Measurable
  - Actionable
  - Relevance
  - Timeliness
- Metrics Decomposition
  - Funnel Analytics
  - Stock-Flow Decompositions
  - P×Q-Type Decompositions
- Example: Another Revenue Decomposition
- Example: Marketplaces
- Key Takeaways
- Further Reading
3. Growth Decompositions: Understanding Tailwinds and Headwinds
- Why Growth Decompositions?
- Additive Decomposition
  - Example
  - Interpretation and Use Cases
- Multiplicative Decomposition
  - Example
  - Interpretation
- Mix-Rate Decompositions
  - Example
  - Interpretation
- Mathematical Derivations
  - Additive Decomposition
  - Multiplicative Decomposition
  - Mix-Rate Decomposition
- Key Takeaways
- Further Reading
4. 2×2 Designs
- The Case for Simplification
- What's a 2×2 Design?
- Example: Test a Model and a New Feature
- Example: Understanding User Behavior
- Example: Credit Origination and Acceptance
- Example: Prioritizing Your Workflow
- Key Takeaways
- Further Reading
5. Building Business Cases
- Some Principles to Construct Business Cases
- Example: Proactive Retention Strategy
- Fraud Prevention
- Purchasing External Datasets
- Working on a Data Science Project
- Key Takeaways
- Further Reading
6. What's in a Lift?
- Lifts Defined
- Example: Classifier Model
- Self-Selection and Survivorship Biases
- Other Use Cases for Lifts
- Key Takeaways
- Further Reading
7. Narratives
- What's in a Narrative: Telling a Story with Your Data
  - Clear and to the Point
  - Credible
  - Memorable
  - Actionable
- Building a Narrative
  - Science as Storytelling
  - What, So What, and Now What?
    - What?
    - So what?
    - Now what?
- The Last Mile
  - Writing TL;DRs
  - Tips to Write Memorable TL;DRs
  - Example: Writing a TL;DR for This Chapter
  - Delivering Powerful Elevator Pitches
  - Presenting Your Narrative
- Key Takeaways
- Further Reading
8. Datavis: Choosing the Right Plot to Deliver a Message
- Some Useful and Not-So-Used Data Visualizations
  - Bar Versus Line Plots
  - Slopegraphs
  - Waterfall Charts
  - Scatterplot Smoothers
  - Plotting Distributions
- General Recommendations
  - Find the Right Datavis for Your Message
  - Choose Your Colors Wisely
  - Different Dimensions in a Plot
  - Aim for a Large Enough Data-Ink Ratio
  - Customization Versus Semiautomation
  - Get the Font Size Right from the Beginning
  - Interactive or Not
  - Stay Simple
  - Start by Explaining the Plot
- Key Takeaways
- Further Reading
II. Machine Learning
9. Simulation and Bootstrapping
- Basics of Simulation
- Simulating a Linear Model and Linear Regression
- What Are Partial Dependence Plots?
- Omitted Variable Bias
- Simulating Classification Problems
  - Latent Variable Models
  - Comparing Different Algorithms
- Bootstrapping
- Key Takeaways
- Further Reading
10. Linear Regression: Going Back to Basics
- What's in a Coefficient?
- The Frisch-Waugh-Lovell Theorem
- Why Should You Care About FWL?
- Confounders
- Additional Variables
- The Central Role of Variance in ML
- Key Takeaways
- Further Reading
11. Data Leakage
- What Is Data Leakage?
  - Outcome Is Also a Feature
  - A Function of the Outcome Is Itself a Feature
  - Bad Controls
  - Mislabeling of a Timestamp
  - Multiple Datasets with Sloppy Time Aggregations
  - Leakage of Other Information
- Detecting Data Leakage
- Complete Separation
- Windowing Methodology
  - Choosing the Length of the Windows
  - The Training Stage Mirrors the Scoring Stage
  - Implementing the Windowing Methodology
- I Have Leakage: Now What?
- Key Takeaways
- Further Reading
12. Productionizing Models
- What Does Production Ready Mean?
  - Batch Scores (Offline)
  - Real-Time Model Objects
- Data and Model Drift
- Essential Steps in any Production Pipeline
  - Get and Transform Data
  - Validate Data
  - Training and Scoring Stages
  - Validate Model and Scores
  - Deploy Model and Scores
- Key Takeaways
- Further Reading
13. Storytelling in Machine Learning
- A Holistic View of Storytelling in ML
- Ex Ante and Interim Storytelling
  - Creating Hypotheses
    - Predicting human behavior
    - Predicting system behavior
    - Predicting downstream metrics
  - Feature Engineering
- Ex Post Storytelling: Opening the Black Box
  - Interpretability-Performance Trade-Off
  - Linear Regression: Setting a Benchmark
  - Feature Importance
  - Heatmaps
  - Partial Dependence Plots
  - Accumulated Local Effects
- Key Takeaways
- Further Reading
14. From Prediction to Decisions
- Dissecting Decision Making
- Simple Decision Rules by Smart Thresholding
  - Precision and Recall
  - Example: Lead Generation
- Confusion Matrix Optimization
- Key Takeaways
- Further Reading
15. Incrementality: The Holy Grail of Data Science?
- Defining Incrementality
  - Causal Reasoning to Improve Prediction
  - Causal Reasoning as a Differentiator
  - Improved Decision Making
- Confounders and Colliders
- Selection Bias
- Unconfoundedness Assumption
- Breaking Selection Bias: Randomization
- Matching
- Machine Learning and Causal Inference
  - Open Source Codebases
  - Double Machine Learning
- Key Takeaways
- Further Reading
16. A/B Tests
- What Is an A/B Test?
- Decision Criterion
- Minimum Detectable Effects
  - Choosing the Statistical Power, Level, and P
  - Estimating the Variance of the Outcome
  - Simulations
  - Example: Conversion Rates
  - Setting the MDE
- Hypotheses Backlog
  - Metric
  - Hypothesis
  - Ranking
- Governance of Experiments
- Key Takeaways
- Further Reading
17. Large Language Models and the Practice of Data Science
- The Current State of AI
- What Do Data Scientists Do?
- Evolving the Data Scientist's Job Description
  - Case Study: A/B Testing
  - Case Study: Data Cleansing
  - Case Study: Machine Learning
- LLMs and This Book
- Key Takeaways
- Further Reading
Index