Practical Synthetic Data Generation. Balancing Privacy and the Broad Availability of Data
ebook
Authors: Khaled El Emam, Lucy Mosquera, Richard Hoptroff
ISBN: 978-14-920-7269-0
Pages: 166, Format: ebook
Publication date: 2020-05-19
Bookstore: Helion

Price: 194.65 zł (previously: 218.71 zł)
You save: 11% (-24.06 zł)

Tags: Data analysis

Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data—fake data generated from real data—so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue.

Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution.

This book describes:

  • Steps for generating synthetic data using multivariate normal distributions
  • Methods for distribution fitting covering different goodness-of-fit metrics
  • How to replicate the simple structure of original data
  • An approach for modeling data structure to consider complex relationships
  • Multiple approaches and metrics you can use to assess data utility
  • How analysis performed on real data can be replicated with synthetic data
  • Privacy implications of synthetic data and methods to assess identity disclosure
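
As a rough illustration of the first item in the list above (not taken from the book), the sketch below fits a multivariate normal distribution to a small numeric dataset and samples synthetic records from it. It assumes every field is numeric and roughly jointly normal. The "real" data, the function names `fit_mvn`, `cholesky`, `sample_mvn`, and `correlation`, and all parameter values are hypothetical and chosen only for the example.

```python
import math
import random


def fit_mvn(rows):
    """Estimate the mean vector and sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_mvn(mean, cov, n_samples, rng):
    """Draw rows x = mean + L z, where z holds independent N(0, 1) draws."""
    low = cholesky(cov)
    d = len(mean)
    out = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


def correlation(rows, i, j):
    """Pearson correlation between columns i and j."""
    _, cov = fit_mvn(rows)
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


# Hypothetical "real" data: 2,000 records with two correlated numeric fields.
real = sample_mvn([50.0, 170.0], [[100.0, 40.0], [40.0, 64.0]],
                  2000, random.Random(42))

# Fit the distribution to the real data, then sample synthetic records from it.
mean, cov = fit_mvn(real)
synthetic = sample_mvn(mean, cov, 2000, random.Random(1))

# A first utility check: compare means and the pairwise correlation.
syn_mean, _ = fit_mvn(synthetic)
print("real mean:     ", [round(m, 2) for m in mean])
print("synthetic mean:", [round(m, 2) for m in syn_mean])
print("real corr:     ", round(correlation(real, 0, 1), 3))
print("synthetic corr:", round(correlation(synthetic, 0, 1), 3))
```

The sketch is written against the standard library to stay self-contained; in practice NumPy's `numpy.random.Generator.multivariate_normal` would typically handle the sampling step. Matching means and correlations is only a first check and says nothing about the privacy of the synthetic records.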

Table of contents

  • Preface
    • Conventions Used in This Book
    • O'Reilly Online Learning
    • How to Contact Us
    • Acknowledgments
  • 1. Introducing Synthetic Data Generation
    • Defining Synthetic Data
      • Synthesis from Real Data
      • Synthesis Without Real Data
      • Synthesis and Utility
    • The Benefits of Synthetic Data
      • Efficient Access to Data
      • Enabling Better Analytics
      • Synthetic Data as a Proxy
      • Learning to Trust Synthetic Data
    • Synthetic Data Case Studies
      • Manufacturing and Distribution
      • Healthcare
        • Data for cancer research
        • Evaluating innovative digital health technologies
      • Financial Services
        • Synthetic data benchmarks
        • Software testing
      • Transportation
        • Microsimulation models
        • Data synthesis for autonomous vehicles
    • Summary
  • 2. Implementing Data Synthesis
    • When to Synthesize
    • Identifiability Spectrum
    • Trade-Offs in Selecting PETs to Enable Data Access
      • Decision Criteria
      • PETs Considered
      • Decision Framework
      • Examples of Applying the Decision Framework
    • Data Synthesis Projects
      • Data Synthesis Steps
      • Data Preparation
    • The Data Synthesis Pipeline
    • Synthesis Program Management
    • Summary
  • 3. Getting Started: Distribution Fitting
    • Framing Data
    • How Data Is Distributed
    • Fitting Distributions to Real Data
    • Generating Synthetic Data from a Distribution
      • Measuring How Well Synthetic Data Fits a Distribution
      • The Overfitting Dilemma
      • A Little Light Weeding
    • Summary
  • 4. Evaluating Synthetic Data Utility
    • Synthetic Data Utility Framework: Replication of Analysis
    • Synthetic Data Utility Framework: Utility Metrics
      • Comparing Univariate Distributions
      • Comparing Bivariate Statistics
      • Comparing Multivariate Prediction Models
      • Distinguishability
    • Summary
  • 5. Methods for Synthesizing Data
    • Generating Synthetic Data from Theory
      • Sampling from a Multivariate Normal Distribution
      • Inducing Correlations with Specified Marginal Distributions
      • Copulas with Known Marginal Distributions
    • Generating Realistic Synthetic Data
      • Fitting Real Data to Known Distributions
      • Using Machine Learning to Fit the Distributions
    • Hybrid Synthetic Data
    • Machine Learning Methods
    • Deep Learning Methods
    • Synthesizing Sequences
    • Summary
  • 6. Identity Disclosure in Synthetic Data
    • Types of Disclosure
      • Identity Disclosure
      • Learning Something New
      • Attribute Disclosure
      • Inferential Disclosure
      • Meaningful Identity Disclosure
      • Defining Information Gain
      • Bringing It All Together
      • Unique Matches
    • How Privacy Law Impacts the Creation and Use of Synthetic Data
      • Issues Under the GDPR
        • Is the use of the original (real) dataset to generate and/or evaluate a synthetic dataset restricted or regulated under the GDPR?
        • Is sharing the original dataset with a third-party service provider to generate the synthetic dataset restricted or regulated under the GDPR?
        • Does the GDPR regulate or otherwise affect (if at all) the resulting synthetic dataset?
      • Issues Under the CCPA
        • Is the use of the original (real) dataset to generate and/or evaluate a synthetic dataset restricted or regulated under the CCPA?
        • Is sharing the original dataset with a third-party service provider to generate the synthetic dataset restricted or regulated under the CCPA?
        • Does the CCPA regulate or otherwise affect (if at all) the resulting synthetic dataset?
      • Issues Under HIPAA
        • Is the use of the original (real) dataset to generate and/or evaluate a synthetic dataset restricted or regulated under HIPAA?
        • Is sharing the original dataset with a third-party service provider to generate the synthetic dataset restricted or regulated under HIPAA?
        • Does HIPAA regulate or otherwise affect (if at all) the resulting synthetic dataset?
      • Article 29 Working Party Opinion
        • Singling out
        • Linkability
        • Inference
        • Closing comments on the Article 29 opinion
    • Summary
  • 7. Practical Data Synthesis
    • Managing Data Complexity
      • For Every Pre-Processing Step There Is a Post-Processing Step
      • Field Types
      • The Need for Rules
      • Not All Fields Have to Be Synthesized
      • Synthesizing Dates
      • Synthesizing Geography
      • Lookup Fields and Tables
      • Missing Data and Other Data Characteristics
      • Partial Synthesis
    • Organizing Data Synthesis
      • Computing Capacity
      • A Toolbox of Techniques
      • Synthesizing Cohorts Versus Full Datasets
      • Continuous Data Feeds
      • Privacy Assurance as Certification
      • Performing Validation Studies to Get Buy-In
      • Motivated Intruder Tests
      • Who Owns Synthetic Data?
    • Conclusions
  • Index
