Anonymizing Health Data. Case Studies and Methods to Get You Started - Helion

ebook

Authors: Khaled El Emam, Luk Arbuckle
ISBN: 978-14-493-6303-1
Pages: 228, Format: ebook
Publication date: 2013-12-11
Bookstore: Helion

Price: 29,90 zł (previously: 99,67 zł)
You save: 70% (-69,77 zł)


Tags: Data analysis

Updated as of August 2014, this practical book demonstrates proven methods for anonymizing health data to help your organization share meaningful datasets without exposing patient identity. Leading experts Khaled El Emam and Luk Arbuckle walk you through a risk-based methodology, using case studies from their efforts to de-identify hundreds of datasets.

Clinical data is valuable for research and other types of analytics, but making it anonymous without compromising data quality is tricky. This book demonstrates techniques for handling different data types, based on the authors’ experiences with a maternal-child registry, inpatient discharge abstracts, health insurance claims, electronic medical record databases, and the World Trade Center disaster registry, among others.

- Understand different methods for working with cross-sectional and longitudinal datasets
- Assess the risk of adversaries who attempt to re-identify patients in anonymized datasets
- Reduce the size and complexity of massive datasets without losing key information or jeopardizing privacy
- Use methods to anonymize unstructured free-form text data
- Minimize the risks inherent in geospatial data, without omitting critical location-based health information
- Look at ways to anonymize coding information in health data
- Learn the challenge of anonymously linking related datasets

Customers who bought "Anonymizing Health Data. Case Studies and Methods to Get You Started" also chose:

Fundamentals of Metadata Management. Uncover the Meta Grid and Unlock IT, Data, Information, and Knowledge Management 249,17 zł, (29,90 zł -88%)
Semantic Modeling for Data 249,17 zł, (29,90 zł -88%)
Power BI Desktop. Kurs video. Wykorzystanie narzędzia w analizie i wizualizacji danych 332,50 zł, (39,90 zł -88%)
The Practitioner's Guide to Graph Data. Applying Graph Thinking and Graph Technologies to Solve Complex Problems 230,00 zł, (29,90 zł -87%)
R Cookbook. Proven Recipes for Data Analysis, Statistics, and Graphics. 2nd Edition 230,00 zł, (29,90 zł -87%)

Table of contents

Anonymizing Health Data
Preface
- Audience
- Conventions Used in this Book
- Safari Books Online
- How to Contact Us
- Content Updates
  - August 2014
- Acknowledgements
1. Introduction
- To Anonymize or Not to Anonymize
  - Consent, or Anonymization?
  - Penny Pinching
  - People Are Private
- The Two Pillars of Anonymization
  - Masking Standards
  - De-Identification Standards
    - Lists
    - Heuristics
    - Risk-based methodology
- Anonymization in the Wild
  - Organizational Readiness
  - Making It Practical
  - Making It Automated
  - Use Cases
- Stigmatizing Analytics
- Anonymization in Other Domains
- About This Book
2. A Risk-Based De-Identification Methodology
- Basic Principles
- Steps in the De-Identification Methodology
  - Step 1: Selecting Direct and Indirect Identifiers
  - Step 2: Setting the Threshold
  - Step 3: Examining Plausible Attacks
  - Step 4: De-Identifying the Data
  - Step 5: Documenting the Process
- Measuring Risk Under Plausible Attacks
  - T1: Deliberate Attempt at Re-Identification
  - T2: Inadvertent Attempt at Re-Identification
  - T3: Data Breach
  - T4: Public Data
- Measuring Re-Identification Risk
  - Probability Metrics
  - Information Loss Metrics
- Risk Thresholds
  - Choosing Thresholds
  - Meeting Thresholds
- Risky Business
3. Cross-Sectional Data: Research Registries
- Process Overview
  - Secondary Uses and Disclosures
  - Getting the Data
  - Formulating the Protocol
  - Negotiating with the Data Access Committee
- BORN Ontario
  - BORN Data Set
- Risk Assessment
  - Threat Modeling
  - Results
  - Year on Year: Reusing Risk Analyses
- Final Thoughts
4. Longitudinal Discharge Abstract Data: State Inpatient Databases
- Longitudinal Data
  - Don't Treat It Like Cross-Sectional Data
- De-Identifying Under Complete Knowledge
  - Approximate Complete Knowledge
  - Exact Complete Knowledge
  - Implementation
  - Generalization Under Complete Knowledge
- The State Inpatient Database (SID) of California
  - The SID of California and Open Data
- Risk Assessment
  - Threat Modeling
  - Results
- Final Thoughts
5. Dates, Long Tails, and Correlation: Insurance Claims Data
- The Heritage Health Prize
- Date Generalization
  - Randomizing Dates Independently of One Another
  - Shifting the Sequence, Ignoring the Intervals
  - Generalizing Intervals to Maintain Order
  - Dates and Intervals and Back Again
  - A Different Anchor
  - Other Quasi-Identifiers
  - Connected Dates
- Long Tails
  - The Risk from Long Tails
  - Threat Modeling
  - Number of Claims to Truncate
  - Which Claims to Truncate
- Correlation of Related Items
  - Expert Opinions
  - Predictive Models
  - Implications for De-Identifying Data Sets
- Final Thoughts
6. Longitudinal Events Data: A Disaster Registry
- Adversary Power
  - Keeping Power in Check
  - Power in Practice
  - A Sample of Power
- The WTC Disaster Registry
  - Capturing Events
  - The WTC Data Set
  - The Power of Events
- Risk Assessment
  - Threat Modeling
  - Results
- Final Thoughts
7. Data Reduction: Research Registry Revisited
- The Subsampling Limbo
  - How Low Can We Go?
  - Not for All Types of Risk
  - BORN to Limbo!
- Many Quasi-Identifiers
  - Subsets of Quasi-Identifiers
  - Covering Designs
  - Covering BORN
- Final Thoughts
8. Free-Form Text: Electronic Medical Records
- Not So Regular Expressions
- General Approaches to Text Anonymization
- Ways to Mark the Text as Anonymized
- Evaluation Is Key
  - Appropriate Metrics, Strict but Fair
  - Standards for Recall, and a Risk-Based Approach
  - Standards for Precision
- Anonymization Rules
- Informatics for Integrating Biology and the Bedside (i2b2)
  - i2b2 Text Data Set
- Risk Assessment
  - Threat Modeling
  - A Rule-Based System
  - Results
- Final Thoughts
9. Geospatial Aggregation: Dissemination Areas and ZIP Codes
- Where the Wild Things Are
- Being Good Neighbors
  - Distance Between Neighbors
  - Circle of Neighbors
  - Round Earth
  - Flat Earth
- Clustering Neighbors
  - We All Have Boundaries
  - Fast Nearest Neighbor
- Too Close to Home
  - Levels of Geoproxy Attacks
  - Measuring Geoproxy Risk
  - Accounting for Geoproxy Risk
- Final Thoughts
10. Medical Codes: A Hackathon
- Codes in Practice
- Generalization
  - The Digits of Diseases
  - The Digits of Procedures
  - The (Alpha)Digits of Drugs
- Suppression
- Shuffling
- Final Thoughts
11. Masking: Oncology Databases
- Schema Shmema
- Data in Disguise
  - Field Suppression
  - Randomization
  - Pseudonymization
  - Frequency of Pseudonyms
- Masking On the Fly
- Final Thoughts
12. Secure Linking
- Let's Link Up
- Doing It Securely
  - Don't Try This at Home
  - The Third-Party Problem
  - Basic Layout for Linking Up
- The Nitty-Gritty Protocol for Linking Up
  - Bringing Paillier to the Parties
  - Matching on the Unknown
- Scaling Up
  - Cuckoo Hashing
  - How Fast Does a Cuckoo Run?
- Final Thoughts
13. De-Identification and Data Quality: A Clinical Data Warehouse
- Useful Data from Useful De-Identification
- Degrees of Loss
- Workload-Aware De-Identification
  - Questions to Improve Data Utility
- A Clinical Data Warehouse
  - GI Protocol
  - Chlamydia Protocol
  - Date Shifting
- Final Thoughts
Index
Colophon
Copyright